How to Scrape DataDome Protected Websites (Bypass DataDome Techniques)

Are you looking for ways to bypass the DataDome anti-bot system that is blocking your online automation tasks? Read on to discover the most effective methods for getting this done.


šŸ” Bypass Datadome Options [Overview]

1ļøāƒ£ Scrape Google Cache Version: You can scrape data from the Google Cache of a website if it's available and doesn't change frequently.

2ļøāƒ£ Scrape With Fortified Headless Browsers: Use fortified headless browsers like Puppeteer, Playwright, or Selenium with stealth plugins to minimize fingerprint leaks.

3ļøāƒ£ Anti-Bot Solvers: While there are no specific solvers for DataDome, tools like FlareSolverr designed for Cloudflare may work in some cases.

4ļøāƒ£ Residential Proxies With DataDome Bypass: Smart proxies, like those offered by Brightdata, Smartproxy, and Soax.io, have their own private bypasses that are harder for DataDome to patch.

5ļøāƒ£ Reverse Engineer DataDome's Protection: This complex option involves reverse engineering both the backend and client-side detection techniques of DataDome.


Although bots can serve legitimate purposes, they are widely used by fraudsters to carry out cyberattacks. Because these attacks are becoming more sophisticated, many companies and individuals have good reason to protect their websites with DataDome.

However, this WAF service can get in the way of legitimate automation. For instance, it can be difficult to scrape data from a DataDome-protected website. The good news is that DataDome can be bypassed.

DataDome is one of the trickiest security measures to get through, but with the right knowledge and techniques it is still possible to slip past its defenses. In this article, we'll examine what DataDome is, how it works, and the strategies you can use to get around it. Without further ado, let's get started.


What is DataDome?


DataDome is a web application firewall (WAF) solution designed to protect against bots and online fraud. According to the company, it identifies and mitigates attacks with unprecedented accuracy and zero latency, shielding websites, APIs, and mobile apps from online fraud and bot attacks. It can also defend against layer 7 DDoS attacks, card and payment fraud, credential stuffing, and scraping.

DataDome's AI-powered bot detection engine processes more than 3 trillion data signals each day from 25 global sites, which lets it protect some of the biggest international e-commerce companies in real time. Alongside its bot protection, it provides an integrated CAPTCHA that is secure, easy to use, and compliant with privacy laws. In 2022, the Inc. 5000 ranked DataDome 21st in the cybersecurity category.


How Does Datadome Detect Bots?


To detect bots, DataDome combines several server-side and client-side techniques to compute a trust score, and that score decides whether you are blocked or allowed to browse the website. If you understand each stage of this process, you have a reasonable chance of bypassing DataDome's bot defenses. Let's examine each stage in more detail.

Server-Side Techniques
  • TLS Fingerprinting

TLS fingerprinting is the most common server-side method web servers use to identify a web client, whether it is a browser, a script, or a CLI tool. TLS is the successor to SSL, the protocol that secures connections between web clients and servers, and for HTTPS the TLS handshake is the very first phase of the connection, before any HTTP data is exchanged.

To be more exact, it is a protocol that uses a number of cryptographic methods to encrypt web-based interactions between a client and a server. Before using TLS for communication, the client and server must first complete the TLS handshake.

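During this handshake, the client's first message (the ClientHello) advertises its TLS version, cipher suites, extensions, and elliptic curves, in a combination and order that differ between browsers, HTTP libraries, and tools. DataDome uses the resulting TLS fingerprint to recognize and restrict bot activity: it can tell which kind of web client is opening the connection before deciding whether to approve or deny the request.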
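DataDome does not publish exactly how it fingerprints TLS. One widely known public scheme is JA3, which joins fields from the ClientHello into a string and hashes it with MD5. The sketch below computes a JA3-style fingerprint from ClientHello values that are assumed to be already parsed (capturing and parsing the packet is out of scope), and the field values in the usage lines are made up for illustration.

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) look like 0x0A0A, 0x1A1A, ... 0xFAFA; JA3 ignores them."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version, ciphers, extensions, curves, point_formats) -> str:
    """Build the JA3 input: five comma-separated fields, list values joined with '-'."""
    def join(values):
        return "-".join(str(v) for v in values if not is_grease(v))

    return ",".join([
        str(version),
        join(ciphers),
        join(extensions),
        join(curves),
        join(point_formats),
    ])


def ja3_hash(*fields) -> str:
    """The JA3 fingerprint is the MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3_string(*fields).encode()).hexdigest()


# Hypothetical ClientHello: version 771 (0x0303), a GREASE cipher that gets dropped,
# two TLS 1.3 cipher suites, and a few extensions, curves, and point formats.
fields = (771, [0x0A0A, 4865, 4866], [0, 23, 65281], [29, 23], [0])
print(ja3_string(*fields))  # 771,4865-4866,0-23-65281,29-23,0
print(ja3_hash(*fields))
```

Two clients that differ in even one cipher or extension produce different hashes, which is what makes this kind of fingerprint useful for telling a Python script apart from a real browser.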

  • HTTP Fingerprinting

HTTP fingerprinting comes next. In essence, it examines how a client implements the HTTP protocol. Because HTTP is complex, implementations differ in many small ways, which makes it easier for DataDome to tell connections from real users apart from those made by bots. Real browsers typically connect to modern websites over HTTP/2 or HTTP/3.

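Although many modern HTTP client libraries support HTTP/2, they often do not enable it by default; Python's httpx, for example, only uses HTTP/2 when explicitly configured to. HTTP/2 was designed to improve the performance of websites and web applications by multiplexing parallel requests and responses over a single TCP connection and compressing header fields. Because every client implements it slightly differently, for instance in the SETTINGS values it sends, its flow-control window updates, and the order of pseudo-headers such as :method and :path, HTTP/2 fingerprinting can reveal which client is really connecting. A script that claims to be Chrome in its User-Agent but connects over HTTP/1.1, or with non-Chrome HTTP/2 settings, stands out immediately.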

At its most basic, HTTP fingerprinting inspects the request the client sends: which headers it includes, their values and order, and the protocol version it uses. These details feed into the trust score DataDome assigns to the connection.
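To make HTTP/2 fingerprinting more concrete, the sketch below assembles a fingerprint string in the format popularized by Akamai's HTTP/2 fingerprinting research: SETTINGS parameters, the connection-level WINDOW_UPDATE increment, PRIORITY frames, and the pseudo-header order, separated by "|". Whether DataDome uses this exact format is an assumption, and all input values are hypothetical; in practice they would come from parsing the client's first HTTP/2 frames.

```python
def http2_fingerprint(settings, window_update, priorities, pseudo_header_order):
    """Build an Akamai-style HTTP/2 fingerprint string.

    settings: (setting_id, value) pairs in the order the client sent them
    window_update: connection-level WINDOW_UPDATE increment (0 if none was sent)
    priorities: PRIORITY frame descriptions as strings (empty if none were sent)
    pseudo_header_order: e.g. [":method", ":authority", ":scheme", ":path"]
    """
    settings_part = ";".join(f"{sid}:{value}" for sid, value in settings)
    priority_part = ",".join(priorities) if priorities else "0"
    # Each pseudo-header is abbreviated to the first letter after the colon (m, a, s, p)
    pseudo_part = ",".join(h[1] for h in pseudo_header_order)
    return "|".join([settings_part, str(window_update), priority_part, pseudo_part])


# Two hypothetical clients that differ only in pseudo-header order
client_a = http2_fingerprint([(1, 65536), (4, 6291456)], 15663105, [],
                             [":method", ":authority", ":scheme", ":path"])
client_b = http2_fingerprint([(1, 65536), (4, 6291456)], 15663105, [],
                             [":method", ":path", ":authority", ":scheme"])
print(client_a)              # 1:65536;4:6291456|15663105|0|m,a,s,p
print(client_a == client_b)  # False
```

Even this small difference in header ordering is enough to separate the two clients, which is why HTTP libraries that imitate a browser must match its HTTP/2 behavior and not just its headers.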

  • TCP/IP Fingerprinting

TCP/IP fingerprinting is another method DataDome uses to identify bots. It remotely detects characteristics of a device's TCP/IP stack implementation. A TCP/IP connection starts with a TCP SYN packet as part of the three-way handshake, and that packet carries parameters such as the initial TTL, the TCP window size, the maximum segment size, and the order of TCP options. Operating systems choose different defaults for these, so DataDome can deduce the remote machine's operating system from their combination.

In other words, DataDome can use this method to tell whether the client is connecting from a Linux, Mac, or Windows machine. This helps define the connection's trust score and indicates whether it is a bot or a real person. IP address analysis also plays a major role.
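The idea behind TCP/IP OS fingerprinting can be shown with a deliberately simplified heuristic in the spirit of passive tools like p0f. Operating systems use different default initial TTL values (commonly 64 for Linux, macOS, and other Unix-likes, and 128 for Windows), and the TTL a server observes is that default minus the number of network hops. Real fingerprinting also weighs window size, MSS, and TCP option order; this sketch is not DataDome's actual logic.

```python
def initial_ttl(observed_ttl: int) -> int:
    """Round an observed TTL up to the nearest common default initial TTL."""
    for default in (32, 64, 128, 255):
        if observed_ttl <= default:
            return default
    raise ValueError("TTL must be between 0 and 255")


def guess_os_family(observed_ttl: int) -> str:
    """Very rough OS-family guess from the IP TTL of an incoming SYN packet."""
    ttl = initial_ttl(observed_ttl)
    if ttl == 64:
        return "Linux/macOS/Unix-like"
    if ttl == 128:
        return "Windows"
    return "unknown"


def os_mismatch(observed_ttl: int, user_agent: str) -> bool:
    """Flag a request whose User-Agent claims Windows while the TCP stack does not look like Windows."""
    return "Windows" in user_agent and guess_os_family(observed_ttl) != "Windows"


print(guess_os_family(52))   # Linux/macOS/Unix-like (64 minus 12 hops)
print(guess_os_family(117))  # Windows (128 minus 11 hops)
print(os_mismatch(52, "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # True
```

The mismatch check is the interesting part: a scraper running on a Linux server while sending a Windows User-Agent contradicts itself at the TCP level, no matter how convincing its headers are.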

Because DataDome has access to many IP databases, it can look up the connecting client's IP address to identify its ISP, location, and other relevant information. DataDome distinguishes between three types of IP addresses:

Residential IP addresses are assigned to households by internet service providers. Because real people mostly browse from residential IPs, and these addresses are expensive for bot operators to acquire, they earn a more positive trust score.

Datacenter IP addresses are assigned by data centers and cloud providers such as CyrusOne, AWS, IBM Cloud, and Google Cloud. Unlike residential IPs, datacenter IPs carry a significant negative trust score because they are likely to be used by bots.

Mobile IP addresses come last. Mobile carriers assign these to devices connecting through their cell towers. Like residential IPs, mobile IPs earn a high level of trust because most of their users are real people. In addition, carriers often share a single public IP among many subscribers through carrier-grade NAT, so blocking a mobile IP risks blocking many legitimate users at once.

Together, these signals let DataDome make an informed guess about whether the connecting client is a human or a bot.
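A toy version of this IP-based scoring can be written with Python's ipaddress module. The CIDR ranges below are blocks reserved for documentation (RFC 5737) standing in for real datacenter and mobile-carrier databases, and the score weights are invented purely to reflect the ordering described above: residential and mobile positive, datacenter negative.

```python
import ipaddress

# Hypothetical lookup tables: real systems rely on commercial IP-intelligence databases.
DATACENTER_RANGES = [ipaddress.ip_network("203.0.113.0/24")]
MOBILE_RANGES = [ipaddress.ip_network("198.51.100.0/24")]

# Invented weights illustrating the relative trust of each IP type
TRUST_WEIGHTS = {"datacenter": -50, "residential": 20, "mobile": 30}


def classify_ip(address: str) -> str:
    ip = ipaddress.ip_address(address)
    if any(ip in net for net in DATACENTER_RANGES):
        return "datacenter"
    if any(ip in net for net in MOBILE_RANGES):
        return "mobile"
    # Anything not found in the lists is treated as residential in this toy model
    return "residential"


def ip_trust_delta(address: str) -> int:
    return TRUST_WEIGHTS[classify_ip(address)]


for addr in ("203.0.113.7", "198.51.100.42", "192.0.2.10"):
    print(addr, classify_ip(addr), ip_trust_delta(addr))
```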

Client-side Techniques
  • Browser Fingerprinting

Browser fingerprinting is a fairly complex client-side technique that DataDome uses to determine the trust score of a request. Websites protected by DataDome use it to track and identify users by linking particular browsing sessions to the same user.

It gathers enough data points for DataDome to calculate a trust score, and JavaScript fingerprinting is the key way of collecting them: DataDome runs code in the client's JavaScript engine to gather information about the hardware, operating system, JavaScript runtime, and browser capabilities.

If you're curious what this data looks like, run the following JavaScript in your browser's developer console:

console.log("OS: " + navigator.platform);
// navigator.deviceMemory is only exposed by Chromium-based browsers; elsewhere it is undefined
console.log("Available RAM in GB: " + navigator.deviceMemory);

// An off-screen canvas is enough to query WebGL; it does not need to be added to the page
var canvas = document.createElement("canvas");
var gl = canvas.getContext("webgl") || canvas.getContext("experimental-webgl");

if (!gl) {
  console.log("WebGL is not available in this browser");
} else {
  console.log("GL renderer: " + gl.getParameter(gl.RENDERER));
  console.log("GL vendor: " + gl.getParameter(gl.VENDOR));

  // The debug extension reveals the real GPU strings; some browsers do not expose it
  var dbgRender = gl.getExtension("WEBGL_debug_renderer_info");
  if (dbgRender) {
    console.log("Unmasked renderer: " + gl.getParameter(dbgRender.UNMASKED_RENDERER_WEBGL));
    console.log("Unmasked vendor: " + gl.getParameter(dbgRender.UNMASKED_VENDOR_WEBGL));
  }
}

  • OS Fingerprinting

Much like browser fingerprinting, DataDome also builds part of its protection around the user's operating system. In general, OS fingerprinting can identify details such as the GPU, Simple Network Management Protocol (SNMP) services, and domain names, information that malicious actors can use to target specific devices.

DataDome is aware of these signals and uses them defensively, for instance by checking whether the operating system a client claims is consistent with what its device actually reports. To see the OS-related values a page can read, such as navigator.platform, navigator.deviceMemory, and the WebGL renderer and vendor strings, run the same developer-console snippet shown in the Browser Fingerprinting section above.

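The value of collecting both OS and browser data is that they can be cross-checked. The sketch below flags a session whose User-Agent header names one operating system while navigator.platform, as reported by client-side JavaScript, names another. The mapping table is a small hypothetical subset, and real anti-bot systems combine many more signals than this.

```python
# Hypothetical mapping from a User-Agent OS token to the navigator.platform prefixes it should produce
EXPECTED_PLATFORMS = {
    "Windows": ("Win",),
    "Macintosh": ("Mac",),
    "Linux": ("Linux",),
}


def platform_consistent(user_agent: str, navigator_platform: str) -> bool:
    """Return False when the OS claimed in the User-Agent contradicts navigator.platform."""
    for ua_token, prefixes in EXPECTED_PLATFORMS.items():
        if ua_token in user_agent:
            return navigator_platform.startswith(prefixes)
    # Unknown OS token: nothing to compare against, so do not flag it
    return True


print(platform_consistent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)", "Win32"))        # True
print(platform_consistent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)", "Linux x86_64"))  # False
```
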
  • Canvas Fingerprinting

The HTML5 canvas API lets web pages draw text and images. Because the same drawing instructions render slightly differently across browsers, operating systems, GPUs, and installed fonts, DataDome can use canvas fingerprinting to turn these differences into a distinctive fingerprint for each visitor and factor it into the trust score that indicates whether or not the user is a bot.

To render the image behind a canvas fingerprint, DataDome can also draw through WebGL, the 3D graphics API available on the canvas element, which exposes even more GPU-specific differences. DataDome keeps a sizable dataset of legitimate canvas fingerprint and user-agent combinations, so a request whose canvas fingerprint does not match its claimed user agent can be blocked instantly. All of this happens in the background without affecting the user's experience.
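On the server side, the matching step can be pictured as hashing the rendered canvas output (for example, the data URL returned by canvas.toDataURL() in the browser) and checking whether that hash has been seen before together with the claimed browser. The known-good table and the sample data URLs below are hypothetical placeholders; DataDome's real dataset and matching rules are not public.

```python
import hashlib


def canvas_hash(data_url: str) -> str:
    """Reduce a canvas data URL to a compact fingerprint."""
    return hashlib.sha256(data_url.encode()).hexdigest()


# Hypothetical dataset of (canvas hash, browser family) pairs observed from real users
KNOWN_GOOD = {
    (canvas_hash("data:image/png;base64,AAAA"), "Chrome"),
    (canvas_hash("data:image/png;base64,BBBB"), "Firefox"),
}


def canvas_matches(data_url: str, claimed_browser: str) -> bool:
    """True only if this exact rendering has been seen from the claimed browser family."""
    return (canvas_hash(data_url), claimed_browser) in KNOWN_GOOD


print(canvas_matches("data:image/png;base64,AAAA", "Chrome"))   # True
print(canvas_matches("data:image/png;base64,AAAA", "Firefox"))  # False: rendering does not fit the claim
```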

  • Behavioural Data

On top of these techniques, DataDome also protects against malicious bots by analyzing behavioral data. To track how people interact with a page, DataDome adds event listeners to the website.

DataDome then uses AI to examine users' interaction patterns. This kind of data is generated by a user's interactions with a website or app, such as typing speed, mouse movements, screen touches, and other gestures.

If a bot interacts with a website without mouse movements or other human-like actions, DataDome can easily tell that the request comes from an automated browser and not a real user. The trust score is therefore not a fixed value: it is continuously adjusted based on the user's behavior.
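As a rough illustration of behavioral scoring, the function below looks at the timestamps of recorded mouse or keyboard events. Human input arrives at irregular intervals, while naive automation often produces no events at all or perfectly even timing. The threshold and score adjustments are invented for this sketch; DataDome's actual models are proprietary and far richer.

```python
from statistics import pstdev


def behavior_adjustment(event_times_ms: list[float]) -> int:
    """Return a hypothetical trust-score adjustment from event timestamps in milliseconds."""
    if len(event_times_ms) < 3:
        return -30  # little or no interaction looks automated
    intervals = [b - a for a, b in zip(event_times_ms, event_times_ms[1:])]
    # Near-identical gaps between events suggest scripted input
    if pstdev(intervals) < 1.0:
        return -20
    return 10


# The score is adjusted repeatedly as new batches of events arrive
score = 50
score += behavior_adjustment([0, 100, 200, 300, 400])  # perfectly regular: -20
score += behavior_adjustment([0, 87, 230, 301, 512])   # irregular, human-like: +10
print(score)  # 40
```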


How Do These Tie into DataDome's Anti-bot Protection?

These fingerprinting methods play a significant role in Datadome's protection of a website, an API, or a mobile application. Datadome completes this without having an impact on your users. Additionally, DataDome's anti-bot protection system separates legitimate users from malicious bots using a variety of fingerprint signals.

In total, up to one trillion server- and client-side signals are processed daily by this bot protection system. DataDome also updates its fingerprint signals frequently to stay ahead of the latest bot trends.


How to Bypass DataDome

The section above shows the complex processes DataDome uses to distinguish bots from real people. To bypass DataDome, you need a combined understanding of these server-side and client-side fingerprinting techniques. The bypass measures discussed in this section are meant to help you slip under the radar of DataDome's detection and protection, especially if your goal is a task like web scraping.

  • Using Quality Proxies


Proxies are essential for bypassing DataDome's bot detection and protection. DataDome can see the IP address of every visitor, and a good proxy masks yours while giving your requests a higher trust score. Of the various proxy types, residential and mobile proxies work best for avoiding DataDome.

These proxies are the most sought after because most of their IPs belong to real people, which makes them harder to identify. Several providers sell access to them; we recommend Bright Data and Smartproxy, both of which have proxy pools large enough to help you stay undetected.

Another consideration is whether your proxies rotate or are static. A static proxy sends every request from the same IP address, so a high volume of requests quickly exposes it, while rotating proxies spread your requests across many IPs. Rotating proxies are therefore the most effective option for getting around DataDome.
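A minimal client-side way to rotate proxies is to cycle through a pool so that consecutive requests leave from different IPs. The proxy URLs below are placeholders, and the dictionary shape is the one the requests library accepts in its proxies argument; commercial rotating proxies often do this for you behind a single gateway endpoint.

```python
from itertools import cycle

# Placeholder proxy endpoints: replace with credentials from your provider
PROXIES = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
    "http://user:pass@proxy3.example.com:8000",
]

_rotation = cycle(PROXIES)


def next_proxy_config() -> dict:
    """Return the next proxy in round-robin order, in the format requests expects."""
    proxy = next(_rotation)
    return {"http": proxy, "https": proxy}


# Usage with requests (not executed here):
#   requests.get(url, proxies=next_proxy_config(), timeout=30)
for _ in range(4):
    print(next_proxy_config()["https"])
```

Round-robin is the simplest policy; a real scraper might also drop proxies that start receiving blocks.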

  • Using Scraping API


For web scraping specifically, a scraping API can help you avoid blocks when extracting data from a DataDome-protected website. While it is possible to bypass DataDome on your own, keeping up with bypass strategies is very time-consuming.

A scraping API takes that work off your hands. It can deal with CAPTCHAs, rotate proxies, and run headless browsers, among other methods, to get around DataDome, so all of the web scraping complexity and bypass logic is delegated to the API.

For instance, to scrape a DataDome-protected website this way, all you have to do is turn on the API's anti-scraping protection bypass. The following example uses Scrapfly, a well-known scraping API provider:

from scrapfly import ScrapflyClient, ScrapeConfig

scrapfly = ScrapflyClient(key="YOUR API KEY")

result = scrapfly.scrape(ScrapeConfig(
    url="THE URL YOU WANT TO SCRAPE",
    # enable the anti-scraping protection (ASP) bypass
    asp=True,
    # we can also enable headless browsers to render web apps and JavaScript-powered pages
    render_js=True,
    # and set proxies by country, like the United States
    country="US",
    # and proxy type, like residential:
    proxy_pool=ScrapeConfig.PUBLIC_RESIDENTIAL_POOL,
))
print(result.scrape_result)
  • Using automated browsers (Selenium and Puppeteer)


Another approach to bypassing DataDome is to use automated browsers, specifically fortified (hardened) headless browsers. Selenium and Puppeteer are two well-known browser automation tools built with automation in mind.

A headless browser that has been enhanced to look like a genuine user's browser is much better at evading DataDome, because vanilla headless browsers expose their identity in their JavaScript fingerprints, which DataDome can readily detect. A DataDome bypass with these tools therefore usually relies on either the stealth plugin for Puppeteer (puppeteer-extra-plugin-stealth) or undetected-chromedriver for Selenium.

Here is how you could use undetected-chromedriver, a third-party package that patches Selenium's Chrome driver, to scrape a DataDome-protected web page.

To begin, simply use pip to install the undetected-chromedriver package:

pip install undetected-chromedriver

Once it is installed, you can use undetected-chromedriver in place of the regular Selenium chromedriver:

import undetected_chromedriver as uc

driver = uc.Chrome()
driver.get('https://datadome.co/')

To use authenticated proxies, load undetected_chromedriver through Selenium Wire (seleniumwire) instead of importing it directly, and pass the proxy settings through the seleniumwire_options argument, as in the example below:

import seleniumwire.undetected_chromedriver as uc

## Chrome Options
chrome_options = uc.ChromeOptions()

## Proxy Options
proxy_options = {
    'proxy': {
        'http': 'http://user:pass@ip:port',
        'https': 'https://user:pass@ip:port',
        'no_proxy': 'localhost,127.0.0.1'
    }
}

## Create Chrome Driver
driver = uc.Chrome(
    options=chrome_options,
    seleniumwire_options=proxy_options
)

driver.get('https://datadome.co/')

The basic Selenium chromedriver exposes a lot of data that an anti-bot system like DataDome can use to tell that a browser is automated. undetected-chromedriver hardens the regular chromedriver by patching most of the signals anti-bot technologies use to detect Selenium, which makes it much harder for DataDome to spot and block you.

The specific browsers and plugins vary, but the underlying principle is the same: they fix inconsistencies in the browser fingerprint, override JavaScript properties that reveal automation (such as navigator.webdriver), and remove global variables specific to automated browsers.

Additionally, in order to bypass Datadome, which has much more advanced IP address fingerprinting methods than some other anti-bot solutions, it is essential to use a fortified browser in conjunction with a residential or mobile proxy. Residential or mobile proxies are your best option in this situation, as was previously explained. However, they might be pricey.

  • Using CAPTCHA solvers


CAPTCHA-solving services can be helpful for many purposes, most notably for getting past anti-bot tools like DataDome, which uses a CAPTCHA as one way to verify that a user is not a bot. With a CAPTCHA solver, you can solve DataDome's CAPTCHAs successfully. The solutions come either from automated solvers or from humans who solve the challenges and send back the answers.

Human solving is slow and expensive, whereas automated CAPTCHA solvers are fast, scalable, and able to handle different renderings without sacrificing speed, because they rely on machine learning techniques such as optical character recognition (OCR).

Several CAPTCHA-solving services can handle the DataDome CAPTCHA, such as 2Captcha, CapSolver, BypassCaptcha, and AnyCaptcha. With CapSolver, for example, the process works as follows.

First, create the task with the createTask method, then retrieve the result with the getTaskResult method. Note that this task type requires your own proxies. Also, before proceeding, check that the t parameter of the captchaUrl equals fe: if t=bv, your IP address has been fully banned and you must change it.
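The t check described above can be automated with Python's urllib.parse. The sketch treats fe as solvable and bv as a banned IP, as described in the CapSolver steps; the function name and the choice to raise an error on any other value are assumptions of this example.

```python
from urllib.parse import parse_qs, urlparse


def check_captcha_url(captcha_url: str) -> str:
    """Inspect DataDome's captchaUrl before sending it to a solver."""
    t_values = parse_qs(urlparse(captcha_url).query).get("t", [])
    t = t_values[0] if t_values else None
    if t == "fe":
        return "solvable"
    if t == "bv":
        # The IP address is fully banned: switch to a different proxy before retrying
        return "ip_banned"
    raise ValueError(f"Unexpected t parameter: {t!r}")


print(check_captcha_url("https://geo.captcha-delivery.com/captcha/?initialCid=abc&t=fe&s=40070"))  # solvable
print(check_captcha_url("https://geo.captcha-delivery.com/captcha/?initialCid=abc&t=bv&s=40070"))  # ip_banned
```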

Below is an example createTask request (both the proxy and userAgent fields are required):

POST https://api.capsolver.com/createTask
Host: api.capsolver.com
Content-Type: application/json

{
  "clientKey": "YOUR_API_KEY",
  "task": {
    "type": "DatadomeSliderTask",
    "websiteURL": "https://bck.websiteurl.com/registry",
    "captchaUrl": "https://geo.captcha-delivery.com/captcha/?initialCid=AHrlqAAAAAMA1QGvUmJwyYoAwpyjNg%3D%3D&hash=789361B674144528D0B7EE76B35826&cid=6QAEcL8coBYTi9tYLmjCdyKmNNyHz1xwM2tMHHGVd_Rxr6FsWrb7H~a04csMptCPYfQ25CBDmaOZpdDa4qwAigFnsrzbCkVkoaBIXVAwHsjXJaKYXsTpkBPtqJfLMGN&t=fe&referer=https%3A%2F%2Fbck.websiteurl.com%2Fclient%2Fregister%2FYM4HJV%3Flang%3Den&s=40070&e=3e531bd3b30650f2e810ac72cd80adb5eaa68d2720e804314d122fa9e84ac25d",
    "proxy": "socks5:158.120.100.23:334:user:pass",
    "userAgent": "MODERN_USER_AGENT_HERE"
  }
}

The response should look something like this:

{
  "errorId": 0,
  "status": "idle",
  "taskId": "61138bb6-19fb-11ec-a9c8-0242ac110006"
}

Then use the getTaskResult method to retrieve the result. Depending on the load on the system, it should arrive within one to twenty seconds.

POST https://api.capsolver.com/getTaskResult
Host: api.capsolver.com
Content-Type: application/json

{
  "clientKey": "YOUR_API_KEY",
  "taskId": "61138bb6-19fb-11ec-a9c8-0242ac110006"
}

The response should look like this:

{
  "errorId": 0,
  "errorCode": null,
  "errorDescription": null,
  "solution": {
    "userAgent": "",
    "cookie": "datadome=yzj_BK...S0; Max-Age=31536000; Domain=.hermes.com; Path=/; Secure; SameSite=Lax"
  },
  "status": "ready"
}
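Putting the two calls together, the sketch below creates the task, polls getTaskResult until the status is ready, and extracts the datadome cookie value from the solution. It uses only the endpoints and fields shown above; the polling interval, the timeout, and the injectable post helper (which lets the flow be tested without network access) are assumptions of this example, not part of CapSolver's documentation.

```python
import json
import time
import urllib.request

API = "https://api.capsolver.com"


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON response."""
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(), headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def solve_datadome(client_key: str, task: dict, post=post_json,
                   poll_every: float = 2.0, timeout: float = 60.0) -> str:
    """Create a CapSolver task, wait for it, and return the datadome cookie value."""
    created = post(f"{API}/createTask", {"clientKey": client_key, "task": task})
    if created.get("errorId"):
        raise RuntimeError(created.get("errorDescription"))
    task_id = created["taskId"]

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = post(f"{API}/getTaskResult", {"clientKey": client_key, "taskId": task_id})
        if result.get("errorId"):
            raise RuntimeError(result.get("errorDescription"))
        if result.get("status") == "ready":
            # "datadome=<value>; Max-Age=...; Domain=..." -> "<value>"
            name_value = result["solution"]["cookie"].split(";", 1)[0]
            return name_value.split("=", 1)[1]
        time.sleep(poll_every)
    raise TimeoutError("CapSolver did not return a result in time")
```

The returned value would then be sent as the datadome cookie on subsequent requests to the protected site, presumably through the same proxy and with the same User-Agent that were passed in the task.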
  • Reverse Engineer DataDome's Anti-Bot Protection

Reverse engineering is the final method for bypassing Datadome's anti-bot protection and detection. It is one of the trickiest ways to get around Datadome without using a fully fortified headless browser. In a nutshell, you must first get through Datadome's CAPTCHA, as we previously explained.

Next, you would reverse engineer DataDome's network requests and JavaScript file. The network requests are a good place to start, since they show how the JavaScript file is downloaded and what other queries are sent to DataDome's servers. To do this, open a browser, load the protected page, and inspect the traffic in the developer tools.

Because client-side security measures depend on a script running on the user's device, that script has to be delivered along with the protected website or application. It contains proprietary code and is obfuscated to protect it.

These precautions make reverse engineering more difficult and time-consuming, but it is still achievable. An online JavaScript deobfuscator such as DeObfuscate.io can help.

However, you should be aware that some obfuscation approaches are difficult to undo and will require you to use manual methods in addition to automated tools.


FAQs

Q. Is it illegal to bypass Datadome?

Since DataDome is a bot protection tool, bypassing it may be considered illegal in some cases; if your aim in bypassing it is malicious, it definitely is. Either way, be careful when trying to bypass DataDome, because the consequences of having your information exposed could be severe.

Q. Is bypassing Datadome an easy task?

No. Bypassing DataDome requires careful planning because there are many factors to consider. To begin with, you have to understand how DataDome operates in order to navigate its detection methods correctly. In short, a DataDome bypass takes a lot of effort and focus.

Q. What is the easiest technique for bypassing Datadome?

Realistically, there is no easy way to bypass DataDome. Each of these methods comes with its own challenges and benefits, and you will usually need to combine two or more of them to get past DataDome's anti-bot protection. That said, a good proxy is a great place to start.

 

Conclusion

By now, you have probably realized how difficult it is to get past DataDome's anti-bot protections, mostly because of the many techniques it uses to distinguish bots from real visitors. It is, however, achievable.

To summarize, understanding how DataDome works is the key to navigating a website it protects. With that knowledge, you can work around its techniques.

We have covered all of these techniques in this article. Some of these bypass methods are relatively fast and simple, while others are much more complex, and each has its own benefits and drawbacks.
