Why DataDome Stops Most Scrapers Cold
If you've built a scraper and suddenly hit a blank page with a datadome cookie challenge, you already know the frustration. DataDome is one of the most sophisticated bot-detection platforms deployed on major e-commerce, media, and travel sites. It doesn't just look at your IP — it examines your TLS handshake, your browser's JavaScript surface, and how you behave on the page.
This article breaks down exactly how DataDome's detection stack works and how legitimate automation — security research, authorized pentesting, SEO monitoring — can pass cleanly without resorting to CAPTCHA solvers or other ethically dubious workarounds.
DataDome's Detection Stack — The Four Layers
DataDome operates across four overlapping layers. Failing any single layer can trigger a challenge or block. Understanding each layer is the first step to configuring your automation to pass naturally.
Layer 1: IP Reputation and ASN Analysis
DataDome maintains and licenses commercial IP reputation databases, cross-referenced with their own real-time threat intelligence. They classify IPs across several dimensions:
- ASN type: Hosting providers (OVH, DigitalOcean, Hetzner, AWS) are flagged almost immediately. Datacenter ASNs are a strong negative signal.
- Residential vs. datacenter: IPs registered to ISPs (Comcast, Orange, Deutsche Telekom) carry far more trust than cloud-hosted ranges.
- Mobile carriers: IPs from mobile ASNs (T-Mobile, Vodafone) receive the highest trust scores — mobile traffic is inherently harder to automate at scale.
- Historical abuse: Even residential IPs that have been used in botnets or proxy networks accumulate negative reputation over time.
- Geo-consistency: An IP from Brazil hitting a French-localized page raises suspicion; a geo-matched IP does not.
This is why residential proxies matter so much against DataDome — using a residential or mobile IP from a reputable ASN is the single biggest factor in whether your request even gets a chance to prove it's legitimate.
Layer 2: TLS Fingerprinting (JA3/JA4)
Before a single byte of HTTP is exchanged, DataDome's edge servers analyze your TLS ClientHello. The JA3 fingerprint is a hash of:
- TLS version
- Cipher suites (in the exact order sent)
- Extensions
- Elliptic curves
- Elliptic curve point formats
The newer JA4 fingerprint extends this with ALPN, compressed SNI, and other extensions. DataDome maintains a database of known JA3/JA4 hashes mapped to client types:
- Chrome 124 on Windows 11 → specific JA3 hash, specific cipher ordering
- Python `requests` library → different JA3 hash, different cipher list, immediately flagged
- `curl` with default OpenSSL → yet another hash, also flagged
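The JA3 string itself is simple to compute: the five fields are comma-separated, the values inside each field are dash-joined in wire order, and the MD5 of that string is the fingerprint. A minimal sketch — the numeric values below are illustrative, not a real Chrome ClientHello:

```python
import hashlib

def ja3_hash(tls_version, ciphers, extensions, curves, point_formats):
    """Compute a JA3 fingerprint: MD5 of the comma-separated field string,
    where each field's values are dash-joined in the order sent on the wire."""
    fields = [
        str(tls_version),
        "-".join(str(c) for c in ciphers),
        "-".join(str(e) for e in extensions),
        "-".join(str(c) for c in curves),
        "-".join(str(p) for p in point_formats),
    ]
    ja3_string = ",".join(fields)  # e.g. "771,4865-4866-4867,0-23-65281,29-23-24,0"
    return hashlib.md5(ja3_string.encode()).hexdigest()

# Illustrative values only -- not a real browser's ClientHello
fp = ja3_hash(771, [4865, 4866, 4867], [0, 23, 65281], [29, 23, 24], [0])
print(fp)

# Reordering the cipher list changes the hash entirely, which is why
# cipher-ordering anomalies are so easy to spot.
fp_reordered = ja3_hash(771, [4867, 4866, 4865], [0, 23, 65281], [29, 23, 24], [0])
print(fp != fp_reordered)  # True
```

Because the hash covers exact ordering, there is no partial credit: any deviation from a known browser's ClientHello produces an unrecognized fingerprint.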
Common detection patterns:
- Cipher ordering anomalies: Chrome always orders `TLS_AES_128_GCM_SHA256` before `TLS_AES_256_GCM_SHA384`. A reversed order is a dead giveaway.
- Missing extensions: Real browsers send `application_layer_protocol_negotiation` (ALPN) with `h2`, `http/1.1`. Many HTTP clients omit this.
- GREASE values: Chrome includes GREASE (Generate Random Extensions And Sustain Extensibility) entries to prevent ossification. Their absence is suspicious.
- Extension ordering: Browsers have a specific, stable extension ordering. Randomized ordering signals a TLS impersonation tool.
Layer 3: Browser Fingerprinting
If your TLS fingerprint passes, DataDome's JavaScript payload executes in your browser and collects an extensive fingerprint:
- Canvas fingerprint: Renders text and shapes to a hidden `<canvas>`, then reads back the pixel data. The result varies by GPU, driver, font rendering, and anti-aliasing. Headless browsers often produce distinctive canvas hashes.
- WebGL fingerprint: Queries renderer and vendor strings (`WebGLRenderingContext.getParameter`). Mesa/llvmpipe (software rendering) is a strong headless indicator.
- Audio fingerprint: Uses `OfflineAudioContext` to process a signal and reads back the result. Different audio stacks produce subtly different outputs.
- Navigator properties: `navigator.userAgent`, `navigator.platform`, `navigator.hardwareConcurrency`, `navigator.deviceMemory`, `navigator.languages`. Inconsistencies — like a user-agent claiming macOS while `navigator.platform` reports `Win32` — are instant flags.
- Screen and viewport: Screen dimensions, color depth, pixel ratio. A viewport of `800×600` with pixel ratio `1` on a claim of a Retina display is inconsistent.
- Font enumeration: Detects installed fonts via rendering-time side channels. Headless environments typically have fewer fonts.
- Feature detection: Checks for `window.chrome`, `window.safari`, specific CSS media queries, and other browser-specific globals.
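The cross-checks in this layer are mechanical: each reported property is compared against what the user-agent implies. A toy consistency checker in Python illustrates the idea — the property names mirror the navigator fields above, but the rules are my own illustration, not DataDome's actual logic:

```python
def check_fingerprint_consistency(fp: dict) -> list[str]:
    """Return a list of inconsistency flags for a collected fingerprint.
    Illustrative rules only -- real detection uses far more signals."""
    flags = []
    ua = fp.get("userAgent", "")
    platform = fp.get("platform", "")

    # The OS claimed in the user-agent must match navigator.platform
    if "Macintosh" in ua and platform.startswith("Win"):
        flags.append("UA claims macOS but platform reports Windows")
    if "Windows" in ua and platform.startswith("Mac"):
        flags.append("UA claims Windows but platform reports macOS")

    # Software WebGL renderers are a strong headless signal
    renderer = fp.get("webglRenderer", "")
    if "llvmpipe" in renderer or "SwiftShader" in renderer:
        flags.append("software WebGL renderer suggests headless")

    # A claimed Retina display should not report devicePixelRatio == 1
    if fp.get("claimsRetina") and fp.get("devicePixelRatio") == 1:
        flags.append("Retina claim with devicePixelRatio of 1")

    return flags

suspicious = {
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ...",
    "platform": "Win32",
    "webglRenderer": "Mesa/X.org, llvmpipe (LLVM 15.0.7)",
}
print(check_fingerprint_consistency(suspicious))
```

The point is that every property you spoof must agree with every other property — patching one field in isolation usually creates a new inconsistency.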
Layer 4: Behavioral Signals
Even with a perfect fingerprint, DataDome watches how you interact with the page:
- Mouse dynamics: Real mice produce curved, accelerating paths with micro-jitters. A straight line from point A to B with constant velocity is a bot signature.
- Scroll patterns: Natural scrolling has variable speed, occasional pauses, and overshoot. Programmatic `scrollTo` calls are trivially detectable.
- Click timing: Clicks that always happen exactly 500ms after page load, or at perfectly regular intervals, raise suspicion.
- Keyboard dynamics: Typing cadence, key hold times, and error patterns. Copy-paste into form fields has a different event signature than manual typing.
- Page engagement: Time on page, whether the tab is in focus (`document.visibilityState`), and whether the user scrolls below the fold.
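To see why a straight constant-velocity line stands out, compare it with the kind of curved, jittered path a stealth script might feed to `page.mouse.move` point by point. A sketch using a quadratic Bezier curve with ease-in/ease-out pacing and micro-jitter — `human_mouse_path` is an assumed helper, not from any library:

```python
import random

def human_mouse_path(start, end, steps=30, jitter=2.0):
    """Generate a curved mouse path from start to end via a quadratic
    Bezier curve, with micro-jitter on intermediate points and
    ease-in/ease-out spacing (humans accelerate, then decelerate)."""
    (x0, y0), (x1, y1) = start, end
    # A randomly offset control point bends the path away from a straight line
    cx = (x0 + x1) / 2 + random.uniform(-100, 100)
    cy = (y0 + y1) / 2 + random.uniform(-100, 100)
    path = []
    for i in range(steps + 1):
        t = i / steps
        t = t * t * (3 - 2 * t)  # smoothstep easing
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1
        if 0 < i < steps:  # keep the endpoints exact
            x += random.uniform(-jitter, jitter)
            y += random.uniform(-jitter, jitter)
        path.append((x, y))
    return path

path = human_mouse_path((100, 100), (800, 450))
print(len(path), path[0], path[-1])
```

Each point would then be passed to the automation framework's mouse-move API with small, variable delays between steps.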
The DataDome Cookie and CAPTCHA Challenge Flow
When a visitor first hits a DataDome-protected site, the flow looks like this:
- Initial request hits DataDome's edge (often a reverse proxy or CDN integration).
- Server-side checks: IP reputation, ASN, geolocation, and TLS fingerprint are evaluated immediately — before any response is sent.
- Three possible outcomes:
  - Pass: A `datadome` cookie is set, and the request is forwarded to the origin. The cookie has a TTL (typically 24 hours) and is tied to the IP+fingerprint combination.
  - Challenge: A JavaScript challenge page is returned. The browser must execute the JS payload, which collects the full browser fingerprint and sends it back. If the fingerprint passes, the `datadome` cookie is set.
  - CAPTCHA: For high-risk signals, DataDome serves a CAPTCHA (typically a visual puzzle). The user must solve it to receive the cookie.
- Subsequent requests: The `datadome` cookie is sent along with every request. DataDome validates it server-side. If the cookie is valid and matches the current IP+fingerprint, the request passes without re-challenge.
- Cookie invalidation: Changing your IP, switching browsers, or modifying your fingerprint invalidates the cookie, triggering a new challenge.
This means that even if you solve a CAPTCHA once, you can't reuse that cookie from a different IP or browser profile. The cookie is context-bound.
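The edge-side decision can be pictured as a simple risk-score router. The signals below come from the layers described earlier, but the scoring and thresholds are purely illustrative — DataDome does not publish its model:

```python
from dataclasses import dataclass

@dataclass
class EdgeSignals:
    datacenter_asn: bool      # IP belongs to a hosting-provider ASN
    known_ja3: bool           # TLS fingerprint matches a real browser
    ip_abuse_history: bool    # IP seen in botnets/proxy networks before
    geo_matched: bool         # IP country matches the site's audience

def route_request(s: EdgeSignals) -> str:
    """Return 'pass', 'challenge', or 'captcha' for a first request.
    Illustrative weights and thresholds only."""
    risk = 0
    risk += 50 if s.datacenter_asn else 0
    risk += 30 if not s.known_ja3 else 0
    risk += 40 if s.ip_abuse_history else 0
    risk += 10 if not s.geo_matched else 0
    if risk >= 60:
        return "captcha"     # high risk: visual puzzle before any cookie
    if risk >= 20:
        return "challenge"   # medium risk: JS fingerprint challenge
    return "pass"            # datadome cookie set, forwarded to origin

clean = EdgeSignals(datacenter_asn=False, known_ja3=True,
                    ip_abuse_history=False, geo_matched=True)
datacenter = EdgeSignals(datacenter_asn=True, known_ja3=True,
                         ip_abuse_history=False, geo_matched=True)
print(route_request(clean), route_request(datacenter))  # pass challenge
```

Note how a datacenter ASN alone pushes an otherwise perfect request into a challenge — which is exactly the behavior described in the next section.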
Why Residential and Mobile Proxies Matter
DataDome's IP reputation system is aggressive about datacenter ranges. This isn't arbitrary — the vast majority of automated abuse originates from cloud hosting providers. DataDome maintains lists of:
- Known cloud/VPN ASN ranges (AWS, Azure, GCP, OVH, Hetzner, DigitalOcean, Vultr, Linode, etc.)
- Known proxy/VPN exit nodes (from services like Luminati/Bright Data, Oxylabs, etc., when detected)
- Known botnet IPs
- Tor exit nodes
| Proxy Type | DataDome Trust Level | Typical Outcome |
|---|---|---|
| Datacenter (AWS, OVH, etc.) | Very Low | Immediate block or CAPTCHA |
| Shared residential (proxy network) | Medium — depends on IP reputation | May pass, may trigger challenge |
| Private residential (ISP, not shared) | High | Passes IP layer cleanly |
| Mobile carrier (4G/5G) | Very High | Best trust score; rarely challenged at IP level |
This is why choosing the right proxy type is critical. A datacenter IP with a perfect browser fingerprint will still get challenged because the IP reputation check happens first. Residential and mobile proxies from reputable ISPs are essential for passing DataDome's IP layer.
Geo-matching also matters. If you're scraping a French e-commerce site, using a French residential IP means your request looks like a normal French shopper. A Brazilian residential IP hitting the same site is unusual and raises a signal.
What Legitimate Automation Looks Like
Let's be clear about the ethical framing: DataDome exists to protect websites from abuse — credential stuffing, scalping, content theft, DDoS. If you're doing legitimate work (security research, authorized pentesting, SEO monitoring, price comparison for your own platform), your goal is not to "bypass DataDome" in the sense of defeating its protections. Your goal is to make your automation look like what it is: a real browser, operated by a real person, from a real network.
The principles:
- Unmodified browser: Use a real browser engine (Chromium via Playwright or Puppeteer) with stealth patches — not raw HTTP clients like `requests` or `axios`.
- Residential proxy, geo-matched: Use a residential or mobile IP from the same country (ideally same city) as the target audience.
- Human-paced interaction: Add realistic delays, scroll the page, move the mouse. Don't fire 50 requests per second.
- Respect rate limits: If a site serves 10 requests/second to a human, don't try 100. Throttle to human speed.
- Don't use CAPTCHA solvers: If you hit a CAPTCHA, it means the site doesn't want automated access. Either slow down, switch IP, or — better yet — use the official API if one exists.
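The "human-paced" and "respect rate limits" principles can be enforced with a small jittered throttle between page actions. A minimal sketch — the `HumanPacer` class and its delay ranges are my own illustration, not a library API:

```python
import random
import time

class HumanPacer:
    """Sleep a random, human-ish interval between actions and enforce
    a hard ceiling on actions per minute. Parameters are illustrative."""
    def __init__(self, min_delay=1.5, max_delay=6.0, max_per_minute=10):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.max_per_minute = max_per_minute
        self.timestamps = []

    def wait(self):
        """Call once before each page action (navigation, click, etc.)."""
        now = time.monotonic()
        # Keep only actions from the last 60 s; back off if at the cap
        self.timestamps = [t for t in self.timestamps if now - t < 60]
        if len(self.timestamps) >= self.max_per_minute:
            time.sleep(60 - (now - self.timestamps[0]))
        # Jittered delay: never the same interval twice
        time.sleep(random.uniform(self.min_delay, self.max_delay))
        self.timestamps.append(time.monotonic())

# Short delays here only so the demo runs quickly
pacer = HumanPacer(min_delay=0.01, max_delay=0.05, max_per_minute=100)
for _ in range(3):
    pacer.wait()
print(len(pacer.timestamps))  # 3
```

In a real Playwright or Puppeteer script you would call `pacer.wait()` between navigations and clicks, with the default multi-second delays.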
Code Example 1: Playwright Stealth with Residential Proxy (Python)
This example uses `playwright` with the `playwright-stealth` plugin and a ProxyHat residential proxy geo-matched to the US:
```python
import asyncio

from playwright.async_api import async_playwright
from playwright_stealth import stealth_async

# Chromium does not reliably accept credentials embedded in the proxy URL,
# so pass them to Playwright separately.
PROXY_SERVER = "http://gate.proxyhat.com:8080"
PROXY_USERNAME = "user-country-US"
PROXY_PASSWORD = "PASSWORD"
TARGET_URL = "https://example-datadome-site.com"


async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            proxy={
                "server": PROXY_SERVER,
                "username": PROXY_USERNAME,
                "password": PROXY_PASSWORD,
            },
            headless=True,
            args=[
                "--disable-blink-features=AutomationControlled",
                "--no-sandbox",
            ],
        )
        context = await browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/124.0.0.0 Safari/537.36"
            ),
            locale="en-US",
            timezone_id="America/New_York",
            geolocation={"latitude": 40.7128, "longitude": -74.0060},
            permissions=["geolocation"],
        )
        page = await context.new_page()
        await stealth_async(page)

        # Navigate with realistic human timing
        await page.goto(TARGET_URL, wait_until="domcontentloaded")
        await asyncio.sleep(2)  # Let DataDome JS execute

        # Simulate human scroll behavior
        await page.evaluate("window.scrollBy(0, 300)")
        await asyncio.sleep(1)
        await page.evaluate("window.scrollBy(0, 500)")

        # Extract data
        title = await page.title()
        print(f"Page title: {title}")

        await browser.close()


asyncio.run(main())
```
Code Example 2: Puppeteer Extra with Stealth Plugin (Node.js)
For Node.js users, `puppeteer-extra` with `puppeteer-extra-plugin-stealth` is the standard approach:
```javascript
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

puppeteer.use(StealthPlugin());

// Chromium's --proxy-server flag does not accept embedded credentials,
// so authenticate per-page instead.
const PROXY_SERVER = 'http://gate.proxyhat.com:8080';
const PROXY_USERNAME = 'user-country-US';
const PROXY_PASSWORD = 'PASSWORD';
const TARGET_URL = 'https://example-datadome-site.com';

(async () => {
  const browser = await puppeteer.launch({
    headless: 'new',
    args: [
      `--proxy-server=${PROXY_SERVER}`,
      '--disable-blink-features=AutomationControlled',
      '--no-sandbox',
    ],
  });

  const page = await browser.newPage();
  await page.authenticate({ username: PROXY_USERNAME, password: PROXY_PASSWORD });
  await page.setViewport({ width: 1920, height: 1080 });

  // Override navigator properties for consistency
  await page.evaluateOnNewDocument(() => {
    Object.defineProperty(navigator, 'platform', {
      get: () => 'Win32',
    });
    Object.defineProperty(navigator, 'hardwareConcurrency', {
      get: () => 8,
    });
    Object.defineProperty(navigator, 'deviceMemory', {
      get: () => 8,
    });
  });

  await page.goto(TARGET_URL, { waitUntil: 'domcontentloaded' });

  // Human-like mouse movement
  await page.mouse.move(500, 300, { steps: 25 });
  await page.mouse.move(800, 450, { steps: 15 });

  // Human-like scroll
  await page.evaluate(() => window.scrollBy(0, 400));
  await new Promise(r => setTimeout(r, 1500));

  const title = await page.title();
  console.log('Page title:', title);

  await browser.close();
})();
```
Code Example 3: cURL with TLS Impersonation and Residential Proxy
For API-level requests where you've already obtained a `datadome` cookie from a real browser session, you can reuse it with `curl` — but you need TLS impersonation to avoid JA3 fingerprinting. The `curl-impersonate` project provides this:
```bash
# Using curl-impersonate to mimic Chrome's TLS fingerprint
# with a ProxyHat residential proxy and an existing datadome cookie
curl_chrome124 \
  --proxy 'http://user-country-US:PASSWORD@gate.proxyhat.com:8080' \
  -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36' \
  -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
  -H 'Accept-Language: en-US,en;q=0.9' \
  -H 'Cookie: datadome=YOUR_DATADOME_COOKIE_VALUE' \
  'https://example-datadome-site.com/api/data'
```
Note: `curl-impersonate` only handles the TLS layer. You still need a valid `datadome` cookie obtained from a real browser session, and the cookie must be tied to the same IP you're using.
Common Mistakes That Trigger DataDome
- Using `requests` or `axios` directly: These libraries have distinctive JA3 fingerprints and send no JavaScript execution context. DataDome blocks them immediately.
- Headless Chrome without stealth patches: The default `headless: true` mode sets `navigator.webdriver = true`, uses software rendering (Mesa/llvmpipe in WebGL), and produces a distinctive canvas fingerprint.
- Inconsistent fingerprint: A user-agent claiming macOS with `navigator.platform = Win32`, or claiming a Retina display with `devicePixelRatio = 1`.
- Datacenter IPs: Even with a perfect browser, datacenter ASNs are flagged at the edge before your JavaScript ever runs.
- Reusing cookies across IPs: The `datadome` cookie is bound to the IP and fingerprint that generated it. Switching IPs invalidates it.
- Too-fast interaction: Clicking elements 50ms after page load, or scrolling at machine speed, triggers behavioral detection.
- Missing or wrong ALPN: HTTP/2 negotiation (`h2`) is standard for real browsers. Forcing HTTP/1.1 is a signal.
When DataDome Means "Use the Official API"
Sometimes, the right answer isn't better automation — it's a different access method. Many DataDome-protected sites offer official APIs:
- Major news publishers: Many offer RSS feeds or content APIs for legitimate aggregation. Use them.
- E-commerce platforms: Some have product data APIs or affiliate programs with structured data access.
- Travel and hospitality: Several have partner APIs for pricing data.
If a site has invested in DataDome and also offers an API, the API is almost always the better path. It's more reliable, faster, and doesn't require maintaining stealth browser infrastructure.
Check for:
- `/api`, `/v1`, `/v2` endpoints in the site's documentation
- Developer portals linked in footers
- Affiliate or partner programs that include data access
- `robots.txt` — some sites disallow scraping but provide API alternatives
ProxyHat Configuration for DataDome-Protected Sites
When targeting DataDome-protected sites with ProxyHat, use these configuration principles:
- Use residential or mobile proxies: Datacenter proxies will be flagged immediately.
- Geo-match your proxy: Target a French site? Use a French IP. Target a US retailer? Use a US IP.
- Use sticky sessions: DataDome cookies are IP-bound. If your IP rotates mid-session, the cookie is invalidated. Use ProxyHat's session feature to maintain the same IP for the duration of your session.
- SOCKS5 for lower latency: If your use case supports it, SOCKS5 can offer slightly better performance.
Example: Sticky session with city-level geo-targeting for a German site:
```
# HTTP proxy with sticky session and city-level targeting
http://user-country-DE-city-berlin-session-mySession01:PASSWORD@gate.proxyhat.com:8080

# SOCKS5 proxy equivalent
socks5://user-country-DE-city-berlin-session-mySession01:PASSWORD@gate.proxyhat.com:1080
```
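Building these URLs by hand is error-prone, so a small helper keeps the username segments consistent. The `user-country-XX-city-yyy-session-zzz` convention follows the examples above; treat the exact segment names as ProxyHat-specific assumptions:

```python
def proxyhat_url(password, country, city=None, session=None,
                 scheme="http", host="gate.proxyhat.com", port=8080):
    """Assemble a ProxyHat proxy URL with optional city-level targeting
    and a sticky session, using the user-country-XX-city-yyy-session-zzz
    username convention shown in the examples above."""
    username = f"user-country-{country}"
    if city:
        username += f"-city-{city}"
    if session:
        username += f"-session-{session}"
    return f"{scheme}://{username}:{password}@{host}:{port}"

url = proxyhat_url("PASSWORD", "DE", city="berlin", session="mySession01")
print(url)
# http://user-country-DE-city-berlin-session-mySession01:PASSWORD@gate.proxyhat.com:8080
```

The same session name must be reused for every request in a scraping run, so the `datadome` cookie stays bound to a single IP.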
Ethical and Legal Considerations
A word on ethics and legality:
- DataDome protects against real abuse: Credential stuffing, scalping, DDoS, and content theft harm businesses and users. Don't contribute to that.
- Respect `robots.txt`: If a site disallows scraping, respect that directive. There may be an API alternative.
- Respect rate limits: Even with perfect stealth, sending 1,000 requests per minute from one IP is abusive. Slow down.
- GDPR and CCPA: If you're scraping personal data (names, emails, user profiles), you may be subject to data protection regulations. Ensure you have a legal basis.
- Terms of Service: Many sites' ToS explicitly prohibit automated access. Be aware of the legal risks.
- Authorized pentesting: If you're testing a site you own or have authorization to test, document that authorization.
Key Takeaways
- DataDome detects bots across four layers: IP reputation, TLS fingerprinting (JA3/JA4), browser fingerprinting (canvas, WebGL, audio, navigator), and behavioral signals (mouse, scroll, timing).
- Datacenter IPs are flagged at the edge — residential and mobile proxies are essential for passing the IP layer.
- TLS fingerprinting happens before any HTTP content is exchanged — use `curl-impersonate` or a real browser engine, not raw `requests`.
- Browser fingerprinting detects inconsistencies between user-agent, platform, screen, and hardware properties.
- Behavioral detection catches non-human interaction patterns — use realistic mouse movements, scroll, and timing.
- The `datadome` cookie is IP+fingerprint-bound — rotating IPs invalidates it.
- When a site offers an official API, use it. It's more reliable and doesn't require stealth infrastructure.
- Legitimate automation respects rate limits, doesn't use CAPTCHA solvers, and operates within ethical and legal boundaries.
Conclusion
DataDome's detection stack is formidable because it's layered. You can't beat it with a single trick — you need clean IP reputation, proper TLS fingerprints, consistent browser fingerprints, and human-like behavior. The good news is that if your automation is genuinely legitimate, all of these properties are natural consequences of using a real browser with residential proxies.
Configure your ProxyHat residential proxies with geo-matching and sticky sessions, use Playwright or Puppeteer with stealth patches, add human-paced interaction, and respect the sites you're accessing. That's how legitimate automation passes cleanly through DataDome.
Ready to set up your residential proxy infrastructure? Explore ProxyHat's residential proxy plans or check our available locations to find geo-matched IPs for your target sites.