Why Raw Puppeteer Gets Caught Every Time
If you've ever fired up Puppeteer against a protected site and watched your request get blocked within seconds, you've hit the same wall every scraping engineer meets: automated browsers leak detectable signals. It doesn't matter how clever your selectors are if the site knows you're a bot before the page even loads.
The core problem is that Chromium — the engine Puppeteer drives — was designed for testing, not for blending in. When you launch it via the DevTools Protocol, it leaves a trail of artifacts that anti-bot systems are specifically trained to find.
The Big Three Detection Signals
Here are the most reliable tells that expose a raw Puppeteer session:
- `navigator.webdriver` — Set to `true` in any Chromium instance launched via WebDriver or CDP. Cloudflare, DataDome, and Akamai all check this property first.
- Inconsistent plugins and mimeTypes arrays — Headless Chromium reports an empty `navigator.plugins` array, while a real Chrome browser lists PDF Viewer, Chrome PDF Viewer, and others. This mismatch is trivially detectable.
- iframe and chromedriver artifacts — Automation tooling injects `__nightmare`, `cdc_`-prefixed variables, and internal iframe references that have no equivalent in a human-driven browser.
But those three are just the start. Modern anti-bot systems also check WebGL renderer strings, canvas fingerprint consistency, navigator.languages ordering, window.chrome object presence, User-Agent vs. navigator.platform mismatches, and timing-based behavioral signals. Raw Puppeteer fails on most of these out of the box.
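You can watch these checks fail (or pass) yourself. A minimal probe, using only standard browser APIs, run in the page context of any Puppeteer session:

// Dump the classic automation tells the way an anti-bot script reads them.
const tells = await page.evaluate(() => ({
  webdriver: navigator.webdriver,                   // true in raw automation
  pluginCount: navigator.plugins.length,            // 0 in raw headless
  languages: navigator.languages,                   // odd ordering is a flag
  hasChrome: typeof window.chrome !== 'undefined',  // missing in headless
  platform: navigator.platform,                     // must match the UA
}));
console.log(tells);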
Puppeteer-Extra with Stealth Plugin: What It Actually Patches
The `puppeteer-extra-plugin-stealth` package is a collection of evasion modules — each one targeting a specific detection vector. It's not magic; it's a stack of carefully ordered interceptors that run before any page script executes.
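The modules are individually toggleable, which is useful when a specific patch clashes with a target site. A short sketch; the evasion names follow the plugin's documentation:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

const stealth = StealthPlugin();
// Each evasion module targets one detection vector; the set can be
// inspected and pruned if a particular patch causes problems.
console.log(stealth.availableEvasions);                 // all bundled modules
stealth.enabledEvasions.delete('user-agent-override');  // example: drop one
puppeteer.use(stealth);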
Here's what the stealth plugin covers and what it doesn't:
| Detection Signal | Stealth Patch | Notes |
|---|---|---|
| `navigator.webdriver` | Yes — set to `undefined` | Most critical patch |
| `navigator.plugins` | Yes — populated with realistic entries | `mimeTypes` also aligned |
| `window.chrome` object | Yes — added with expected properties | Missing in headless by default |
| WebGL vendor/renderer | Partial — spoofed to common values | May need custom override for niche sites |
| Canvas fingerprint | No — not randomized by default | Requires custom evaluator (see below) |
| CDP artifacts / `cdc_` vars | Yes — removed from iframe `contentWindow` | Also strips `__nightmare` |
| Permissions API | Yes — overrides `navigator.permissions.query` | Prevents headless detection via permissions |
| Iframe `contentWindow` consistency | Yes — patches cross-origin discrepancies | Prevents iframe-based detection |
| User-Agent consistency | Partial — depends on your UA string | You must set a realistic UA yourself |
The stealth plugin handles the structural signals well, but it doesn't touch fingerprint entropy. Two stealth-enabled browsers on the same machine will produce identical canvas and WebGL fingerprints — a dead giveaway if a site correlates sessions. That's why you need custom evaluators and proxy rotation to build a truly robust stack.
Setting Up Puppeteer-Extra Stealth with Proxies
Let's build the foundation: a stealth-enabled browser that routes traffic through ProxyHat residential proxies with geo-targeting.
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

puppeteer.use(StealthPlugin());

async function createStealthBrowser(proxyCountry = 'US') {
  const browser = await puppeteer.launch({
    headless: 'new',
    args: [
      // Host and port only: Chromium ignores credentials embedded
      // in --proxy-server, so auth happens per page below.
      '--proxy-server=http://gate.proxyhat.com:8080',
      '--disable-blink-features=AutomationControlled',
      '--no-sandbox',
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage',
      '--disable-gpu',
    ],
  });

  // The proxy credentials carry the geo-target in the username.
  const proxyCredentials = {
    username: `user-country-${proxyCountry}`,
    password: 'YOUR_PASSWORD',
  };

  return { browser, proxyCredentials };
}

(async () => {
  const { browser, proxyCredentials } = await createStealthBrowser('DE');
  const page = await browser.newPage();

  // Supply proxy auth via CDP; credentials can't ride in the launch flag.
  await page.authenticate(proxyCredentials);

  await page.setUserAgent(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
    '(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36'
  );

  await page.goto('https://bot.sannysoft.com/');
  await page.screenshot({ path: 'stealth-check.png' });
  await browser.close();
})();

Key points in this setup:
- Proxy host goes in the `--proxy-server` launch arg, credentials in `page.authenticate()`. The launch arg ensures all connections — including subresource requests — go through the proxy; Chromium ignores credentials embedded in a proxy URL, so auth must happen at the page level.
- `--disable-blink-features=AutomationControlled` disables the `navigator.webdriver` flag at the Chromium level as a first line of defense, before the stealth plugin applies its runtime patches.
- Geo-targeting in the username — ProxyHat uses the `user-country-XX` format to route your traffic through a residential IP in the specified country. This is critical for sites that serve different content by region.
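Before pointing this at a real target, it's worth confirming the proxy actually took effect. A quick sanity check, here using httpbin.org/ip as an example echo service:

// Verify the exit IP the target will see; any IP-echo service works.
await page.goto('https://httpbin.org/ip');
const exitIp = await page.evaluate(() => document.body.innerText);
console.log('Exit IP:', exitIp); // should be a German residential IP for 'DE'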
Combining Stealth with Residential Proxies: The Anti-Detection Stack
Stealth patches browser signals. Residential proxies patch network signals. You need both because anti-bot systems check both layers:
- Network layer — Is the IP in a datacenter ASN? Does it match the claimed geo? Has it been flagged for bot activity? Residential proxies from ProxyHat's location pool solve this by providing IPs from real ISPs.
- Browser layer — Does the browser look automated? Are fingerprints consistent with a real user? The stealth plugin handles the structural signals; custom evaluators handle the entropy.
The combination is powerful because each layer covers the other's blind spots. A residential IP with a detectable browser still gets blocked. A stealth browser on a datacenter IP still gets flagged by IP reputation checks. Together, they present a consistent profile: a real residential user with a normal browser.
Sticky Sessions for Stateful Scraping
Some sites require login or multi-page flows. You need the same IP across multiple requests. ProxyHat supports sticky sessions via the username format:
// Sticky session: same IP for the session duration.
// The session ID in the username pins the exit IP across requests.
await page.authenticate({
  username: 'user-country-US-session-orderFlow42',
  password: 'YOUR_PASSWORD',
});

Without sticky sessions, each request may exit through a different residential IP — fine for SERP scraping, catastrophic for e-commerce checkout flows.
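That one authenticate() call pins every navigation in the tab to the same identity. A sketch of a stateful flow under that session; the URLs and selectors are placeholders:

// Login and checkout share one residential IP, so the session cookie
// never appears to hop between networks mid-flow.
await page.goto('https://shop.example.com/login');   // placeholder URL
await page.type('#email', 'user@example.com');       // placeholder selectors
await page.type('#password', 'SECRET');
await Promise.all([
  page.waitForNavigation(),
  page.click('button[type=submit]'),
]);
await page.goto('https://shop.example.com/cart');    // same exit IP as login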
Custom Evaluators: Canvas and WebGL Fingerprint Randomization
This is where most Puppeteer anti-detection guides stop — and where production crawlers fail. The stealth plugin doesn't randomize canvas or WebGL fingerprints. If you launch 50 browser instances on the same machine, they all produce the same hash. Sophisticated anti-bot systems detect this correlation.
The solution: inject per-session noise into canvas rendering and WebGL parameters before any page script runs.
Canvas Fingerprint Randomization
Canvas fingerprinting works by drawing hidden text and shapes, then reading the pixel data via toDataURL(). Tiny differences in rendering — caused by GPU drivers, font rasterizers, and OS-level anti-aliasing — produce a unique hash. We simulate those differences by injecting deterministic noise per session.
function generateCanvasNoise(seed) {
// Simple seeded PRNG for deterministic per-session noise
let s = seed;
const rand = () => {
s = (s * 16807) % 2147483647;
return (s - 1) / 2147483646;
};
// Generate a small offset table for RGBA channels
const offsets = [];
for (let i = 0; i < 16; i++) {
offsets.push(Math.floor(rand() * 3) - 1); // -1, 0, or +1
}
return offsets;
}
async function injectCanvasRandomization(page, sessionId) {
const seed = hashCode(sessionId); // Convert session ID to numeric seed
const offsets = generateCanvasNoise(seed);
await page.evaluateOnNewDocument((noise) => {
const origToDataURL = HTMLCanvasElement.prototype.toDataURL;
HTMLCanvasElement.prototype.toDataURL = function (...args) {
const ctx = this.getContext('2d');
if (ctx) {
const imgData = ctx.getImageData(0, 0, this.width, this.height);
for (let i = 0; i < imgData.data.length && i < noise.length * 4; i += 4) {
imgData.data[i] = Math.max(0, Math.min(255, imgData.data[i] + noise[i/4 % noise.length]));
imgData.data[i + 1] = Math.max(0, Math.min(255, imgData.data[i + 1] + noise[(i/4+1) % noise.length]));
imgData.data[i + 2] = Math.max(0, Math.min(255, imgData.data[i + 2] + noise[(i/4+2) % noise.length]));
}
ctx.putImageData(imgData, 0, 0);
}
return origToDataURL.apply(this, args);
};
}, offsets);
}
function hashCode(str) {
let hash = 0;
for (let i = 0; i < str.length; i++) {
hash = ((hash << 5) - hash) + str.charCodeAt(i);
hash |= 0;
}
return Math.abs(hash);
}

WebGL Fingerprint Randomization
WebGL fingerprinting reads the vendor and renderer strings from the GPU. We override these to match common consumer hardware profiles:
async function injectWebGLRandomization(page, profile) {
const profiles = {
nvidia: { vendor: 'Google Inc. (NVIDIA)', renderer: 'ANGLE (NVIDIA, NVIDIA GeForce GTX 1060, OpenGL 4.5)' },
amd: { vendor: 'Google Inc. (AMD)', renderer: 'ANGLE (AMD, AMD Radeon RX 580, OpenGL 4.5)' },
intel: { vendor: 'Google Inc. (Intel)', renderer: 'ANGLE (Intel, Intel(R) UHD Graphics 630, OpenGL 4.5)' },
};
const p = profiles[profile] || profiles.nvidia;
await page.evaluateOnNewDocument((webglProfile) => {
const getParameter = WebGLRenderingContext.prototype.getParameter;
WebGLRenderingContext.prototype.getParameter = function (param) {
if (param === 37445) return webglProfile.vendor; // UNMASKED_VENDOR_WEBGL
if (param === 37446) return webglProfile.renderer; // UNMASKED_RENDERER_WEBGL
return getParameter.call(this, param);
};
// Same for WebGL2
if (typeof WebGL2RenderingContext !== 'undefined') {
const getParam2 = WebGL2RenderingContext.prototype.getParameter;
WebGL2RenderingContext.prototype.getParameter = function (param) {
if (param === 37445) return webglProfile.vendor;
if (param === 37446) return webglProfile.renderer;
return getParam2.call(this, param);
};
}
}, p);
}

Assign a random profile per session to avoid correlation. Pair each profile with a matching User-Agent and viewport — an Intel GPU profile should come with a laptop-like viewport (1366×768 or 1920×1080), not a 4K ultrawide.
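One way to enforce that pairing is to define complete profiles up front and never mix attributes across them. A sketch; the specific UA strings and pairings are illustrative, not a vetted dataset:

// Each profile bundles GPU, viewport, and UA so no attribute
// contradicts another within a session.
const FINGERPRINT_PROFILES = [
  {
    gpuProfile: 'intel',                     // integrated GPU →
    viewport: { width: 1366, height: 768 },  // laptop-class screen
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36',
  },
  {
    gpuProfile: 'nvidia',                    // discrete GPU →
    viewport: { width: 1920, height: 1080 }, // desktop-class screen
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36',
  },
];

// Pick one random, internally consistent profile per session.
const profile = FINGERPRINT_PROFILES[Math.floor(Math.random() * FINGERPRINT_PROFILES.length)];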
Per-Browser-Context Proxy Rotation
Puppeteer's browser.newPage() creates pages that share the browser's proxy settings. But for true per-session isolation — different IPs, different fingerprints, different cookies — you need per-context proxy assignment.
Chromium can do per-context proxies through CDP's Target.createBrowserContext, but the cleanest approaches are to launch a separate browser instance per proxy when you need full isolation, or to use browser.createIncognitoBrowserContext() with page-level proxy auth for lighter-weight rotation.
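For the lighter-weight path, a context can carry its own proxy. A sketch, assuming a Puppeteer release that supports the proxyServer option on createIncognitoBrowserContext:

// Each incognito context gets its own exit IP while sharing one
// Chromium process; cookies and cache die with the context.
const context = await browser.createIncognitoBrowserContext({
  proxyServer: 'http://gate.proxyhat.com:8080',
});
const page = await context.newPage();
await page.authenticate({
  username: 'user-country-GB-session-ctx1',
  password: 'YOUR_PASSWORD',
});
await page.goto('https://example.com');
await context.close();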
For production crawlers, the recommended pattern is a browser pool where each worker gets its own browser with a dedicated proxy:
class StealthBrowserPool {
constructor({ size, proxyConfig, fingerprintProfiles }) {
this.size = size;
this.proxyConfig = proxyConfig;
this.profiles = fingerprintProfiles;
this.pool = [];
this.available = [];
}
async init() {
for (let i = 0; i < this.size; i++) {
const worker = await this._createWorker(i);
this.pool.push(worker);
this.available.push(i);
}
}
async _createWorker(index) {
  const country = this.proxyConfig.countries[index % this.proxyConfig.countries.length];
  const profile = this.profiles[index % this.profiles.length];
  const sessionId = `worker-${index}-${Date.now()}`;
  const browser = await puppeteer.launch({
    headless: 'new',
    args: [
      // Host and port only; Chromium ignores credentials embedded in
      // --proxy-server, so auth goes through page.authenticate() below
      '--proxy-server=http://gate.proxyhat.com:8080',
      '--disable-blink-features=AutomationControlled',
      '--no-sandbox',
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage',
    ],
  });
  const page = await browser.newPage();
  // Geo-target and sticky session ride in the proxy username
  await page.authenticate({
    username: `user-country-${country}-session-${sessionId}`,
    password: this.proxyConfig.password,
  });
  await page.setUserAgent(profile.userAgent);
  await page.setViewport(profile.viewport);
  await injectCanvasRandomization(page, sessionId);
  await injectWebGLRandomization(page, profile.gpuProfile);
  return { browser, page, sessionId, country, requestCount: 0 };
}
async acquire() {
  if (this.available.length === 0) {
    throw new Error('Pool exhausted — increase size or implement queuing');
  }
  const index = this.available.shift();
  this.pool[index].requestCount++; // tracked by the health check (see below)
  return { index, ...this.pool[index] };
}
async release(index) {
// Reset the page state for reuse
const worker = this.pool[index];
try {
const client = await worker.page.target().createCDPSession();
await client.send('Network.clearBrowserCache');
await client.send('Network.clearBrowserCookies');
} catch (e) {
// Context may have been closed; recreate
this.pool[index] = await this._createWorker(index);
}
this.available.push(index);
}
async close() {
await Promise.all(this.pool.map(w => w.browser.close()));
}
}
// Usage
const pool = new StealthBrowserPool({
size: 10,
proxyConfig: {
countries: ['US', 'DE', 'GB', 'FR'],
password: 'YOUR_PASSWORD',
},
fingerprintProfiles: [
{ userAgent: '...', viewport: { width: 1920, height: 1080 }, gpuProfile: 'nvidia' },
{ userAgent: '...', viewport: { width: 1366, height: 768 }, gpuProfile: 'intel' },
{ userAgent: '...', viewport: { width: 1536, height: 864 }, gpuProfile: 'amd' },
],
});
await pool.init();
const worker = await pool.acquire();
try {
await worker.page.goto('https://example.com');
// ... scrape logic ...
} finally {
await pool.release(worker.index);
}

This pool pattern gives you full isolation: each worker has its own IP, its own fingerprint, and its own cookie jar. When a worker is released, cookies and cache are cleared so the next task starts clean.
Scaling: Containerized Fleets and Resource Management
Running 10 browsers on one machine is manageable. Running 500 requires infrastructure. Here's how to think about scaling a Puppeteer stealth fleet.
Resource Budgeting
Each Chromium instance consumes roughly:
- 150–300 MB RAM per page (more for heavy SPAs)
- 0.5–1.0 CPU cores under active load
- Network bandwidth depends on page weight — budget 2–5 MB per page load
On a 16-core, 64 GB machine with headless Chromium, you can realistically run 40–80 concurrent browsers; that's more than the raw per-core math suggests because each browser spends most of its wall-clock time idle between navigations. Beyond that, you need horizontal scaling.
Container Architecture
Use a worker container pattern where each container runs a small pool of browsers and exposes a job API:
# docker-compose.yml — scaled worker fleet
version: '3.8'

services:
  worker:
    build: ./worker
    deploy:
      replicas: 10
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
        reservations:
          cpus: '1.0'
          memory: 2G
    environment:
      - PROXYHAT_USER=user-country-US
      - PROXYHAT_PASS=YOUR_PASSWORD
      - POOL_SIZE=8
      - REDIS_URL=redis://queue:6379
    depends_on:
      - queue

  queue:
    image: redis:7-alpine
    ports:
      - '6379:6379'

  orchestrator:
    build: ./orchestrator
    environment:
      - REDIS_URL=redis://queue:6379
      - WORKER_COUNT=10
    depends_on:
      - queue

The orchestrator pushes URLs to a Redis queue. Each worker pulls jobs, acquires a browser from its local pool, executes the crawl, and pushes results to an output queue. This architecture is stateless at the worker level — if a container crashes, the orchestrator simply re-queues the job.
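The worker side of that loop can stay very small. A sketch using the StealthBrowserPool from earlier; the queue names (jobs, results) and payload shape are assumptions, not a fixed contract:

const { createClient } = require('redis');

// Minimal worker loop: block on the job queue, crawl with a pooled
// browser, push the outcome for the orchestrator to collect.
async function runWorker(pool) {
  const redis = createClient({ url: process.env.REDIS_URL });
  await redis.connect();

  for (;;) {
    const job = await redis.blPop('jobs', 0); // blocks until a job arrives
    const url = job.element;
    const worker = await pool.acquire();
    try {
      await worker.page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 });
      const title = await worker.page.evaluate(() => document.title);
      await redis.rPush('results', JSON.stringify({ url, title }));
    } catch (e) {
      // Report the failure so the orchestrator can re-queue the URL
      await redis.rPush('results', JSON.stringify({ url, error: e.message }));
    } finally {
      await pool.release(worker.index);
    }
  }
}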
Browser Lifecycle Management
Browsers are not long-lived. Memory leaks in Chromium accumulate, and anti-bot systems may start flagging an IP after too many requests. Implement a rotation policy:
- Max requests per browser: 50–100 before restarting
- Max lifetime: 10–15 minutes before restarting
- On detection: immediately rotate the proxy (new session ID) and restart the browser
Build a health check into your pool that monitors memory usage and restarts workers proactively:
async function healthCheck(pool) {
  for (const worker of pool.pool) {
    try {
      // page.metrics() reports the page's own JS heap; checking the Node
      // process heap here would measure the wrong thing.
      const metrics = await worker.page.metrics();
      if (metrics.JSHeapUsedSize > 500 * 1024 * 1024 || worker.requestCount > 80) {
        console.log(`Recycling worker ${worker.sessionId}`);
        await worker.browser.close();
        // Recreate with fresh proxy session
        const idx = pool.pool.indexOf(worker);
        pool.pool[idx] = await pool._createWorker(idx);
      }
    } catch (e) {
      // Browser already dead — recreate
      const idx = pool.pool.indexOf(worker);
      pool.pool[idx] = await pool._createWorker(idx);
    }
  }
}

setInterval(() => healthCheck(pool), 60_000);

Concurrency vs. Politeness
More browsers ≠ more data. Aggressive concurrency triggers rate limits and CAPTCHAs. A practical rule of thumb: 1 request per second per domain per IP. If you have 50 residential IPs targeting one domain, you can sustain ~50 requests/second. Push beyond that and you'll hit behavioral detection regardless of your stealth setup.
For SERP tracking at scale, stagger your requests across the proxy pool and add jitter:
async function staggeredCrawl(urls, pool) {
const results = [];
const concurrency = pool.size;
for (let i = 0; i < urls.length; i += concurrency) {
const batch = urls.slice(i, i + concurrency);
const promises = batch.map((url) => (async () => {
// Add random jitter: 0–2000ms
await new Promise(r => setTimeout(r, Math.random() * 2000));
const worker = await pool.acquire();
try {
await worker.page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 });
return await worker.page.evaluate(() => document.title);
} finally {
await pool.release(worker.index);
}
})());
const batchResults = await Promise.allSettled(promises);
results.push(...batchResults);
// Cooldown between batches
await new Promise(r => setTimeout(r, 1000));
}
return results;
}

When Stealth Isn't Enough: CAPTCHAs and Behavioral Checks
Even with the full stack — stealth plugin, residential proxies, fingerprint randomization — some sites will still challenge you. This typically happens when:
- The site uses advanced behavioral analysis (mouse movement patterns, scroll depth, typing cadence)
- You're hitting the same endpoint at superhuman speed
- Your proxy IP has been burned by other users
Mitigation strategies:
- Use residential proxies with city-level targeting — `user-country-US-city-newyork` — for local-service sites that validate IP geolocation against the claimed location
- Add realistic interaction delays — don't just `goto()` and scrape; scroll, hover, wait for images to load (see the sketch after this list)
- Rotate proxy sessions proactively — don't wait for a block; change your session ID every 50 requests
- Use mobile proxies for mobile-optimized sites — mobile user agents on residential mobile IPs are less scrutinized than desktop datacenter traffic. ProxyHat's mobile proxies are available via `socks5://USERNAME:PASSWORD@gate.proxyhat.com:1080`
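For the interaction-delay point above, a minimal humanization helper might look like this; the scroll distances and timings are arbitrary assumptions, not values calibrated to any particular detector:

// Rough humanization pass: uneven scrolling, pauses, and a hover
// before extraction. Tune the ranges per target.
async function actHuman(page) {
  const steps = 3 + Math.floor(Math.random() * 4); // 3–6 scroll steps
  for (let i = 0; i < steps; i++) {
    await page.mouse.wheel({ deltaY: 200 + Math.random() * 400 });
    await new Promise(r => setTimeout(r, 300 + Math.random() * 900));
  }
  const link = await page.$('a'); // hover a link if the page has one
  if (link) await link.hover().catch(() => {});
  await new Promise(r => setTimeout(r, 500 + Math.random() * 1500));
}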
Ethical Boundaries: Stealth for Legitimate Scraping
Stealth technology is a tool, not a license to bypass every gate. There are clear lines between legitimate scraping and abuse:
Legitimate use: collecting publicly available data at reasonable rates, monitoring your own brand's SERP positions, aggregating pricing data from e-commerce sites for comparison tools, academic research on public web content.
Abuse: bypassing authentication to access private data, circumventing rate limits to DDoS a service, creating fake accounts at scale, committing ad fraud or credential stuffing.
Practical guidelines for ethical stealth scraping:
- Respect `robots.txt` — if a page is disallowed, don't scrape it
- Honor rate limits — if a site returns 429, back off instead of rotating IPs to hammer harder
- Comply with GDPR and CCPA — don't collect personal data without a legal basis
- Check terms of service — some sites explicitly prohibit scraping; violating ToS can have legal consequences
- Be transparent when possible — if you can identify your bot in the User-Agent without getting blocked, do so
Stealth is a defensive measure against overly aggressive bot detection that blocks legitimate automated access. It's not a tool to access things you shouldn't.
Key Takeaways
- Raw Puppeteer is trivially detectable — `navigator.webdriver`, empty plugins, and CDP artifacts expose automation immediately.
- puppeteer-extra-plugin-stealth patches structural signals but doesn't randomize fingerprints — you need custom evaluators for canvas and WebGL.
- Residential proxies + stealth is the strongest stack — network-layer and browser-layer detection are independent problems that require independent solutions.
- Per-session fingerprint isolation prevents correlation — each browser instance should have a unique canvas noise seed, WebGL profile, viewport, and User-Agent.
- Browser pools with dedicated proxies enable safe concurrency — don't share IPs or cookies across sessions.
- Scale with containers and queues — stateless workers pulling from Redis, with health checks that recycle browsers before they leak or get flagged.
- Stealth is for legitimate scraping — respect robots.txt, rate limits, and privacy regulations.
Ready to build your anti-detection stack? Check out ProxyHat's residential proxy plans for geo-targeted IPs that pair perfectly with puppeteer-extra stealth, or explore our web scraping use case for more implementation patterns.