Why Proxy Count Matters for Web Scraping
One of the first questions any scraping project faces is deceptively simple: how many proxies do I actually need? Use too few and your IPs get banned within minutes. Use too many and you waste budget on capacity you never touch. The right number depends on your target sites, request volume, rotation strategy, and tolerance for blocks.
This guide provides a practical calculation framework so you can size your proxy pool with confidence, whether you are scraping ten pages a day or ten million.
If you are new to scraping proxies, start with our Complete Guide to Web Scraping Proxies for foundational concepts.
The Core Formula
At its simplest, the number of concurrent IPs you need is:
```
required_ips = requests_per_minute / safe_rpm_per_ip
```
Where safe_rpm_per_ip is the maximum request rate a single IP can sustain on your target site without triggering blocks. This varies dramatically by target:
| Target Type | Safe RPM per IP | Notes |
|---|---|---|
| Small blogs / static sites | 20-60 | Minimal anti-bot |
| E-commerce (Shopify, WooCommerce) | 5-15 | Moderate rate limiting |
| Major platforms (Amazon, Google) | 1-5 | Aggressive detection |
| Social media (LinkedIn, Instagram) | 0.5-2 | Very strict enforcement |
Example Calculation
Suppose you need to scrape 50,000 product pages from an e-commerce site daily, completing the job within an 8-hour window:
```python
# Target: 50,000 pages in 8 hours
requests_per_minute = 50000 / (8 * 60)                 # ≈ 104 RPM
safe_rpm_per_ip = 10                                   # e-commerce average
required_ips = requests_per_minute / safe_rpm_per_ip   # ≈ 11 concurrent IPs
```
In practice, you should add a 30-50% buffer for retries, failures, and rate-limit cooldowns. So the realistic need is around 15-17 concurrent IPs.
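The formula plus buffer is worth wrapping in a single reusable helper. This is a minimal sketch; the function name `required_ips` and the default 40% buffer are illustrative choices, not part of any SDK:

```python
import math

def required_ips(daily_requests: int, window_hours: float,
                 safe_rpm_per_ip: float, buffer: float = 0.4) -> int:
    """Size a proxy pool: core formula plus a retry/cooldown buffer."""
    rpm = daily_requests / (window_hours * 60)   # requests per minute
    base = rpm / safe_rpm_per_ip                 # concurrent IPs, no buffer
    return math.ceil(base * (1 + buffer))        # add 30-50% headroom

# The worked example above: 50,000 pages in 8 hours on an e-commerce target
print(required_ips(50_000, 8, 10))  # → 15
```

Varying the `buffer` argument between 0.3 and 0.5 reproduces the 15-17 IP range from the example.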
Factors That Affect Your Proxy Requirements
1. Target Site Sophistication
Sites with advanced anti-bot systems require more IPs because each IP can make fewer requests before being flagged. Google, Amazon, and major social platforms invest heavily in fingerprinting and behavioral analysis. Budget for 3-5x more IPs than the base formula suggests when targeting these sites.
2. Request Volume and Frequency
Continuous scraping (24/7 monitoring) needs more IPs than batch jobs. If you run a daily batch, you can rotate through your pool aggressively during the window, then let IPs cool down. For real-time monitoring, every IP stays active longer, increasing your total requirement.
3. Geographic Distribution
If you need data from multiple regions (localized pricing, geo-specific search results), you need IPs in each target geography. A project scraping prices in 10 countries might need 15 IPs per country, meaning 150 total. Check the available ProxyHat locations to plan your geo-distribution.
4. Session vs Rotating Requirements
Some tasks (login flows, multi-page checkout analysis) require sticky sessions where the same IP persists for minutes. This ties up IPs longer, reducing effective pool utilization. Pure data collection with no session state can rotate on every request, using each IP more efficiently.
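The IP cost of sticky sessions follows Little's law: concurrent sessions ≈ arrival rate × average session duration. A minimal sketch of that arithmetic (the function name and example numbers are illustrative):

```python
import math

def ips_for_sessions(sessions_per_hour: float, avg_session_minutes: float) -> int:
    """Concurrent IPs tied up by sticky sessions (Little's law: L = λ × W)."""
    arrivals_per_minute = sessions_per_hour / 60
    return math.ceil(arrivals_per_minute * avg_session_minutes)

# e.g. 120 login flows per hour, each holding an IP for ~5 minutes
print(ips_for_sessions(120, 5))  # → 10
```

Size session-heavy workloads from this number, then add rotating capacity for the stateless requests on top.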
5. Residential vs Datacenter
Residential IPs have higher trust scores and can make more requests before bans, so you may need fewer of them. But they cost more per GB. Datacenter IPs are cheaper but get flagged faster, so you need a larger pool. For a deeper comparison, see Residential vs Datacenter vs Mobile Proxies.
Sizing Tables by Use Case
| Use Case | Daily Requests | Recommended IPs | Proxy Type |
|---|---|---|---|
| Small SEO audit (1 site) | 1,000-5,000 | 5-10 | Residential |
| Product price monitoring | 10,000-50,000 | 15-30 | Residential |
| SERP tracking (100 keywords) | 5,000-20,000 | 10-25 | Residential |
| E-commerce catalog scraping | 50,000-200,000 | 30-80 | Residential |
| Large-scale data aggregation | 500,000+ | 100-500+ | Residential rotating |
Calculating Total Bandwidth
Proxy count is one dimension; bandwidth is the other. Estimate your total data transfer:
```python
# Average page sizes (KB)
static_page = 50       # HTML only
dynamic_page = 200     # HTML + JSON/API responses
full_render = 2048     # 2-5 MB with all assets (headless browser)

# Example: 50,000 pages/day × 200 KB average
daily_bandwidth_gb = 50000 * dynamic_page / 1024 / 1024  # ≈ 9.5 GB/day
```
This helps you choose the right ProxyHat plan based on both IP and bandwidth needs.
Implementation: Dynamic Pool Sizing
Rather than guessing statically, implement dynamic pool sizing that adapts to real-world conditions. Here is an example using the ProxyHat gateway with adaptive concurrency:
Python Example
```python
import asyncio
import aiohttp
from dataclasses import dataclass, field
from time import time

@dataclass
class PoolSizer:
    """Dynamically adjusts concurrent proxy connections based on success rate."""
    min_concurrent: int = 5
    max_concurrent: int = 100
    target_success_rate: float = 0.95
    current_concurrent: int = 10
    results: list = field(default_factory=list)

    def record(self, success: bool):
        self.results.append((time(), success))
        # Keep only the last 100 results
        self.results = self.results[-100:]

    @property
    def success_rate(self) -> float:
        if not self.results:
            return 1.0
        return sum(1 for _, s in self.results if s) / len(self.results)

    def adjust(self):
        rate = self.success_rate
        if rate >= self.target_success_rate and self.current_concurrent < self.max_concurrent:
            # Success rate is good — try more concurrency
            self.current_concurrent = min(self.current_concurrent + 2, self.max_concurrent)
        elif rate < self.target_success_rate * 0.9:
            # Success rate dropping — reduce concurrency
            self.current_concurrent = max(self.current_concurrent - 5, self.min_concurrent)

async def scrape_with_adaptive_pool(urls: list[str]):
    sizer = PoolSizer()
    proxy = "http://USERNAME:PASSWORD@gate.proxyhat.com:8080"

    async with aiohttp.ClientSession() as session:
        async def fetch(url, semaphore):
            async with semaphore:
                try:
                    async with session.get(
                        url, proxy=proxy, timeout=aiohttp.ClientTimeout(total=30)
                    ) as resp:
                        success = resp.status == 200
                        sizer.record(success)
                        return await resp.text() if success else None
                except Exception:
                    sizer.record(False)
                    return None

        i = 0
        while i < len(urls):
            # Take the batch first, then advance by its actual length, so the
            # cursor stays consistent when adjust() changes the concurrency
            batch = urls[i:i + sizer.current_concurrent]
            i += len(batch)
            # Fresh semaphore per batch so adjustments take effect immediately
            semaphore = asyncio.Semaphore(sizer.current_concurrent)
            await asyncio.gather(*[fetch(url, semaphore) for url in batch])
            sizer.adjust()
            print(f"Concurrent IPs: {sizer.current_concurrent}, "
                  f"Success rate: {sizer.success_rate:.1%}")
```
For production use, the ProxyHat Python SDK handles connection pooling and rotation automatically.
Node.js Example
```javascript
// Assumes node-fetch@2 and https-proxy-agent@4 (CommonJS APIs)
const HttpsProxyAgent = require('https-proxy-agent');
const fetch = require('node-fetch');

class AdaptivePoolSizer {
  constructor(min = 5, max = 100) {
    this.min = min;
    this.max = max;
    this.current = 10;
    this.results = [];
    this.targetRate = 0.95;
  }

  record(success) {
    this.results.push({ time: Date.now(), success });
    // Keep only the last 100 results
    if (this.results.length > 100) this.results = this.results.slice(-100);
  }

  get successRate() {
    if (!this.results.length) return 1;
    return this.results.filter(r => r.success).length / this.results.length;
  }

  adjust() {
    if (this.successRate >= this.targetRate && this.current < this.max) {
      this.current = Math.min(this.current + 2, this.max);
    } else if (this.successRate < this.targetRate * 0.9) {
      this.current = Math.max(this.current - 5, this.min);
    }
  }
}

async function scrapeWithAdaptivePool(urls) {
  const sizer = new AdaptivePoolSizer();
  const agent = new HttpsProxyAgent('http://USERNAME:PASSWORD@gate.proxyhat.com:8080');

  let i = 0;
  while (i < urls.length) {
    const batch = urls.slice(i, i + sizer.current);
    i += batch.length; // advance by the batch actually taken, not a stale step
    await Promise.allSettled(
      batch.map(url =>
        fetch(url, { agent, timeout: 30000 })
          .then(res => { sizer.record(res.ok); return res.text(); })
          .catch(() => { sizer.record(false); return null; })
      )
    );
    sizer.adjust();
    console.log(`Concurrent: ${sizer.current}, Success: ${(sizer.successRate * 100).toFixed(1)}%`);
  }
}
```
Common Mistakes When Sizing Proxy Pools
- Using the same count for all targets. A pool that works for static blogs will fail on Amazon. Always benchmark per target.
- Ignoring retry overhead. Failed requests consume bandwidth and time. Factor in a 20-40% retry rate for aggressive targets.
- Not accounting for session requirements. If you need sticky sessions for login flows, each session ties up an IP. Calculate based on concurrent sessions, not just request rate.
- Forgetting geographic needs. Ten IPs in the US will not help you scrape localized results in Japan. Plan per geography.
- Over-provisioning "just in case." With rotating residential proxies like ProxyHat, you access a large pool automatically. You pay for bandwidth, not for the number of IPs in the pool. Focus on choosing the right proxy type rather than hoarding IPs.
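The first point above — benchmark per target — can be sketched as a simple rate probe. This is an illustrative harness, not a ProxyHat API: `probe` is a caller-supplied function that runs a short burst at a given RPM through one IP and reports the observed success fraction.

```python
def find_safe_rpm(probe, rates=(1, 2, 5, 10, 20, 30, 60), min_success=0.95):
    """Step through candidate per-IP request rates, lowest first, and return
    the highest rate whose success fraction stays above the threshold."""
    safe = 0
    for rpm in rates:
        if probe(rpm) >= min_success:
            safe = rpm
        else:
            break  # target started blocking; stop escalating
    return safe

# With a fake probe that degrades past 15 RPM (stand-in for a real burst test)
print(find_safe_rpm(lambda rpm: 1.0 if rpm <= 15 else 0.6))  # → 10
```

The returned value plugs directly into `safe_rpm_per_ip` in the core formula.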
ProxyHat Advantage: Pool Management Simplified
With ProxyHat's rotating residential proxy gateway, you do not need to manually manage a list of IPs. Every request through gate.proxyhat.com automatically receives a fresh IP from a pool of millions. This means:
- No manual IP list management
- Automatic rotation on every request (or sticky sessions when needed)
- Access to IPs in 190+ countries
- Pay for bandwidth used, not per-IP fees
Your "proxy count" effectively becomes your concurrency level — how many simultaneous connections you run through the gateway. Start with the formulas above, then let the adaptive sizing code fine-tune it in production.
For a complete walkthrough of scraping architecture with proxies, see our Complete Guide to Web Scraping Proxies. To learn about rotation strategies that complement your pool sizing, read How to Scrape Websites Without Getting Blocked.
Frequently Asked Questions
How many proxies do I need for small-scale scraping?
For small projects under 5,000 requests per day targeting moderately protected sites, 5-10 concurrent residential proxies are typically sufficient. With a rotating gateway like ProxyHat, you simply set your concurrency level to 5-10 and the system handles IP assignment.
Do I need more proxies for JavaScript-heavy sites?
Yes. Headless browser scraping is slower per request (2-10 seconds vs 0.5-1 second for HTML-only), which means each concurrent slot processes fewer requests. You may need 2-3x the concurrency to maintain the same throughput. See our guide on avoiding blocks for optimization tips.
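The scaling here is just rate × latency: the concurrency needed to hold a given throughput equals the target requests per second times the seconds each request takes. A quick illustration (the latency figures are assumptions at the low end of the ranges above):

```python
import math

def concurrency_for_throughput(target_rps: float, seconds_per_request: float) -> int:
    """Concurrent slots needed to sustain a request rate at a given latency
    (in-flight requests = rate × latency)."""
    return math.ceil(target_rps * seconds_per_request)

# Same 2 req/s throughput: HTML-only fetches vs headless rendering
print(concurrency_for_throughput(2, 0.75))  # → 2 slots at ~0.75 s per page
print(concurrency_for_throughput(2, 2.0))   # → 4 slots at ~2 s per render
```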
Should I use residential or datacenter proxies?
For most scraping tasks, residential proxies offer higher success rates and require fewer concurrent connections. Datacenter proxies are cheaper per GB but get blocked faster, requiring a larger pool. Read our proxy type comparison for detailed guidance.
How does ProxyHat's rotating pool work?
Each request through ProxyHat's gateway (gate.proxyhat.com:8080) is automatically assigned a different residential IP. You do not manage individual IPs — you control concurrency and the system handles rotation. This is more efficient than maintaining a static IP list.