How Many Proxies Do You Need for Scraping?

A practical calculation framework for sizing your proxy pool based on target count, request volume, rotation needs, and budget. Includes formulas and sizing tables.


Why Proxy Count Matters for Web Scraping

One of the first questions any scraping project faces is deceptively simple: how many proxies do I actually need? Use too few and your IPs get banned within minutes. Use too many and you waste budget on capacity you never touch. The right number depends on your target sites, request volume, rotation strategy, and tolerance for blocks.

This guide provides a practical calculation framework so you can size your proxy pool with confidence, whether you are scraping ten pages a day or ten million.

If you are new to scraping proxies, start with our Complete Guide to Web Scraping Proxies for foundational concepts.

The Core Formula

At its simplest, the number of concurrent IPs you need is:

required_ips = (requests_per_minute) / (safe_rpm_per_ip)

Where safe_rpm_per_ip is the maximum request rate a single IP can sustain on your target site without triggering blocks. This varies dramatically by target:

| Target Type | Safe RPM per IP | Notes |
|---|---|---|
| Small blogs / static sites | 20-60 | Minimal anti-bot |
| E-commerce (Shopify, WooCommerce) | 5-15 | Moderate rate limiting |
| Major platforms (Amazon, Google) | 1-5 | Aggressive detection |
| Social media (LinkedIn, Instagram) | 0.5-2 | Very strict enforcement |

Example Calculation

Suppose you need to scrape 50,000 product pages from an e-commerce site daily, completing the job within an 8-hour window:

# Target: 50,000 pages in 8 hours
requests_per_minute = 50000 / (8 * 60)  # ≈ 104 RPM
safe_rpm_per_ip = 10                     # e-commerce average
required_ips = 104 / 10  # ≈ 11 concurrent IPs

In practice, you should add a 30-50% buffer for retries, failures, and rate-limit cooldowns. So the realistic need is around 15-17 concurrent IPs.
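The base formula plus buffer can be wrapped in a small helper. This is a minimal sketch; the function name and the 40% default buffer are illustrative choices within the 30-50% range suggested above:

```python
import math

def required_pool_size(requests_per_minute: float,
                       safe_rpm_per_ip: float,
                       buffer: float = 0.4) -> int:
    """Base formula plus a buffer for retries, failures, and cooldowns."""
    base = requests_per_minute / safe_rpm_per_ip
    return math.ceil(base * (1 + buffer))

# The e-commerce example: ~104 RPM at 10 safe RPM per IP
print(required_pool_size(104, 10))       # 15 IPs with the default 40% buffer
print(required_pool_size(104, 10, 0.5))  # 16 IPs at the high end of the buffer range
```

Rounding up with `math.ceil` matters at small pool sizes: a fractional IP requirement always means one more real connection.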

Factors That Affect Your Proxy Requirements

1. Target Site Sophistication

Sites with advanced anti-bot systems require more IPs because each IP can make fewer requests before being flagged. Google, Amazon, and major social platforms invest heavily in fingerprinting and behavioral analysis. Budget for 3-5x more IPs than the base formula suggests when targeting these sites.
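In formula terms, the 3-5x multiplier is just the same calculation with a much lower safe RPM per IP. Using the 104 RPM example from above:

```python
import math

rpm = 104
print(math.ceil(rpm / 10))  # moderate e-commerce target: 11 IPs
print(math.ceil(rpm / 2))   # aggressive target (low safe RPM): 52 IPs, roughly 5x
```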

2. Request Volume and Frequency

Continuous scraping (24/7 monitoring) needs more IPs than batch jobs. If you run a daily batch, you can rotate through your pool aggressively during the window, then let IPs cool down. For real-time monitoring, every IP stays active longer, increasing your total requirement.

3. Geographic Distribution

If you need data from multiple regions (localized pricing, geo-specific search results), you need IPs in each target geography. A project scraping prices in 10 countries might need 15 IPs per country, meaning 150 total. Check the available ProxyHat locations to plan your geo-distribution.

4. Session vs Rotating Requirements

Some tasks (login flows, multi-page checkout analysis) require sticky sessions where the same IP persists for minutes. This ties up IPs longer, reducing effective pool utilization. Pure data collection with no session state can rotate on every request, using each IP more efficiently.
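Many rotating gateways expose sticky sessions through a session ID embedded in the proxy username. The `session-<id>` suffix below is a common provider convention shown as an assumption, not confirmed ProxyHat syntax; check your provider's documentation for the exact format:

```python
from typing import Optional

def proxy_url(username: str, password: str, session_id: Optional[str] = None) -> str:
    """Build a gateway proxy URL. Passing session_id pins requests to one IP
    via a hypothetical 'session-<id>' username suffix (provider-dependent)."""
    user = f"{username}-session-{session_id}" if session_id else username
    return f"http://{user}:{password}@gate.proxyhat.com:8080"

rotating = proxy_url("USERNAME", "PASSWORD")            # fresh IP per request
checkout = proxy_url("USERNAME", "PASSWORD", "cart42")  # one IP for the whole flow
```

Each distinct session ID holds an IP for the duration of the flow, so for session-heavy workloads you size by concurrent sessions rather than raw request rate.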

5. Residential vs Datacenter

Residential IPs have higher trust scores and can make more requests before bans, so you may need fewer of them. But they cost more per GB. Datacenter IPs are cheaper but get flagged faster, so you need a larger pool. For a deeper comparison, see Residential vs Datacenter vs Mobile Proxies.

Sizing Tables by Use Case

| Use Case | Daily Requests | Recommended IPs | Proxy Type |
|---|---|---|---|
| Small SEO audit (1 site) | 1,000-5,000 | 5-10 | Residential |
| Product price monitoring | 10,000-50,000 | 15-30 | Residential |
| SERP tracking (100 keywords) | 5,000-20,000 | 10-25 | Residential |
| E-commerce catalog scraping | 50,000-200,000 | 30-80 | Residential |
| Large-scale data aggregation | 500,000+ | 100-500+ | Residential rotating |

Calculating Total Bandwidth

Proxy count is one dimension; bandwidth is the other. Estimate your total data transfer:

# Average page sizes in KB
static_page = 50         # HTML only
dynamic_page = 200       # HTML + JSON/API responses
full_render = 2000       # 2-5 MB with all assets (headless browser)

# Example: 50,000 pages/day × 200 KB average
daily_bandwidth_gb = 50000 * 200 / 1024 / 1024  # ≈ 9.5 GB/day

This helps you choose the right ProxyHat plan based on both IP and bandwidth needs.

Implementation: Dynamic Pool Sizing

Rather than guessing statically, implement dynamic pool sizing that adapts to real-world conditions. Here is an example using the ProxyHat gateway with adaptive concurrency:

Python Example

import asyncio
import aiohttp
from dataclasses import dataclass, field
from time import time
@dataclass
class PoolSizer:
    """Dynamically adjusts concurrent proxy connections based on success rate."""
    min_concurrent: int = 5
    max_concurrent: int = 100
    target_success_rate: float = 0.95
    current_concurrent: int = 10
    results: list = field(default_factory=list)
    def record(self, success: bool):
        self.results.append((time(), success))
        # Keep only last 100 results
        self.results = self.results[-100:]
    @property
    def success_rate(self) -> float:
        if not self.results:
            return 1.0
        return sum(1 for _, s in self.results if s) / len(self.results)
    def adjust(self):
        rate = self.success_rate
        if rate >= self.target_success_rate and self.current_concurrent < self.max_concurrent:
            # Success rate is good — try more concurrency
            self.current_concurrent = min(self.current_concurrent + 2, self.max_concurrent)
        elif rate < self.target_success_rate * 0.9:
            # Success rate dropping — reduce concurrency
            self.current_concurrent = max(self.current_concurrent - 5, self.min_concurrent)
async def scrape_with_adaptive_pool(urls: list[str]):
    sizer = PoolSizer()
    proxy = "http://USERNAME:PASSWORD@gate.proxyhat.com:8080"
    semaphore = asyncio.Semaphore(sizer.current_concurrent)
    async with aiohttp.ClientSession() as session:
        async def fetch(url):
            async with semaphore:
                try:
                    async with session.get(url, proxy=proxy, timeout=aiohttp.ClientTimeout(total=30)) as resp:
                        success = resp.status == 200
                        sizer.record(success)
                        return await resp.text() if success else None
                except Exception:
                    sizer.record(False)
                    return None
        for batch_start in range(0, len(urls), sizer.current_concurrent):
            batch = urls[batch_start:batch_start + sizer.current_concurrent]
            await asyncio.gather(*[fetch(url) for url in batch])
            sizer.adjust()
            # Update semaphore for next batch
            semaphore = asyncio.Semaphore(sizer.current_concurrent)
            print(f"Concurrent IPs: {sizer.current_concurrent}, Success rate: {sizer.success_rate:.1%}")

For production use, the ProxyHat Python SDK handles connection pooling and rotation automatically.

Node.js Example

// Assumes https-proxy-agent v5 (class as default export) and node-fetch v2
// (which supports the `timeout` option); newer major versions change both APIs.
const HttpsProxyAgent = require('https-proxy-agent');
const fetch = require('node-fetch');
class AdaptivePoolSizer {
  constructor(min = 5, max = 100) {
    this.min = min;
    this.max = max;
    this.current = 10;
    this.results = [];
    this.targetRate = 0.95;
  }
  record(success) {
    this.results.push({ time: Date.now(), success });
    if (this.results.length > 100) this.results = this.results.slice(-100);
  }
  get successRate() {
    if (!this.results.length) return 1;
    return this.results.filter(r => r.success).length / this.results.length;
  }
  adjust() {
    if (this.successRate >= this.targetRate && this.current < this.max) {
      this.current = Math.min(this.current + 2, this.max);
    } else if (this.successRate < this.targetRate * 0.9) {
      this.current = Math.max(this.current - 5, this.min);
    }
  }
}
async function scrapeWithAdaptivePool(urls) {
  const sizer = new AdaptivePoolSizer();
  const agent = new HttpsProxyAgent('http://USERNAME:PASSWORD@gate.proxyhat.com:8080');
  for (let i = 0; i < urls.length; i += sizer.current) {
    const batch = urls.slice(i, i + sizer.current);
    const results = await Promise.allSettled(
      batch.map(url =>
        fetch(url, { agent, timeout: 30000 })
          .then(res => { sizer.record(res.ok); return res.text(); })
          .catch(() => { sizer.record(false); return null; })
      )
    );
    sizer.adjust();
    console.log(`Concurrent: ${sizer.current}, Success: ${(sizer.successRate * 100).toFixed(1)}%`);
  }
}

Common Mistakes When Sizing Proxy Pools

  • Using the same count for all targets. A pool that works for static blogs will fail on Amazon. Always benchmark per target.
  • Ignoring retry overhead. Failed requests consume bandwidth and time. Factor in a 20-40% retry rate for aggressive targets.
  • Not accounting for session requirements. If you need sticky sessions for login flows, each session ties up an IP. Calculate based on concurrent sessions, not just request rate.
  • Forgetting geographic needs. Ten IPs in the US will not help you scrape localized results in Japan. Plan per geography.
  • Over-provisioning "just in case." With rotating residential proxies like ProxyHat, you access a large pool automatically. You pay for bandwidth, not for the number of IPs in the pool. Focus on choosing the right proxy type rather than hoarding IPs.
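The retry-overhead point above translates into a one-line adjustment. This sketch uses a single-retry approximation (each failed request is retried once); cascading failures on very hostile targets would push the number higher:

```python
def with_retry_overhead(planned_requests: int, retry_rate: float) -> int:
    """Total requests actually sent if retry_rate of planned requests
    fail once and are retried (single-retry approximation)."""
    return round(planned_requests * (1 + retry_rate))

# 50,000 planned pages on an aggressive target with a 30% retry rate
print(with_retry_overhead(50000, 0.30))  # 65000 requests actually sent
```

Feed this inflated request count, not the planned one, into the pool-size and bandwidth formulas.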

ProxyHat Advantage: Pool Management Simplified

With ProxyHat's rotating residential proxy gateway, you do not need to manually manage a list of IPs. Every request through gate.proxyhat.com automatically receives a fresh IP from a pool of millions. This means:

  • No manual IP list management
  • Automatic rotation on every request (or sticky sessions when needed)
  • Access to IPs in 190+ countries
  • Pay for bandwidth used, not per-IP fees

Your "proxy count" effectively becomes your concurrency level — how many simultaneous connections you run through the gateway. Start with the formulas above, then let the adaptive sizing code fine-tune it in production.

For a complete walkthrough of scraping architecture with proxies, see our Complete Guide to Web Scraping Proxies. To learn about rotation strategies that complement your pool sizing, read How to Scrape Websites Without Getting Blocked.

Frequently Asked Questions

How many proxies do I need for small-scale scraping?

For small projects under 5,000 requests per day targeting moderately protected sites, 5-10 concurrent residential proxies are typically sufficient. With a rotating gateway like ProxyHat, you simply set your concurrency level to 5-10 and the system handles IP assignment.

Do I need more proxies for JavaScript-heavy sites?

Yes. Headless browser scraping is slower per request (2-10 seconds vs 0.5-1 second for HTML-only), which means each concurrent slot processes fewer requests. You may need 2-3x the concurrency to maintain the same throughput. See our guide on avoiding blocks for optimization tips.
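The headless slowdown is a latency bound rather than a rate-limit bound: each concurrent slot can only complete 60 / seconds_per_request requests per minute. A minimal sketch, reusing the 104 RPM figure from the earlier example:

```python
import math

def concurrency_for_throughput(target_rpm: float, seconds_per_request: float) -> int:
    """Slots needed so that total completions per minute reach target_rpm,
    given each slot takes seconds_per_request per request."""
    return math.ceil(target_rpm * seconds_per_request / 60)

print(concurrency_for_throughput(104, 1))  # HTML-only (~1 s/request): 2 slots
print(concurrency_for_throughput(104, 5))  # headless render (~5 s/request): 9 slots
```

Your real concurrency requirement is the larger of this latency-bound figure and the rate-limit-bound figure from the core formula.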

Should I use residential or datacenter proxies?

For most scraping tasks, residential proxies offer higher success rates and require fewer concurrent connections. Datacenter proxies are cheaper per GB but get blocked faster, requiring a larger pool. Read our proxy type comparison for detailed guidance.

How does ProxyHat's rotating pool work?

Each request through ProxyHat's gateway (gate.proxyhat.com:8080) is automatically assigned a different residential IP. You do not manage individual IPs — you control concurrency and the system handles rotation. This is more efficient than maintaining a static IP list.
