Proxy Rotation Strategies for Large-Scale Scraping

Master the four key proxy rotation strategies: per-request, timed sticky sessions, failure-based, and geo-distributed. Code examples in Python, Node.js, and Go.


Why Proxy Rotation Is Essential for Large-Scale Scraping

When you scale from hundreds to millions of requests, a single proxy IP becomes a liability. Websites track request patterns per IP and will throttle or ban addresses that exceed normal browsing behavior. Proxy rotation distributes your requests across many IPs so that no single address accumulates enough activity to trigger detection.

The difference between a naive rotation approach and a well-designed strategy can mean the difference between a 95% success rate and a 40% one. This guide covers the four main rotation strategies, when to use each, and how to implement them with working code examples.

This article is part of our Complete Guide to Web Scraping Proxies cluster. Start there if you need foundational proxy concepts.

Strategy 1: Per-Request Rotation

The simplest approach: every request gets a new IP. This is ideal for stateless data collection where each request is independent — price lookups, SERP queries, product page fetches.

When to Use

  • Scraping large catalogs where each URL is independent
  • SERP monitoring across many keywords
  • Any task that does not require cookies or session state

Python Implementation

import requests
PROXY = "http://USERNAME:PASSWORD@gate.proxyhat.com:8080"
def fetch_with_rotation(urls: list[str]) -> list[str]:
    """Each request automatically gets a fresh IP via the rotating gateway."""
    results = []
    session = requests.Session()
    session.proxies = {"http": PROXY, "https": PROXY}
    for url in urls:
        try:
            resp = session.get(url, timeout=30)
            resp.raise_for_status()
            results.append(resp.text)
        except requests.RequestException as e:
            print(f"Failed {url}: {e}")
            results.append(None)
    return results
# Each request through gate.proxyhat.com uses a different IP
pages = fetch_with_rotation([
    "https://example.com/product/1",
    "https://example.com/product/2",
    "https://example.com/product/3",
])

Node.js Implementation

// https-proxy-agent v7+ uses a named export; v5 and earlier export the class directly
const { HttpsProxyAgent } = require('https-proxy-agent');
const fetch = require('node-fetch'); // node-fetch v2 (CommonJS, supports the `timeout` option)
const agent = new HttpsProxyAgent('http://USERNAME:PASSWORD@gate.proxyhat.com:8080');
async function fetchWithRotation(urls) {
  const results = [];
  for (const url of urls) {
    try {
      const res = await fetch(url, { agent, timeout: 30000 });
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      results.push(await res.text());
    } catch (err) {
      console.error(`Failed ${url}: ${err.message}`);
      results.push(null);
    }
  }
  return results;
}

Go Implementation

package main
import (
    "fmt"
    "io"
    "net/http"
    "net/url"
    "time"
)
func fetchWithRotation(urls []string) []string {
    proxyURL, _ := url.Parse("http://USERNAME:PASSWORD@gate.proxyhat.com:8080")
    client := &http.Client{
        Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)},
        Timeout:   30 * time.Second,
    }
    results := make([]string, len(urls))
    for i, u := range urls {
        resp, err := client.Get(u)
        if err != nil {
            fmt.Printf("Failed %s: %v\n", u, err)
            continue // leaves results[i] as the empty string
        }
        body, readErr := io.ReadAll(resp.Body)
        resp.Body.Close()
        if readErr != nil {
            fmt.Printf("Read failed %s: %v\n", u, readErr)
            continue
        }
        results[i] = string(body)
    }
    return results
}

Strategy 2: Timed Rotation (Sticky Sessions)

Some scraping tasks require the same IP for a series of related requests — browsing a paginated listing, navigating a multi-step checkout, or maintaining a logged-in session. Timed rotation (or sticky sessions) keeps the same IP assigned for a defined duration, typically 1-30 minutes.

When to Use

  • Paginated crawling (page 1, 2, 3... of results)
  • Tasks requiring cookies or session persistence
  • Simulating realistic browsing patterns

Implementation Pattern

With ProxyHat, sticky sessions are controlled via the session parameter in your credentials. Each unique session ID maintains the same IP for the configured duration:

import requests
import uuid
def create_sticky_session() -> requests.Session:
    """Create a session that keeps the same IP for the gateway's sticky window."""
    session_id = uuid.uuid4().hex[:8]
    proxy = f"http://USERNAME-session-{session_id}:PASSWORD@gate.proxyhat.com:8080"
    session = requests.Session()
    session.proxies = {"http": proxy, "https": proxy}
    return session
# All requests through this session use the same IP
session = create_sticky_session()
page1 = session.get("https://example.com/listings?page=1")
page2 = session.get("https://example.com/listings?page=2")
page3 = session.get("https://example.com/listings?page=3")
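If the gateway's sticky window does not match your crawl, you can also enforce timed rotation client-side by re-keying the session ID yourself. A minimal sketch (the `USERNAME-session-<id>` credential format follows the examples above; the duration handling here is purely client-side and is an assumption, not a gateway feature):

```python
import time
import uuid

class TimedRotatingProxy:
    """Regenerates the sticky session ID after a fixed duration,
    prompting the gateway to assign a fresh IP for the new ID."""

    def __init__(self, duration_seconds: float = 600):
        self.duration = duration_seconds
        self._rotate()

    def _rotate(self):
        self.session_id = uuid.uuid4().hex[:8]
        self.started_at = time.monotonic()

    def proxy_url(self) -> str:
        # Re-key the session once the configured window has elapsed
        if time.monotonic() - self.started_at >= self.duration:
            self._rotate()
        return (f"http://USERNAME-session-{self.session_id}:"
                f"PASSWORD@gate.proxyhat.com:8080")

rotator = TimedRotatingProxy(duration_seconds=600)
url_a = rotator.proxy_url()
url_b = rotator.proxy_url()  # same window, so same session ID
```

Before each request, set `session.proxies = {"http": rotator.proxy_url(), "https": rotator.proxy_url()}` so the rotation happens transparently.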

Node.js Sticky Session

// https-proxy-agent v7+ uses a named export; v5 and earlier export the class directly
const { HttpsProxyAgent } = require('https-proxy-agent');
const fetch = require('node-fetch'); // node-fetch v2 (CommonJS)
const crypto = require('crypto');
function createStickyAgent() {
  const sessionId = crypto.randomBytes(4).toString('hex');
  return new HttpsProxyAgent(
    `http://USERNAME-session-${sessionId}:PASSWORD@gate.proxyhat.com:8080`
  );
}
async function crawlPaginated(baseUrl, pages) {
  const agent = createStickyAgent(); // Same IP for all pages
  const results = [];
  for (let page = 1; page <= pages; page++) {
    const res = await fetch(`${baseUrl}?page=${page}`, { agent });
    results.push(await res.text());
  }
  return results;
}

Strategy 3: Failure-Based Rotation

Instead of rotating on every request or on a timer, failure-based rotation keeps using an IP until it gets blocked, then switches. This maximizes the value of each IP by using it as long as it works.

When to Use

  • Targets with unpredictable blocking thresholds
  • Budget-conscious scraping where you want maximum requests per IP
  • Long-running crawls where some IPs last hours and others minutes

Implementation with Automatic Failover

import requests
import uuid
from time import sleep
class FailureBasedRotator:
    """Rotates proxy only when the current IP fails."""
    BLOCK_SIGNALS = [403, 429, 503]
    MAX_RETRIES = 3
    def __init__(self):
        self.session_id = None
        self.requests_on_current_ip = 0
        self._new_session()
    def _new_session(self):
        self.session_id = uuid.uuid4().hex[:8]
        self.requests_on_current_ip = 0
        proxy = f"http://USERNAME-session-{self.session_id}:PASSWORD@gate.proxyhat.com:8080"
        self.session = requests.Session()
        self.session.proxies = {"http": proxy, "https": proxy}
    def fetch(self, url: str) -> str | None:
        for attempt in range(self.MAX_RETRIES):
            try:
                resp = self.session.get(url, timeout=30)
                if resp.status_code in self.BLOCK_SIGNALS:
                    print(f"Blocked (HTTP {resp.status_code}) after "
                          f"{self.requests_on_current_ip} requests. Rotating...")
                    self._new_session()
                    sleep(1)
                    continue
                resp.raise_for_status()
                self.requests_on_current_ip += 1
                return resp.text
            except requests.RequestException:
                self._new_session()
                sleep(1)
        return None
# Usage
rotator = FailureBasedRotator()
for url in urls:  # "urls" is your own list of target URLs
    html = rotator.fetch(url)

Go Implementation with Failover

package main
import (
    "crypto/rand"
    "encoding/hex"
    "fmt"
    "io"
    "net/http"
    "net/url"
    "time"
)
type FailureRotator struct {
    client    *http.Client
    sessionID string
    reqCount  int
}
func NewFailureRotator() *FailureRotator {
    r := &FailureRotator{}
    r.rotate()
    return r
}
func (r *FailureRotator) rotate() {
    b := make([]byte, 4)
    rand.Read(b)
    r.sessionID = hex.EncodeToString(b)
    r.reqCount = 0
    proxyStr := fmt.Sprintf("http://USERNAME-session-%s:PASSWORD@gate.proxyhat.com:8080", r.sessionID)
    proxyURL, _ := url.Parse(proxyStr)
    r.client = &http.Client{
        Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)},
        Timeout:   30 * time.Second,
    }
}
func (r *FailureRotator) Fetch(target string) (string, error) {
    for attempt := 0; attempt < 3; attempt++ {
        resp, err := r.client.Get(target)
        if err != nil {
            r.rotate()
            time.Sleep(time.Second)
            continue
        }
        if resp.StatusCode == 403 || resp.StatusCode == 429 || resp.StatusCode == 503 {
            resp.Body.Close() // close explicitly; a defer inside the loop would pile up until return
            fmt.Printf("Blocked after %d requests. Rotating...\n", r.reqCount)
            r.rotate()
            time.Sleep(time.Second)
            continue
        }
        body, readErr := io.ReadAll(resp.Body)
        resp.Body.Close()
        if readErr != nil {
            return "", readErr
        }
        r.reqCount++
        return string(body), nil
    }
    return "", fmt.Errorf("all retries exhausted for %s", target)
}

Strategy 4: Geo-Distributed Rotation

When scraping localized content — search results, pricing, availability — you need IPs from specific geographic locations. Geo-distributed rotation assigns IPs from target countries or cities to get accurate local data.

When to Use

  • SERP scraping for local search rankings
  • Price monitoring across regions
  • Content availability checks (geo-restricted content)
  • Ad verification in specific markets

Implementation with Country Targeting

import requests
from concurrent.futures import ThreadPoolExecutor
COUNTRIES = ["us", "gb", "de", "fr", "jp"]
def fetch_localized(url: str, country: str) -> dict:
    """Fetch URL through a proxy in the specified country."""
    proxy = f"http://USERNAME-country-{country}:PASSWORD@gate.proxyhat.com:8080"
    try:
        resp = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=30)
        return {"country": country, "status": resp.status_code, "body": resp.text}
    except requests.RequestException as e:
        return {"country": country, "status": 0, "error": str(e)}
def scrape_all_regions(url: str) -> list[dict]:
    """Fetch the same URL from multiple countries in parallel."""
    with ThreadPoolExecutor(max_workers=len(COUNTRIES)) as executor:
        futures = [executor.submit(fetch_localized, url, c) for c in COUNTRIES]
        return [f.result() for f in futures]
# Get localized pricing from 5 countries simultaneously
results = scrape_all_regions("https://example.com/product/pricing")
for r in results:
    print(f"{r['country'].upper()}: HTTP {r['status']}")

See available targeting options on the ProxyHat Locations page.

Combining Strategies: The Hybrid Approach

In practice, large-scale scraping projects combine multiple strategies. Here is a pattern that uses per-request rotation for discovery, sticky sessions for deep crawling, and failure-based fallback:

import requests
import uuid
from enum import Enum
class RotationMode(Enum):
    PER_REQUEST = "per_request"
    STICKY = "sticky"
    FAILURE_BASED = "failure_based"
class HybridRotator:
    def __init__(self, mode: RotationMode = RotationMode.PER_REQUEST):
        self.mode = mode
        self.session_id = None
        self.failure_count = 0
        self._init_session()
    def _init_session(self):
        if self.mode == RotationMode.PER_REQUEST:
            proxy = "http://USERNAME:PASSWORD@gate.proxyhat.com:8080"
        else:
            self.session_id = self.session_id or uuid.uuid4().hex[:8]
            proxy = f"http://USERNAME-session-{self.session_id}:PASSWORD@gate.proxyhat.com:8080"
        self.session = requests.Session()
        self.session.proxies = {"http": proxy, "https": proxy}
    def force_rotate(self):
        """Force a new IP regardless of mode."""
        self.session_id = uuid.uuid4().hex[:8]
        self.failure_count = 0
        self._init_session()
    def fetch(self, url: str) -> str | None:
        try:
            resp = self.session.get(url, timeout=30)
            if resp.status_code in [403, 429, 503]:
                self.failure_count += 1
                if self.failure_count >= 2:
                    self.force_rotate()
                return None
            self.failure_count = 0
            return resp.text
        except requests.RequestException:
            self.failure_count += 1
            if self.failure_count >= 2:
                self.force_rotate()
            return None
# Discovery phase: rotate every request
discovery = HybridRotator(RotationMode.PER_REQUEST)
sitemap_urls = [discovery.fetch(url) for url in seed_urls]  # seed_urls: your own list
# Deep crawl phase: sticky sessions per site section
crawler = HybridRotator(RotationMode.STICKY)
for section_url in section_urls:
    pages = [crawler.fetch(f"{section_url}?page={i}") for i in range(1, 11)]
    crawler.force_rotate()  # New IP for next section

Rotation Strategy Comparison

Strategy          Best For                        Success Rate   IP Efficiency   Complexity
Per-Request       Stateless bulk collection       High           Low             Low
Timed/Sticky      Session-dependent tasks         Medium-High    Medium          Low
Failure-Based     Variable-difficulty targets     Medium         High            Medium
Geo-Distributed   Localized data collection       High           Medium          Medium
Hybrid            Complex multi-phase projects    Highest        High            High

Best Practices for Rotation at Scale

  • Respect robots.txt. Rotation does not exempt you from being a good citizen. Check the rules and honor crawl-delay directives.
  • Add realistic delays. Even with rotation, bursting hundreds of requests per second looks robotic. Add 0.5-2 second random delays between requests.
  • Monitor success rates. Track HTTP status codes per target site. A drop below 90% means your rotation needs tuning.
  • Combine with header rotation. Rotating IPs alone is not enough. Rotate User-Agent strings and other headers to avoid fingerprint-based detection.
  • Use backoff on failures. When an IP gets blocked, wait before retrying. Exponential backoff (1s, 2s, 4s, 8s) prevents wasting requests on temporarily hostile targets.
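The delay, backoff, and header-rotation advice above can be sketched as a few helpers (the User-Agent strings are illustrative placeholders — use current, real browser strings in production):

```python
import random
import time

# Illustrative placeholders; keep these current and realistic in production
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
]

def rotated_headers() -> dict:
    """Pair IP rotation with header rotation to blunt fingerprint-based detection."""
    return {"User-Agent": random.choice(USER_AGENTS)}

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff (1s, 2s, 4s, 8s...) with full jitter, capped."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def polite_pause():
    """Random 0.5-2 second delay between requests to avoid robotic bursts."""
    time.sleep(random.uniform(0.5, 2.0))
```

Between normal requests, call `polite_pause()` and send `rotated_headers()` with each fetch; after a blocked response, sleep for `backoff_delay(attempt)` before retrying.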

To understand how many IPs you need to support your rotation strategy, see How Many Proxies Do You Need for Scraping?. For a comprehensive scraping architecture overview, visit our Complete Guide to Web Scraping Proxies.

Ready to implement these strategies? Check out the Python SDK, Node SDK, or Go SDK for production-ready proxy integration, or explore ProxyHat pricing plans to get started.

Frequently Asked Questions

What is the best proxy rotation strategy for web scraping?

Per-request rotation is the safest default for most scraping tasks. It ensures each request uses a different IP, making pattern detection much harder. For tasks requiring session persistence (pagination, login flows), use sticky sessions instead.

How fast should I rotate proxies?

For per-request rotation, every request gets a new IP automatically. For sticky sessions, 5-10 minutes is a good default. The optimal duration depends on the target — aggressive sites may require shorter sessions (1-2 minutes), while lenient ones tolerate 30+ minutes.

Can I combine different rotation strategies?

Yes, and you should for complex projects. Use per-request rotation for discovery and URL collection, sticky sessions for deep crawling, and failure-based rotation as a fallback when IPs get blocked. The hybrid approach in this guide shows how.

Does ProxyHat handle rotation automatically?

Yes. Every request through the ProxyHat gateway (gate.proxyhat.com:8080) automatically receives a different IP from the residential pool. For sticky sessions, add a session parameter to your credentials. No manual IP list management is needed.
