Build a Google Rank Tracker in Python with Residential Proxies

A complete, code-first guide to building a production Google rank tracker in Python using curl_cffi for TLS impersonation and ProxyHat residential proxies with city-level geo-targeting, sticky sessions, and SQLite storage.

Why You Need to Build a Google Rank Tracker in Python with Residential Proxies

If you've ever tried to build a rank tracker in Python, you already know the pattern: your scraper works for a day, then Google starts returning CAPTCHAs, 429 errors, or empty result pages. The problem isn't your parsing logic — it's that Google's anti-bot stack has gotten significantly more sophisticated. TLS fingerprinting, IP reputation scoring, and behavioral analysis all work together to detect automated traffic. Residential proxies are the most reliable way to make your requests look like they come from real users in specific geographic locations.

This guide walks through a complete implementation: data model, SERP fetching with pagination, proxy rotation with ProxyHat, position parsing, SQLite storage, and hardening for scale. Every code example is meant to be runnable once you substitute your own proxy credentials. By the end, you'll have a tracker that can fetch daily snapshots for hundreds of keywords across multiple countries while keeping block rates low.

The Data Model: What a Rank Tracker Actually Stores

A rank tracker is fundamentally a time-series database. Each row captures: what keyword was searched, which domain we're tracking, in what country, on what device, at what position, and when. Here's the minimal schema:

keyword        TEXT        -- e.g., "best running shoes"
target_domain  TEXT        -- e.g., "example.com"
country        TEXT        -- ISO 3166-1 alpha-2, e.g., "US"
device         TEXT        -- "desktop" or "mobile"
position       INTEGER     -- 1-100, or 0 if not found
captured_at    TIMESTAMP   -- when the SERP was fetched

Daily SERP snapshots beat one-off checks because rankings fluctuate. A domain might rank #3 on Monday and #8 on Wednesday due to algorithm updates, personalization, or indexing changes. Without historical data, you can't distinguish noise from trend. Store every snapshot — even when the position hasn't changed — so you can compute moving averages, detect volatility, and spot algorithm-update impacts retroactively. A 90-day history gives you enough data to see weekly patterns and seasonal shifts.

Fetching SERPs After Google Removed num=100

Google's num=100 parameter — which let you fetch 100 results in a single request — stopped working reliably in September 2025. Google now caps results at approximately 10 per page regardless of the num parameter. To capture the top 100 positions, you need to paginate using the start parameter: start=0, start=10, start=20, and so on up to start=90.
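
The pagination scheme above can be sketched with nothing but the standard library. The helper name serp_page_urls and its defaults are illustrative assumptions; only the q, gl, hl, num, and start parameters come from this guide.

```python
from urllib.parse import urlencode

def serp_page_urls(keyword: str, country: str = "us", language: str = "en",
                   max_results: int = 100, page_size: int = 10) -> list[str]:
    """Build one search URL per results page: start=0, 10, 20, ... (hypothetical helper)."""
    base = "https://www.google.com/search"
    urls = []
    for start in range(0, max_results, page_size):
        # urlencode handles escaping, e.g. spaces in the keyword become '+'
        query = urlencode({"q": keyword, "gl": country, "hl": language,
                           "num": page_size, "start": start})
        urls.append(f"{base}?{query}")
    return urls

urls = serp_page_urls("best running shoes")
print(len(urls))   # number of pages needed for the top 100
print(urls[0])
print(urls[-1])
```

Generating the URLs up front also makes it easy to count requests before you send any of them.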

Each page is a separate HTTP request, which means 10 requests per keyword per day. At 100 keywords, that's 1,000 daily requests — well within the range where Google's rate limiting kicks in without proxies. Here's a basic curl example using ProxyHat's residential endpoint with city-level geo-targeting:

curl -x "http://user-country-US-city-chicago-session-keyword1:pass@gate.proxyhat.com:8080" \
  -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" \
  "https://www.google.com/search?q=best+running+shoes&gl=us&hl=en&num=10&start=0"

The -country-US-city-chicago segment of the proxy username tells ProxyHat to route through a residential IP in Chicago, matching the gl=us parameter. The -session-keyword1 segment keeps the same IP across all 10 pages for that keyword, so Google sees a consistent user session rather than 10 different IPs hitting the same query.
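
If you build these proxy URLs in code, a small helper keeps the username format in one place. This is a minimal sketch that assumes the username layout shown in the curl example (user-country-XX-city-name-session-label); build_proxy_url is a hypothetical name, and how multi-word city names should be written is not covered here, so check the ProxyHat documentation before relying on it.

```python
import re
from urllib.parse import quote

def build_proxy_url(user: str, password: str, country: str, city: str,
                    session_id: str, host: str = "gate.proxyhat.com",
                    port: int = 8080) -> str:
    """Assemble a proxy URL in the format used by the curl example (assumed layout)."""
    # Keep only letters and digits in the session label so it cannot collide
    # with the '-' separators or with the ':' and '@' URL delimiters.
    safe_session = re.sub(r"[^A-Za-z0-9]", "", session_id) or "default"
    username = (f"{user}-country-{country.upper()}"
                f"-city-{city.lower()}-session-{safe_session}")
    # Percent-encode credentials in case they contain reserved characters.
    return f"http://{quote(username, safe='')}:{quote(password, safe='')}@{host}:{port}"

proxy_url = build_proxy_url("user", "pass", "us", "Chicago", "best running shoes")
print(proxy_url)
```

Deriving the session label from the keyword gives each keyword its own sticky session without any extra bookkeeping.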

Why Residential Proxies Are Non-Negotiable for SERP Scraping

Google employs multiple layers of bot detection. Two of the most effective are:

  1. TLS/JA3-JA4 fingerprinting: Google's servers inspect the TLS ClientHello packet to identify the client's TLS library. A Python requests session has a completely different fingerprint than Chrome, Firefox, or Safari. This fingerprint is nearly impossible to spoof with standard HTTP libraries. See TLS on Wikipedia for background on the handshake and fingerprinting.
  2. IP reputation scoring: Google maintains reputation scores for IP ranges. Datacenter IPs from AWS, DigitalOcean, or Hetzner are flagged as high-risk because they're associated with automated traffic. Residential IPs from ISPs like Comcast, AT&T, or Vodafone score as low-risk because they belong to real households.

The table below compares proxy types for Google rank tracking:

Feature               | Datacenter    | Residential         | Mobile
IP Reputation         | Low (flagged) | High (ISP-assigned) | Highest (carrier-assigned)
Success Rate (Google) | ~30-50%       | ~90-95%             | ~95-98%
Latency               | ~50ms         | ~200-500ms          | ~300-800ms
Cost per GB           | $0.50-$1      | $2-$8               | $8-$20
Geo-Targeting         | Country only  | Country + City      | Country + City + Carrier

For rank tracking, residential proxies with city-level geo-targeting offer the best balance of success rate, cost, and geographic precision. Mobile proxies have slightly higher success rates but cost roughly 2.5-4x more per GB, which matters when you're fetching 1,000+ pages daily. Check ProxyHat's available locations to confirm city-level coverage in your target markets.
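
To turn the per-GB prices into a monthly figure, a rough estimate looks like the sketch below. The average page size of 200 KB is a placeholder assumption, not a measured value; measure your own responses and substitute the real number. The price range defaults to the residential column of the table.

```python
def monthly_proxy_cost(keywords: int, pages_per_keyword: int = 10, days: int = 30,
                       avg_page_kb: float = 200.0,          # ASSUMPTION: placeholder page size
                       cost_per_gb: tuple[float, float] = (2.0, 8.0)) -> dict:
    """Estimate monthly pages, bandwidth, and a low/high cost range."""
    pages = keywords * pages_per_keyword * days
    gigabytes = pages * avg_page_kb / 1_000_000  # decimal GB (1 GB = 1,000,000 KB)
    low, high = cost_per_gb
    return {
        "pages": pages,
        "gb": gigabytes,
        "cost_low": gigabytes * low,
        "cost_high": gigabytes * high,
    }

estimate = monthly_proxy_cost(keywords=100)
print(estimate)
```

Retries and blocked responses also consume bandwidth, so treat the result as a lower bound.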

Worked Example: curl_cffi + ProxyHat SDK

The curl_cffi library wraps libcurl with a Python interface and supports TLS impersonation: it can mimic the JA3/JA4 fingerprint of a real Chrome build. Combined with ProxyHat's residential proxies, this is one of the more reliable approaches to SERP scraping in Python. The examples below talk to the ProxyHat gateway directly through a proxy URL; the ProxyHat documentation covers the full SDK API.

Step 1: Install Dependencies

pip install curl_cffi beautifulsoup4 lxml

Step 2: Fetch a SERP with TLS Impersonation and ProxyHat

from curl_cffi import requests as cffi_requests

PROXY = "http://user-country-US-city-chicago-session-rank01:pass@gate.proxyhat.com:8080"

def fetch_serp(keyword: str, start: int = 0, country: str = "us") -> str:
    """Fetch a Google SERP page with TLS impersonation and residential proxy."""
    url = "https://www.google.com/search"
    params = {
        "q": keyword,
        "gl": country,
        "hl": "en",
        "num": 10,
        "start": start,
    }
    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Referer": "https://www.google.com/",
    }
    try:
        resp = cffi_requests.get(
            url,
            params=params,
            headers=headers,
            proxies={"http": PROXY, "https": PROXY},
            impersonate="chrome",
            timeout=30,
        )
        resp.raise_for_status()
        return resp.text
    except Exception as e:
        print(f"[ERROR] Failed to fetch SERP for '{keyword}' start={start}: {e}")
        return ""

html = fetch_serp("best running shoes", start=0)
print(f"Fetched {len(html)} bytes")

Step 3: Parse Organic Positions with CSS Selectors

from urllib.parse import parse_qs, urlparse

from bs4 import BeautifulSoup

def parse_organic_results(html: str) -> list[dict]:
    """Parse organic results from Google SERP HTML. Skips ads and SERP features."""
    soup = BeautifulSoup(html, "lxml")
    results = []

    # Organic results have historically been wrapped in <div class="g">.
    # Google changes its markup often, so re-check this selector if parsing returns nothing.
    for div in soup.select("div.g"):
        link = div.select_one("a[href]")
        if not link:
            continue
        href = link.get("href", "")
        # Only count actual search result links (not Google internal links)
        if not href.startswith("/url?") and not href.startswith("http"):
            continue
        # Extract and URL-decode the real target from /url?q=... redirects
        if href.startswith("/url?"):
            target = parse_qs(urlparse(href).query).get("q", [""])[0]
            if not target:
                continue
            href = target
        title_elem = div.select_one("h3")
        title = title_elem.get_text(strip=True) if title_elem else ""
        results.append({"url": href, "title": title})

    return results

def find_position(results: list[dict], target_domain: str) -> int:
    """Find the 1-based position of target_domain in results. Returns 0 if not found."""
    target = target_domain.lower()
    for i, r in enumerate(results, 1):
        # Compare hostnames rather than substrings, so "nike.com" does not match
        # "notnike.com" or a URL that merely mentions the domain in its query string.
        host = (urlparse(r["url"]).hostname or "").lower()
        if host == target or host.endswith("." + target):
            return i
    return 0

results = parse_organic_results(html)
pos = find_position(results, "nike.com")
print(f"Position: {pos}")  # e.g., Position: 4

Step 4: Store Rank History in SQLite

import sqlite3
from datetime import datetime, timezone

def init_db(db_path: str = "rank_tracker.db") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS rank_snapshots (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            keyword TEXT NOT NULL,
            target_domain TEXT NOT NULL,
            country TEXT NOT NULL,
            device TEXT NOT NULL,
            position INTEGER NOT NULL,
            captured_at TIMESTAMP NOT NULL
        )
    """)
    conn.execute("""
        CREATE INDEX IF NOT EXISTS idx_keyword_domain_date
        ON rank_snapshots(keyword, target_domain, captured_at)
    """)
    conn.commit()
    return conn

def save_snapshot(conn: sqlite3.Connection, keyword: str, domain: str,
                  country: str, device: str, position: int):
    conn.execute(
        "INSERT INTO rank_snapshots (keyword, target_domain, country, device, position, captured_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
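        # Store an ISO 8601 string: the implicit datetime adapter is deprecated
        # since Python 3.12, and ISO strings still sort chronologically.
        (keyword, domain, country, device, position, datetime.now(timezone.utc).isoformat())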
    )
    conn.commit()

# Usage
conn = init_db()
save_snapshot(conn, "best running shoes", "nike.com", "US", "desktop", pos)
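
Once snapshots accumulate, the most common read is "where does each keyword stand right now?". The query below is one way to get the newest row per keyword, country, and device. It is a sketch against the rank_snapshots schema from this step; latest_positions is a hypothetical helper name, and the demo rows are made up.

```python
import sqlite3

def latest_positions(conn: sqlite3.Connection, target_domain: str) -> list[tuple]:
    """Return the newest snapshot per (keyword, country, device) for one domain."""
    return conn.execute(
        """
        SELECT s.keyword, s.country, s.device, s.position, s.captured_at
        FROM rank_snapshots AS s
        JOIN (
            SELECT keyword, country, device, MAX(captured_at) AS latest
            FROM rank_snapshots
            WHERE target_domain = ?
            GROUP BY keyword, country, device
        ) AS m
          ON s.keyword = m.keyword AND s.country = m.country
         AND s.device = m.device AND s.captured_at = m.latest
        WHERE s.target_domain = ?
        ORDER BY s.keyword
        """,
        (target_domain, target_domain),
    ).fetchall()

# Demo on an in-memory database with made-up rows.
demo = sqlite3.connect(":memory:")
demo.execute("""
    CREATE TABLE rank_snapshots (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        keyword TEXT NOT NULL, target_domain TEXT NOT NULL,
        country TEXT NOT NULL, device TEXT NOT NULL,
        position INTEGER NOT NULL, captured_at TIMESTAMP NOT NULL
    )
""")
demo.executemany(
    "INSERT INTO rank_snapshots (keyword, target_domain, country, device, position, captured_at) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("best running shoes", "example.com", "US", "desktop", 5, "2025-01-01T06:00:00+00:00"),
        ("best running shoes", "example.com", "US", "desktop", 3, "2025-01-02T06:00:00+00:00"),
        ("marathon shoes", "example.com", "US", "desktop", 0, "2025-01-02T06:00:00+00:00"),
    ],
)
latest = latest_positions(demo, "example.com")
for row in latest:
    print(row)
```

The comparison on captured_at relies on timestamps being stored in one consistent, sortable text format such as ISO 8601 in UTC.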

Step 5: Full Keyword Tracking Loop with Pagination

import time

def track_keyword(keyword: str, target_domain: str, country: str = "US",
                   device: str = "desktop", max_results: int = 100) -> int:
    """Track a keyword across the top results. Returns the position found, or 0."""
    position = 0
    fetch_failed = False
    for start in range(0, max_results, 10):
        html = fetch_serp(keyword, start=start, country=country.lower())
        if not html:
            fetch_failed = True
            break
        results = parse_organic_results(html)
        if not results:
            break
        pos = find_position(results, target_domain)
        if pos > 0:
            # Pages are fetched in rank order, so the first hit is the best position.
            position = start + pos
            break
        time.sleep(2)  # Be polite between pages

    # Don't record a failed fetch as "not ranked" (position 0).
    if fetch_failed and position == 0:
        return 0

    conn = init_db()
    try:
        save_snapshot(conn, keyword, target_domain, country, device, position)
    finally:
        conn.close()
    return position

pos = track_keyword("best running shoes", "nike.com", "US")
print(f"Final position: {pos}")

Production Hardening: Retries, CAPTCHA Detection, and Concurrency

A script that works for 10 keywords will fail at 1,000. Here's how to harden it for production scale.

Exponential Backoff with Retries and CAPTCHA Detection

import random
import time

def fetch_with_retry(keyword: str, start: int = 0, country: str = "us",
                     max_retries: int = 3) -> str:
    """Fetch SERP with exponential backoff, jitter, and CAPTCHA detection."""
    for attempt in range(max_retries):
        html = fetch_serp(keyword, start=start, country=country)
        lowered = html.lower()
        # Detect CAPTCHA or unusual traffic pages
        blocked = "unusual traffic" in lowered or "captcha" in lowered
        if html and len(html) > 5000 and not blocked:
            return html
        if blocked:
            print(f"[CAPTCHA] Detected on attempt {attempt+1} for '{keyword}'")
        # Sleep only if another attempt follows
        if attempt < max_retries - 1:
            backoff = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(backoff)
    return ""

Async Concurrency with Per-Country Proxy Pools

import asyncio
import re

from curl_cffi.requests import AsyncSession

COUNTRY_CONFIG = {
    "US": {"city": "chicago", "max_concurrent": 5},
    "DE": {"city": "berlin", "max_concurrent": 3},
    "GB": {"city": "london", "max_concurrent": 3},
}

async def fetch_serp_async(session: AsyncSession, keyword: str,
                           start: int, country: str) -> str:
    cfg = COUNTRY_CONFIG.get(country, {"city": "chicago", "max_concurrent": 3})
    # Keep the session label alphanumeric: '-' separates the username parameters,
    # so a hyphenated label could be misread (an assumption about gateway parsing).
    session_id = re.sub(r"[^A-Za-z0-9]", "", keyword)
    proxy = (
        f"http://user-country-{country}-city-{cfg['city']}"
        f"-session-{session_id}:pass@gate.proxyhat.com:8080"
    )
    url = "https://www.google.com/search"
    params = {"q": keyword, "gl": country.lower(), "hl": "en", "num": 10, "start": start}
    try:
        resp = await session.get(
            url, params=params, proxies={"http": proxy, "https": proxy},
            impersonate="chrome", timeout=30
        )
        resp.raise_for_status()  # treat 429/5xx as failures, as fetch_serp does
        return resp.text
    except Exception as e:
        print(f"[ERROR] {keyword} start={start}: {e}")
        return ""

async def track_keywords_async(keywords: list[str], target_domain: str, country: str = "US"):
    """Track multiple keywords concurrently with per-country rate limiting."""
    cfg = COUNTRY_CONFIG.get(country, {"max_concurrent": 3})
    semaphore = asyncio.Semaphore(cfg["max_concurrent"])

    async def track_one(keyword: str):
        async with semaphore:
            best = 0
            async with AsyncSession() as session:
                for start in range(0, 100, 10):
                    html = await fetch_serp_async(session, keyword, start, country)
                    if not html:
                        break
                    results = parse_organic_results(html)
                    pos = find_position(results, target_domain)
                    if pos > 0:
                        # Pages arrive in rank order, so the first hit is the best
                        # position; stop paginating to save requests.
                        best = start + pos
                        break
                    await asyncio.sleep(1)
            return keyword, best

    tasks = [track_one(kw) for kw in keywords]
    results = await asyncio.gather(*tasks)
    for kw, pos in results:
        print(f"{kw}: position {pos}")

asyncio.run(track_keywords_async(
    ["best running shoes", "marathon shoes", "trail running shoes"],
    "nike.com", "US"
))
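
Before pointing this at real traffic, it can be reassuring to confirm that the semaphore really caps concurrency. The sketch below swaps the network call for a short asyncio.sleep and records the peak number of workers running at once. run_limited is a hypothetical helper that uses only the standard library.

```python
import asyncio

async def run_limited(jobs: list, max_concurrent: int):
    """Run jobs behind a semaphore and report the peak number running at once."""
    semaphore = asyncio.Semaphore(max_concurrent)
    active = 0
    peak = 0

    async def worker(job):
        nonlocal active, peak
        async with semaphore:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for a network fetch
            active -= 1
            return job

    # gather() returns results in the same order as the input jobs
    results = await asyncio.gather(*(worker(j) for j in jobs))
    return results, peak

results, peak = asyncio.run(run_limited(list(range(10)), max_concurrent=3))
print(f"completed={len(results)} peak_concurrency={peak}")
```

The same pattern extends to several countries by keeping one semaphore per country code.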

Rank Volatility Smoothing and CSV Export

Single-day position jumps are often noise. Apply a 7-day moving average to smooth volatility and surface real trends. Export to CSV for reporting:

import csv

def moving_average(conn: sqlite3.Connection, keyword: str,
                    domain: str, window: int = 7) -> list[tuple]:
    """Compute a rolling average of position over the last N snapshots."""
    rows = conn.execute(
        "SELECT captured_at, position FROM rank_snapshots "
        "WHERE keyword=? AND target_domain=? ORDER BY captured_at",
        (keyword, domain)
    ).fetchall()
    smoothed = []
    for i in range(len(rows)):
        start_idx = max(0, i - window + 1)
        # Position 0 means "not found"; averaging it in would make the rank look
        # better than it is, so only ranked snapshots count toward the mean.
        ranked = [r[1] for r in rows[start_idx:i+1] if r[1] > 0]
        avg = round(sum(ranked) / len(ranked), 1) if ranked else None
        smoothed.append((rows[i][0], avg))
    return smoothed

def export_csv(conn: sqlite3.Connection, output_path: str = "rank_history.csv"):
    """Export all rank snapshots to a CSV file."""
    rows = conn.execute(
        "SELECT keyword, target_domain, country, device, position, captured_at "
        "FROM rank_snapshots ORDER BY captured_at"
    ).fetchall()
    with open(output_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["keyword", "target_domain", "country", "device", "position", "captured_at"])
        writer.writerows(rows)
    print(f"Exported {len(rows)} rows to {output_path}")

Ethics, Rate Limits, and Legal Considerations

Rank tracking sits in a gray area. Here are the practical guardrails:

  • Track your own domains first. If you own the site, you can also use Google Search Console (free, official, no scraping needed) for impression and position data. See Google Search documentation for details on what data is available.
  • Respect rate limits. Keep concurrency to 3-5 requests per country. Add 1-2 second delays between page fetches. A sudden burst of 50 requests from one IP will trigger CAPTCHAs within minutes.
  • Use the right proxy geography. If you're tracking US rankings, route through US residential IPs. Mismatched geo locations produce skewed results. Check ProxyHat's available locations to confirm coverage.
  • Review Google's Terms of Service. Automated querying of Google Search is technically against Google's ToS. Most rank trackers operate anyway, but understand the risk — especially at high volume.
  • Consider a SERP API at low volume. If you only track 10-20 keywords, a third-party SERP API may be more cost-effective and simpler to operate. At 100+ keywords with daily snapshots, self-hosted scraping with residential proxies becomes significantly cheaper.

For ProxyHat-specific setup, see the ProxyHat documentation and pricing plans. If you're also doing broader web scraping, check out our web scraping use case guide. For SERP tracking specifically, our SERP tracking page covers proxy configuration in more detail.

Key Takeaways

  • Daily snapshots beat one-off checks. Store every rank position with a timestamp so you can detect trends and algorithm-update impacts over 90+ days.
  • Paginate with start=0,10,20... Google capped results at ~10 per page. Fetch 10 pages to cover the top 100 positions per keyword.
  • Use curl_cffi with impersonate='chrome'. Standard Python HTTP libraries leak their TLS fingerprint. curl_cffi mimics Chrome's JA3/JA4 signature, which is critical for SERP scraping in Python.
  • Residential proxies with city-level geo-targeting are essential. Datacenter IPs get blocked within hours. Residential IPs from ProxyHat route through real ISP assignments with 90-95% success rates.
  • Sticky sessions per keyword. Use -session-{keyword} in your ProxyHat username to maintain the same IP across all 10 pagination requests for a keyword.
  • Hardening matters at scale. Add exponential backoff with jitter, CAPTCHA detection, per-country concurrency limits (3-5), and 7-day moving averages for volatility smoothing.
