How to Scrape Google Search Results with Proxies

Learn how to scrape Google SERPs with residential proxies. Complete code examples in Python, Node.js, and Go for extracting organic results, featured snippets, and People Also Ask data.

Why Scrape Google Search Results?

Google processes over 8.5 billion searches per day, making its search engine results pages (SERPs) the most valuable source of competitive intelligence on the web. Scraping Google search results gives you access to organic rankings, featured snippets, People Also Ask boxes, local packs, and paid ad placements — all in real time.

Whether you are building a SERP monitoring pipeline or performing one-off keyword research, programmatic access to Google results lets you automate workflows that would take hours to complete manually. Common use cases include:

  • Tracking your own keyword rankings across markets
  • Monitoring competitor visibility for target queries
  • Analyzing SERP feature distribution (snippets, images, videos)
  • Building datasets for SEO research and content strategy

Understanding Google SERP Structure

Before writing a scraper, you need to understand the anatomy of a Google results page. A modern SERP can contain over a dozen distinct result types:

Result Type      | CSS / Data Marker         | Description
Organic results  | div#search .g             | Standard blue-link results with title, URL, and snippet
Featured snippet | div.xpdopen               | Answer box displayed above organic results
People Also Ask  | div.related-question-pair | Expandable FAQ-style questions
Local pack       | div.VkpGBb                | Map with 3 local business listings
Knowledge panel  | div.kp-wholepage          | Entity information sidebar
Ad results       | div.uEierd                | Paid search ads at top and bottom

Google changes class names frequently. Build your parser with fallback selectors and test regularly to keep extraction reliable.
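The fallback advice above can be sketched as a small helper that tries a list of selectors in order and returns the first that matches. The selector strings here are illustrative examples, not a guaranteed-current list — verify them against live markup:

```python
from bs4 import BeautifulSoup

# Ordered fallbacks: try the preferred selector first, then older variants.
# These selectors are examples and will need updating as Google's markup changes.
ORGANIC_SELECTORS = ["div#search .g", "div#rso .g", "div.MjjYud"]

def select_with_fallback(soup, selectors):
    """Return matches for the first selector that yields any results."""
    for sel in selectors:
        matches = soup.select(sel)
        if matches:
            return matches
    return []

# Demo: the first selector fails, so the helper falls through to "div#rso .g".
html = '<div id="rso"><div class="g"><h3>Example</h3></div></div>'
soup = BeautifulSoup(html, "html.parser")
print(len(select_with_fallback(soup, ORGANIC_SELECTORS)))
```

Centralizing selectors in one list also gives you a single place to update when a class name rotates, instead of hunting through the parser.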

Setting Up Your Scraping Environment

To scrape Google reliably, you need three components: an HTTP client, a proxy connection, and an HTML parser. Below are complete examples in Python, Node.js, and Go using ProxyHat proxies.

Python Example

Install the dependencies first. The example below uses the requests library and BeautifulSoup, with authentication handled through the ProxyHat gateway URL.

pip install requests beautifulsoup4

import requests
from bs4 import BeautifulSoup
proxy_url = "http://USERNAME:PASSWORD@gate.proxyhat.com:8080"
proxies = {
    "http": proxy_url,
    "https": proxy_url,
}
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}
def scrape_google(query, num_results=10):
    params = {
        "q": query,
        "num": num_results,
        "hl": "en",
        "gl": "us",
    }
    response = requests.get(
        "https://www.google.com/search",
        params=params,
        headers=headers,
        proxies=proxies,
        timeout=15,
    )
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    results = []
    for g in soup.select("div#search .g"):
        title_el = g.select_one("h3")
        link_el = g.select_one("a")
        snippet_el = g.select_one(".VwiC3b")
        if title_el and link_el:
            results.append({
                "title": title_el.get_text(),
                "url": link_el["href"],
                "snippet": snippet_el.get_text() if snippet_el else "",
            })
    return results
results = scrape_google("best residential proxies 2026")
for i, r in enumerate(results, 1):
    print(f"{i}. {r['title']}\n   {r['url']}\n")

Node.js Example

Using axios with https-proxy-agent for the ProxyHat connection and Cheerio for parsing:

npm install axios cheerio https-proxy-agent

const axios = require('axios');
const cheerio = require('cheerio');
const { HttpsProxyAgent } = require('https-proxy-agent');
const agent = new HttpsProxyAgent('http://USERNAME:PASSWORD@gate.proxyhat.com:8080');
async function scrapeGoogle(query) {
  const { data } = await axios.get('https://www.google.com/search', {
    params: { q: query, num: 10, hl: 'en', gl: 'us' },
    headers: {
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
      'Accept-Language': 'en-US,en;q=0.9',
    },
    httpsAgent: agent,
    timeout: 15000,
  });
  const $ = cheerio.load(data);
  const results = [];
  $('div#search .g').each((i, el) => {
    const title = $(el).find('h3').text();
    const url = $(el).find('a').attr('href');
    const snippet = $(el).find('.VwiC3b').text();
    if (title && url) {
      results.push({ position: i + 1, title, url, snippet });
    }
  });
  return results;
}
scrapeGoogle('best residential proxies 2026').then(console.log);

Go Example

Using Go's net/http with a proxy-aware transport and goquery for parsing:

package main
import (
    "fmt"
    "log"
    "net/http"
    "net/url"

    "github.com/PuerkitoBio/goquery"
)
func main() {
    proxyURL, _ := url.Parse("http://USERNAME:PASSWORD@gate.proxyhat.com:8080")
    client := &http.Client{
        Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)},
    }
    req, _ := http.NewRequest("GET", "https://www.google.com/search?q=best+residential+proxies&num=10&hl=en&gl=us", nil)
    req.Header.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36")
    req.Header.Set("Accept-Language", "en-US,en;q=0.9")
    resp, err := client.Do(req)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()
    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil {
        log.Fatal(err)
    }
    doc.Find("div#search .g").Each(func(i int, s *goquery.Selection) {
        title := s.Find("h3").Text()
        link, _ := s.Find("a").Attr("href")
        fmt.Printf("%d. %s\n   %s\n\n", i+1, title, link)
    })
}

Parsing Different SERP Features

A complete scraper should handle more than just organic results. Here are parsing patterns for the most valuable SERP features.

Featured Snippets

# Python: Extract featured snippet
snippet_box = soup.select_one("div.xpdopen")
if snippet_box:
    featured = {
        "type": "featured_snippet",
        "text": snippet_box.get_text(strip=True),
        "source_url": snippet_box.select_one("a")["href"] if snippet_box.select_one("a") else None,
    }

People Also Ask

# Python: Extract PAA questions
paa_questions = []
for q in soup.select("div.related-question-pair"):
    question_text = q.select_one("span")
    if question_text:
        paa_questions.append(question_text.get_text(strip=True))

Local Pack Results

# Python: Extract local pack
local_results = []
for item in soup.select("div.VkpGBb"):
    name = item.select_one(".dbg0pd")
    rating = item.select_one(".yi40Hd")
    local_results.append({
        "name": name.get_text() if name else "",
        "rating": rating.get_text() if rating else "",
    })

Handling Google Blocks and CAPTCHAs

Google actively defends against automated scraping. Without proper proxy infrastructure, you will encounter blocks within dozens of requests. The key defensive mechanisms include:

  • Rate limiting: Too many requests from one IP trigger a 429 status code
  • CAPTCHA challenges: Google serves reCAPTCHA when it suspects automation
  • IP reputation: Datacenter IP ranges receive more scrutiny than residential IPs
  • Browser fingerprinting: Missing or inconsistent headers raise flags

For detailed anti-detection strategies, see our guide on scraping without getting blocked and how anti-bot systems detect proxies.

Recommended Proxy Strategy

Residential proxies are essential for sustained Google scraping. ProxyHat residential proxies provide access to millions of IPs across 190+ locations, enabling you to rotate IPs automatically and geo-target your requests. Key configuration tips:

  • Rotate IPs on every request — never reuse the same IP for consecutive Google queries
  • Add random delays of 2 to 5 seconds between requests
  • Match your User-Agent to a real browser version
  • Set hl and gl parameters consistent with your proxy location
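The last tip — keeping hl and gl consistent with the proxy's exit country — can be sketched as a small lookup. The country-to-locale mapping here is an illustrative assumption covering a few markets, not a complete table:

```python
# Illustrative mapping from proxy exit country to Google locale parameters.
# Extend this table to match the countries you actually target.
GEO_PARAMS = {
    "us": {"hl": "en", "gl": "us"},
    "de": {"hl": "de", "gl": "de"},
    "fr": {"hl": "fr", "gl": "fr"},
    "jp": {"hl": "ja", "gl": "jp"},
}

def serp_params(query, country, num=10):
    """Build Google search params whose language/region match the proxy country."""
    locale = GEO_PARAMS.get(country, {"hl": "en", "gl": "us"})
    return {"q": query, "num": num, **locale}

print(serp_params("proxies kaufen", "de"))
```

A German query routed through a German exit IP then carries hl=de and gl=de, which avoids the mismatch between request parameters and IP geolocation that anti-bot systems look for.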

Refer to the ProxyHat documentation for authentication setup and session management.

Building a Production Scraper

Moving from a script to a production pipeline requires retry logic, structured output, and monitoring. Here is a hardened version of the Python scraper:

import requests
import time
import random
import json
from bs4 import BeautifulSoup
from datetime import datetime
PROXY_URL = "http://USERNAME:PASSWORD@gate.proxyhat.com:8080"
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
]
def scrape_serp(query, location="us", retries=3):
    for attempt in range(retries):
        try:
            headers = {
                "User-Agent": random.choice(USER_AGENTS),
                "Accept-Language": "en-US,en;q=0.9",
                "Accept": "text/html,application/xhtml+xml",
            }
            response = requests.get(
                "https://www.google.com/search",
                params={"q": query, "num": 10, "hl": "en", "gl": location},
                proxies={"http": PROXY_URL, "https": PROXY_URL},
                headers=headers,
                timeout=15,
            )
            if response.status_code == 429:
                wait = (attempt + 1) * 10
                print(f"Rate limited. Waiting {wait}s...")
                time.sleep(wait)
                continue
            response.raise_for_status()
            soup = BeautifulSoup(response.text, "html.parser")
            # Check for CAPTCHA
            if "captcha" in response.text.lower() or soup.select_one("#captcha-form"):
                print("CAPTCHA detected. Retrying with new IP...")
                time.sleep(random.uniform(5, 10))
                continue
            return parse_serp(soup, query)
        except requests.RequestException as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            time.sleep(random.uniform(2, 5))
    return None
def parse_serp(soup, query):
    results = {
        "query": query,
        "timestamp": datetime.utcnow().isoformat(),
        "organic": [],
        "featured_snippet": None,
        "paa": [],
    }
    # Organic results
    for i, g in enumerate(soup.select("div#search .g")):
        title_el = g.select_one("h3")
        link_el = g.select_one("a")
        snippet_el = g.select_one(".VwiC3b")
        if title_el and link_el:
            results["organic"].append({
                "position": i + 1,
                "title": title_el.get_text(),
                "url": link_el["href"],
                "snippet": snippet_el.get_text() if snippet_el else "",
            })
    # Featured snippet
    snippet_box = soup.select_one("div.xpdopen")
    if snippet_box:
        results["featured_snippet"] = snippet_box.get_text(strip=True)[:500]
    # People Also Ask
    for q in soup.select("div.related-question-pair span"):
        results["paa"].append(q.get_text(strip=True))
    return results
# Usage: scrape a list of keywords
keywords = ["best residential proxies", "proxy for web scraping", "serp tracking tools"]
all_results = []
for kw in keywords:
    result = scrape_serp(kw)
    if result:
        all_results.append(result)
    time.sleep(random.uniform(3, 7))  # Delay between keywords
# Save to JSON
with open("serp_results.json", "w") as f:
    json.dump(all_results, f, indent=2)

Scaling Your SERP Scraper

When monitoring hundreds or thousands of keywords, single-threaded scraping is too slow. Consider these scaling approaches:

  • Concurrent requests: Use asyncio (Python), worker threads (Node.js), or goroutines (Go) to send multiple requests in parallel
  • Queue-based architecture: Push keywords into a queue (Redis, RabbitMQ) and process them with multiple workers
  • Proxy pool management: ProxyHat handles rotation automatically, but configure session stickiness based on your needs
  • Result caching: Cache SERP data to avoid redundant requests for the same query within a time window
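The concurrency approach can be sketched with asyncio and a semaphore to cap parallelism. The fetch coroutine here is a stub standing in for the scrape_serp function above; in production it would issue a proxied HTTP request (for example via aiohttp) and parse the SERP:

```python
import asyncio

async def scrape_all(keywords, fetch, max_concurrency=5):
    """Run fetch(keyword) for every keyword, at most max_concurrency at a time."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(kw):
        async with sem:
            return await fetch(kw)

    # gather preserves input order, so results line up with keywords.
    return await asyncio.gather(*(bounded(kw) for kw in keywords))

# Demo with a stub fetcher; replace with a real proxied scrape coroutine.
async def fake_fetch(kw):
    await asyncio.sleep(0.01)
    return {"query": kw, "organic": []}

results = asyncio.run(scrape_all(["kw1", "kw2", "kw3"], fake_fetch))
print(len(results))  # 3
```

The semaphore keeps the number of simultaneous proxied requests bounded, which matters both for staying under your proxy plan's connection limits and for not tripping Google's rate limiting.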

For comprehensive guidance on building scalable scraping systems, read our complete guide to web scraping proxies.

Legal and Ethical Considerations

Google's Terms of Service restrict automated access. When scraping Google SERPs, follow these guidelines:

  • Respect rate limits and avoid overwhelming Google's servers
  • Use the data for legitimate business purposes (SEO monitoring, market research)
  • Do not redistribute raw SERP data commercially without understanding applicable laws
  • Consider using Google's official APIs where they meet your needs

Always check your local laws regarding web scraping and data collection before deploying a SERP scraper at scale.

Ready to get started?

Access 50M+ residential IPs across 148+ countries with AI-driven filtering.

View Pricing | Residential Proxies