Using Proxies in Python (Requests + ProxyHat SDK)

Learn how to use proxies in Python with the requests library and the ProxyHat SDK. Covers authentication, rotation, geo-targeting, error handling, and concurrent scraping.


Why Use Proxies in Python?

Python dominates the data extraction landscape. Libraries like requests, httpx, and aiohttp make HTTP calls trivial, but without proxies your scripts hit IP bans within minutes. Using proxies in Python lets you rotate IP addresses, bypass geo-restrictions, and reliably scale your scraping operations.

In this guide, you'll learn how to use the ProxyHat Python SDK and the standard requests library. Every section includes copy-paste-ready code you can run immediately.

Whether you're building a web scraping pipeline, monitoring SERP results, or collecting pricing data, this guide covers authentication, proxy rotation, geo-targeting, error handling, and scaling to production.

Installation and Setup

Install the ProxyHat SDK and Requests

Install the ProxyHat Python SDK and the requests library using pip:

pip install proxyhat requests

For async workflows, also install httpx and aiohttp:

pip install httpx aiohttp

Get Your API Credentials

Sign up at ProxyHat and grab your API key from the dashboard. You'll need your username and password (or API key) for proxy authentication. Full details are in the ProxyHat API documentation.

Authentication and Basic Configuration

Using the ProxyHat SDK

The SDK handles authentication, rotation, and connection management for you:

from proxyhat import ProxyHat
client = ProxyHat(
    api_key="your_api_key_here"
)
# Test the connection
info = client.get_account_info()
print(f"Traffic remaining: {info['traffic_remaining']} GB")

Using Raw Proxy Credentials with Requests

If you prefer to use requests directly, configure a proxy URL:

import requests
proxy_url = "http://username:password@gate.proxyhat.com:8080"
proxies = {
    "http": proxy_url,
    "https": proxy_url,
}
response = requests.get(
    "https://httpbin.org/ip",
    proxies=proxies,
    timeout=30
)
print(response.json())
# {"origin": "185.xxx.xxx.xxx"}
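If your password contains characters like `@` or `:`, embedding it directly in the proxy URL will break parsing. A small stdlib-only sketch (using the same host and port shown above) that percent-encodes the credentials before building the URL:

```python
from urllib.parse import quote

def build_proxy_url(username, password,
                    host="gate.proxyhat.com", port=8080):
    """Build a proxy URL, percent-encoding credentials so characters
    like '@' or ':' in the password don't break URL parsing."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"http://{user}:{pwd}@{host}:{port}"

proxy_url = build_proxy_url("user", "p@ss:word")
proxies = {"http": proxy_url, "https": proxy_url}
print(proxy_url)
# http://user:p%40ss%3Aword@gate.proxyhat.com:8080
```

The resulting `proxies` dict can be passed to `requests.get(..., proxies=proxies)` exactly as in the snippet above.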

Making a Simple Request Through a Proxy

Here's a complete example of sending a GET request through a ProxyHat residential proxy:

from proxyhat import ProxyHat
client = ProxyHat(api_key="your_api_key_here")
# Make a proxied GET request
response = client.get("https://httpbin.org/ip")
print(f"Status: {response.status_code}")
print(f"IP: {response.json()['origin']}")
print(f"Headers: {response.headers}")

Or with the standard requests library:

import requests
proxies = {
    "http": "http://user:pass@gate.proxyhat.com:8080",
    "https": "http://user:pass@gate.proxyhat.com:8080",
}
response = requests.get(
    "https://example.com/api/data",
    proxies=proxies,
    timeout=30,
    headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
)
print(response.status_code)
print(response.text[:500])

Choosing the Right Proxy Type

ProxyHat offers three proxy types; choose based on your use case. For a deeper comparison, read our guide Residential vs. Datacenter vs. Mobile Proxies.

Type | Best For | Speed | Detection Risk | Cost
Residential | Web scraping, SERP tracking | Medium | Very low | Per GB
Datacenter | High-volume, speed-critical tasks | Fast | Higher | Per IP/month
Mobile | Social media, app testing | Medium | Lowest | Per GB

Switching Proxy Types in Code

from proxyhat import ProxyHat
client = ProxyHat(api_key="your_api_key_here")
# Residential proxy (default)
response = client.get(
    "https://example.com",
    proxy_type="residential"
)
# Datacenter proxy
response = client.get(
    "https://example.com",
    proxy_type="datacenter"
)
# Mobile proxy
response = client.get(
    "https://example.com",
    proxy_type="mobile"
)

Rotating vs. Sticky Sessions

Rotating proxies assign a fresh IP to every request, ideal for large-scale scraping where you need maximum anonymity. Sticky sessions keep the same IP for a set duration, essential for multi-step workflows like login sequences or paginated navigation.

Rotating Proxies (New IP per Request)

from proxyhat import ProxyHat
client = ProxyHat(api_key="your_api_key_here")
urls = [
    "https://httpbin.org/ip",
    "https://httpbin.org/ip",
    "https://httpbin.org/ip",
]
for url in urls:
    response = client.get(url, session_type="rotating")
    print(f"IP: {response.json()['origin']}")
# Each request uses a different IP:
# IP: 185.xxx.xxx.1
# IP: 92.xxx.xxx.47
# IP: 78.xxx.xxx.203

Sticky Sessions (Same IP for a Set Duration)

from proxyhat import ProxyHat
client = ProxyHat(api_key="your_api_key_here")
# Create a sticky session (maintains IP for up to 30 minutes)
session = client.create_session(duration_minutes=30)
# All requests in this session use the same IP
for page in range(1, 6):
    response = session.get(f"https://example.com/products?page={page}")
    print(f"Page {page}: IP {response.headers.get('X-Proxy-IP')}")
# Same IP across all pages:
# Page 1: IP 185.xxx.xxx.42
# Page 2: IP 185.xxx.xxx.42
# Page 3: IP 185.xxx.xxx.42
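Many proxy gateways also expose sticky sessions through the proxy username itself, via a session-ID suffix. The exact parameter name is provider-specific, so treat the `-session-<id>` format below as a hypothetical illustration and check the ProxyHat docs for the real syntax:

```python
import uuid

def sticky_proxy_url(username, password, session_id=None,
                     host="gate.proxyhat.com", port=8080):
    """Embed a session ID in the proxy username (hypothetical
    'user-session-<id>' format) so repeated requests share one IP."""
    session_id = session_id or uuid.uuid4().hex[:8]
    return f"http://{username}-session-{session_id}:{password}@{host}:{port}"

url = sticky_proxy_url("user", "pass", session_id="abc123")
proxies = {"http": url, "https": url}
print(url)
# http://user-session-abc123:pass@gate.proxyhat.com:8080
```

Reusing the same `session_id` across requests keeps the IP; generating a fresh ID starts a new session.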

Geo-Targeted Requests

Need data from a specific country? ProxyHat supports geo-targeting across 195+ locations, which is essential for localized SERP scraping, price monitoring, and content verification.

from proxyhat import ProxyHat
client = ProxyHat(api_key="your_api_key_here")
# Target a specific country
response = client.get(
    "https://www.google.com/search?q=best+restaurants",
    country="US"
)
# Target a specific city
response = client.get(
    "https://www.google.com/search?q=best+restaurants",
    country="US",
    city="New York"
)
# Using raw proxy URL with geo-targeting
# Format: username-country-US:password@gate.proxyhat.com:8080
import requests
proxies = {
    "http": "http://user-country-DE:pass@gate.proxyhat.com:8080",
    "https": "http://user-country-DE:pass@gate.proxyhat.com:8080",
}
response = requests.get("https://www.google.de", proxies=proxies, timeout=30)
print(f"Accessed from Germany: {response.status_code}")
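The `username-country-XX` format from the snippet above makes it easy to generate per-country configurations in one place. A small sketch (a convenience helper, not part of the SDK, built on the same credential format):

```python
def geo_proxies(username, password, country,
                host="gate.proxyhat.com", port=8080):
    """Return a requests-style proxies dict targeting one country,
    using the 'username-country-XX' credential format shown above."""
    url = f"http://{username}-country-{country.upper()}:{password}@{host}:{port}"
    return {"http": url, "https": url}

# One configuration per target market:
for cc in ("US", "DE", "JP"):
    print(cc, geo_proxies("user", "pass", cc)["https"])
```

Each dict can then be passed straight to `requests.get(..., proxies=...)`.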

Error Handling and Retries

Network requests fail. Proxies time out. Targets block you. Robust error handling is what separates production scrapers from toy scripts.

Basic Retry Logic

import time
import requests
from requests.exceptions import ProxyError, Timeout, ConnectionError
def fetch_with_retry(url, proxies, max_retries=3, timeout=30):
    """Fetch a URL with automatic retry on failure."""
    for attempt in range(max_retries):
        try:
            response = requests.get(
                url,
                proxies=proxies,
                timeout=timeout,
                headers={"User-Agent": "Mozilla/5.0"}
            )
            response.raise_for_status()
            return response
        except (ProxyError, Timeout, ConnectionError) as e:
            wait = 2 ** attempt  # Exponential backoff
            print(f"Attempt {attempt + 1} failed: {e}. Retrying in {wait}s...")
            time.sleep(wait)
        except requests.exceptions.HTTPError as e:
            if e.response.status_code == 429:
                wait = 10 * (attempt + 1)
                print(f"Rate limited. Waiting {wait}s...")
                time.sleep(wait)
            elif e.response.status_code >= 500:
                time.sleep(2 ** attempt)
            else:
                raise
    raise Exception(f"Failed to fetch {url} after {max_retries} attempts")
# Usage
proxies = {
    "http": "http://user:pass@gate.proxyhat.com:8080",
    "https": "http://user:pass@gate.proxyhat.com:8080",
}
response = fetch_with_retry("https://example.com/data", proxies)
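The backoff above retries at fixed 1s/2s/4s intervals; when many workers fail at once, they all retry at the same moment. Adding random jitter spreads the retries out. A sketch of just the wait-time calculation (full-jitter variant, with an assumed 60s cap):

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Full-jitter exponential backoff: a random delay between 0 and
    min(cap, base * 2**attempt), avoiding synchronized retry storms."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

for attempt in range(4):
    print(f"attempt {attempt}: wait up to {min(60.0, 2 ** attempt):.0f}s "
          f"(e.g. {backoff_delay(attempt):.2f}s)")
```

To use it, replace `wait = 2 ** attempt` in `fetch_with_retry` with `wait = backoff_delay(attempt)`.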

Using the SDK's Built-in Retries

from proxyhat import ProxyHat
client = ProxyHat(
    api_key="your_api_key_here",
    max_retries=3,
    timeout=30,
    retry_on_status=[429, 500, 502, 503]
)
# The SDK handles retries automatically
response = client.get("https://example.com/data")
print(response.status_code)

Concurrent Scraping with Threads

Sequential requests are slow. For production workloads, use Python's concurrent.futures to parallelize requests through proxies.

from concurrent.futures import ThreadPoolExecutor, as_completed
from proxyhat import ProxyHat
client = ProxyHat(api_key="your_api_key_here")
urls = [
    "https://example.com/product/1",
    "https://example.com/product/2",
    "https://example.com/product/3",
    "https://example.com/product/4",
    "https://example.com/product/5",
]
def scrape(url):
    """Scrape a single URL through the proxy."""
    response = client.get(url, proxy_type="residential")
    return {"url": url, "status": response.status_code, "length": len(response.text)}
# Run 5 concurrent requests
results = []
with ThreadPoolExecutor(max_workers=5) as executor:
    futures = {executor.submit(scrape, url): url for url in urls}
    for future in as_completed(futures):
        try:
            result = future.result()
            results.append(result)
            print(f"OK: {result['url']} ({result['length']} bytes)")
        except Exception as e:
            print(f"Error: {futures[future]} - {e}")
print(f"\nCompleted: {len(results)}/{len(urls)}")
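If you run the thread-pool pattern above with raw requests instead of the SDK, note that sharing a single `requests.Session` across threads is not guaranteed to be safe. A common sketch is one session per worker thread via `threading.local()` (credentials are the same placeholders as earlier examples):

```python
import threading
import requests

_local = threading.local()

PROXIES = {
    "http": "http://user:pass@gate.proxyhat.com:8080",
    "https": "http://user:pass@gate.proxyhat.com:8080",
}

def get_session():
    """Return this thread's session, creating and configuring it on
    first use so each worker reuses its own connection pool."""
    if not hasattr(_local, "session"):
        session = requests.Session()
        session.proxies = PROXIES
        session.headers["User-Agent"] = "Mozilla/5.0"
        _local.session = session
    return _local.session

# Inside a ThreadPoolExecutor worker:
# response = get_session().get(url, timeout=30)
```

Each thread pays the session setup cost once, then reuses its own keep-alive connections.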

Scraping with asyncio and httpx

import asyncio
import httpx
async def scrape_urls(urls, proxy_url, max_concurrent=10):
    """Scrape multiple URLs concurrently using async proxies."""
    semaphore = asyncio.Semaphore(max_concurrent)
    async def fetch(client, url):
        async with semaphore:
            response = await client.get(url, timeout=30)
            return {"url": url, "status": response.status_code}
    async with httpx.AsyncClient(proxy=proxy_url) as client:
        tasks = [fetch(client, url) for url in urls]
        return await asyncio.gather(*tasks, return_exceptions=True)
# Usage
proxy_url = "http://user:pass@gate.proxyhat.com:8080"
urls = [f"https://example.com/page/{i}" for i in range(1, 51)]
results = asyncio.run(scrape_urls(urls, proxy_url))
successful = [r for r in results if not isinstance(r, Exception)]
print(f"Scraped {len(successful)}/{len(urls)} pages")

Integrating with Popular Python Libraries

Using requests (Sessions)

import requests
session = requests.Session()
session.proxies = {
    "http": "http://user:pass@gate.proxyhat.com:8080",
    "https": "http://user:pass@gate.proxyhat.com:8080",
}
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
})
# All requests in this session use the proxy
response = session.get("https://example.com/api/products")
print(response.json())

Using httpx

import httpx
proxy_url = "http://user:pass@gate.proxyhat.com:8080"
# Synchronous
with httpx.Client(proxy=proxy_url) as client:
    response = client.get("https://httpbin.org/ip")
    print(response.json())
# Asynchronous (run inside an async function / event loop)
async with httpx.AsyncClient(proxy=proxy_url) as client:
    response = await client.get("https://httpbin.org/ip")
    print(response.json())

Using aiohttp

import aiohttp
import asyncio
async def fetch_with_aiohttp():
    proxy_url = "http://user:pass@gate.proxyhat.com:8080"
    async with aiohttp.ClientSession() as session:
        async with session.get(
            "https://httpbin.org/ip",
            proxy=proxy_url,
            timeout=aiohttp.ClientTimeout(total=30)
        ) as response:
            data = await response.json()
            print(f"IP: {data['origin']}")
asyncio.run(fetch_with_aiohttp())

Using Scrapy

Configure the proxy for your Scrapy spider in settings.py:

# settings.py
DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 110,
}
# Note: the built-in HttpProxyMiddleware reads the standard http_proxy /
# https_proxy environment variables, not a settings.py constant.
# For explicit control, set the proxy per-request in your spider:
import scrapy
class ProductSpider(scrapy.Spider):
    name = "products"
    start_urls = ["https://example.com/products"]
    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url,
                meta={"proxy": "http://user:pass@gate.proxyhat.com:8080"},
                callback=self.parse
            )
    def parse(self, response):
        for product in response.css(".product-card"):
            yield {
                "name": product.css("h2::text").get(),
                "price": product.css(".price::text").get(),
            }
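To rotate across several proxy endpoints instead of one gateway, you can also write a tiny custom downloader middleware. The sketch below assumes a hypothetical `PROXY_LIST` custom setting (not a Scrapy built-in) and only touches `request.meta`, registered in `DOWNLOADER_MIDDLEWARES` like the built-in middleware above:

```python
from itertools import cycle

class RotatingProxyMiddleware:
    """Assign proxies round-robin via request.meta['proxy'].
    PROXY_LIST is a hypothetical custom setting you define yourself."""

    def __init__(self, proxy_list):
        self.proxies = cycle(proxy_list)

    @classmethod
    def from_crawler(cls, crawler):
        # Read the custom PROXY_LIST setting from settings.py
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        request.meta["proxy"] = next(self.proxies)
        return None  # let Scrapy continue processing the request
```

Returning `None` from `process_request` tells Scrapy to keep running the remaining middlewares and download the request normally.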

Production Tips

Connection Pooling and Timeouts

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
session = requests.Session()
# Configure retry strategy
retry_strategy = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504],
)
adapter = HTTPAdapter(
    max_retries=retry_strategy,
    pool_connections=10,
    pool_maxsize=20
)
session.mount("http://", adapter)
session.mount("https://", adapter)
session.proxies = {
    "http": "http://user:pass@gate.proxyhat.com:8080",
    "https": "http://user:pass@gate.proxyhat.com:8080",
}
# Robust, production-ready request
response = session.get("https://example.com/data", timeout=(5, 30))
print(response.status_code)

Logging and Monitoring

import logging
import time
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
logger = logging.getLogger("scraper")
def monitored_request(session, url):
    """Log request timing and status for monitoring."""
    start = time.time()
    try:
        response = session.get(url, timeout=30)
        elapsed = time.time() - start
        logger.info(f"OK {response.status_code} {url} ({elapsed:.2f}s)")
        return response
    except Exception as e:
        elapsed = time.time() - start
        logger.error(f"FAIL {url} ({elapsed:.2f}s): {e}")
        raise
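Beyond per-request logs, it helps to track an aggregate success rate so you notice when a proxy pool degrades. A minimal stdlib sketch that could sit alongside `monitored_request`:

```python
from collections import Counter

class ScrapeStats:
    """Count request outcomes and report a success rate for this run."""

    def __init__(self):
        self.counts = Counter()

    def record(self, ok):
        self.counts["ok" if ok else "fail"] += 1

    def success_rate(self):
        total = sum(self.counts.values())
        return self.counts["ok"] / total if total else 0.0

stats = ScrapeStats()
for outcome in (True, True, True, False):
    stats.record(outcome)
print(f"Success rate: {stats.success_rate():.0%}")  # Success rate: 75%
```

Call `stats.record(True)` after each successful response and `stats.record(False)` in the exception branch, then log the rate periodically.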

Environment Variables for Credentials

Never hardcode credentials. Use environment variables:

import os
from proxyhat import ProxyHat
client = ProxyHat(
    api_key=os.environ["PROXYHAT_API_KEY"]
)
# Or with raw proxy URL
proxy_url = os.environ.get(
    "PROXY_URL",
    "http://user:pass@gate.proxyhat.com:8080"
)

For a complete list of available proxy plans and traffic options, visit the pricing page. For advanced use cases and an endpoint reference, see the API documentation. You can also explore our guide Best Proxies for Web Scraping in 2026 for a provider comparison.

Key Takeaways

  • Install in one command: pip install proxyhat requests gets you started immediately.
  • Use the SDK for simplicity: the ProxyHat Python SDK handles authentication, retries, and rotation automatically.
  • Choose the right proxy type: residential for scraping, datacenter for speed, mobile for social platforms.
  • Rotating vs. sticky: use rotating proxies for bulk scraping, sticky sessions for multi-step workflows.
  • Geo-target when needed: specify countries and cities for localized data collection.
  • Handle errors properly: implement exponential backoff and retry logic for production reliability.
  • Scale with concurrency: use ThreadPoolExecutor or asyncio to parallelize requests.
  • Never hardcode credentials: store API keys in environment variables.

Frequently Asked Questions

How do I set a proxy in Python requests?

Pass a proxies dictionary to any requests method: requests.get(url, proxies={"http": "http://user:pass@host:port", "https": "http://user:pass@host:port"}). The ProxyHat SDK simplifies this further by handling proxy configuration internally.

What is the difference between rotating and sticky proxies in Python?

Rotating proxies assign a new IP address to every request, which is ideal for large-scale scraping. Sticky proxies maintain the same IP for a set duration (e.g. 10-30 minutes), which is essential for login sessions, shopping carts, or any paginated browsing where IP consistency matters.

Can I use proxies with asyncio and aiohttp?

Yes. ProxyHat proxies work with any HTTP client that supports proxy configuration, including aiohttp, httpx (in async mode), and asyncio-based frameworks. Pass the proxy URL as the proxy parameter in your async client.

How do I handle proxy errors and timeouts in Python?

Wrap your requests in try/except blocks that catch ProxyError, Timeout, and ConnectionError. Implement exponential backoff (doubling the wait time between retries) and set a maximum retry count. The ProxyHat SDK includes built-in retry logic with configurable parameters.

Which Python library is best for web scraping with proxies?

For simple tasks, requests with the ProxyHat SDK is the easiest option. For high-throughput scraping, use httpx or aiohttp. For complex crawls with link following and structured extraction, Scrapy with proxy middleware is the most powerful choice. All of them work seamlessly with ProxyHat proxies.

Ready to get started?

Access 50M+ residential IPs across 148+ countries with AI-powered filtering.
