Why Use Proxies in Python?
Python dominates the data-extraction landscape. Libraries like requests, httpx, and aiohttp make HTTP calls trivial, but without proxies your scripts hit IP bans within minutes. Using proxies in Python lets you rotate IP addresses, bypass geo-restrictions, and reliably scale your scraping operations.
In this guide, you'll learn how to use both the ProxyHat Python SDK and the standard requests library. Every section includes copy-paste-ready code you can run immediately.
Whether you're building a web-scraping pipeline, monitoring SERP results, or collecting pricing data, this guide covers authentication, proxy rotation, geo-targeting, error handling, and scaling to production.
Installation and Setup
Install the ProxyHat SDK and requests
Install the ProxyHat Python SDK and the requests library with pip:
pip install proxyhat requests
For async workflows, also install httpx and aiohttp:
pip install httpx aiohttp
Get Your API Credentials
Sign up at ProxyHat and retrieve your API key from the dashboard. You'll need your username and password (or an API key) for proxy authentication. Full details are available in the ProxyHat API documentation.
Authentication and Basic Configuration
Using the ProxyHat SDK
The SDK handles authentication, rotation, and connection management for you:
from proxyhat import ProxyHat
client = ProxyHat(
    api_key="your_api_key_here"
)
# Test the connection
info = client.get_account_info()
print(f"Traffic remaining: {info['traffic_remaining']} GB")
Using Raw Proxy Credentials with requests
If you prefer to use requests directly, configure the proxy URL yourself:
import requests
proxy_url = "http://username:password@gate.proxyhat.com:8080"
proxies = {
    "http": proxy_url,
    "https": proxy_url,
}
response = requests.get(
    "https://httpbin.org/ip",
    proxies=proxies,
    timeout=30
)
print(response.json())
# {"origin": "185.xxx.xxx.xxx"}
Making a Simple Request Through a Proxy Server
Here's a complete example of sending a GET request through ProxyHat residential proxies:
from proxyhat import ProxyHat
client = ProxyHat(api_key="your_api_key_here")
# Make a proxied GET request
response = client.get("https://httpbin.org/ip")
print(f"Status: {response.status_code}")
print(f"IP: {response.json()['origin']}")
print(f"Headers: {response.headers}")
Or with the standard requests library:
import requests
proxies = {
    "http": "http://user:pass@gate.proxyhat.com:8080",
    "https": "http://user:pass@gate.proxyhat.com:8080",
}
response = requests.get(
    "https://example.com/api/data",
    proxies=proxies,
    timeout=30,
    headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
)
print(response.status_code)
print(response.text[:500])
Choosing the Right Proxy Type
ProxyHat offers three proxy types. Choose based on your use case. For a deeper comparison, read our guide on residential vs. datacenter vs. mobile proxies.
| Type | Best For | Speed | Detection Risk | Cost |
|---|---|---|---|---|
| Residential | Web scraping, SERP tracking | Medium | Very low | Per GB |
| Datacenter | High-volume, speed-critical tasks | Fast | Higher | Per IP/month |
| Mobile | Social media, app testing | Medium | Lowest | Per GB |
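If your scraper handles mixed workloads, the table's guidance can be encoded as a small lookup. This is a minimal sketch: the use-case labels are illustrative and not part of the ProxyHat API.

```python
# Map common use cases to the proxy types from the table above.
# The use-case names are illustrative; adjust them to your workload.
USE_CASE_PROXY_TYPE = {
    "web_scraping": "residential",   # lowest block rate
    "serp_tracking": "residential",
    "bulk_download": "datacenter",   # speed over stealth
    "social_media": "mobile",        # mobile IPs blend in on app endpoints
    "app_testing": "mobile",
}

def pick_proxy_type(use_case: str) -> str:
    """Return the proxy type for a use case, defaulting to residential."""
    return USE_CASE_PROXY_TYPE.get(use_case, "residential")

print(pick_proxy_type("bulk_download"))  # datacenter
```

The result can be passed straight to the proxy_type parameter shown in the next section.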
Switching Proxy Types in Code
from proxyhat import ProxyHat
client = ProxyHat(api_key="your_api_key_here")
# Residential proxy (default)
response = client.get(
    "https://example.com",
    proxy_type="residential"
)
# Datacenter proxy
response = client.get(
    "https://example.com",
    proxy_type="datacenter"
)
# Mobile proxy
response = client.get(
    "https://example.com",
    proxy_type="mobile"
)
Rotating vs. Sticky Sessions
Rotating proxies assign a fresh IP to every request, ideal for large-scale scraping where you need maximum anonymity. Sticky sessions keep the same IP for a set duration, essential for multi-step workflows such as login sequences or paginated navigation.
Rotating Proxies (New IP Per Request)
from proxyhat import ProxyHat
client = ProxyHat(api_key="your_api_key_here")
urls = [
    "https://httpbin.org/ip",
    "https://httpbin.org/ip",
    "https://httpbin.org/ip",
]
for url in urls:
    response = client.get(url, session_type="rotating")
    print(f"IP: {response.json()['origin']}")
# Each request uses a different IP:
# IP: 185.xxx.xxx.1
# IP: 92.xxx.xxx.47
# IP: 78.xxx.xxx.203
Sticky Sessions (Same IP for a Duration)
from proxyhat import ProxyHat
client = ProxyHat(api_key="your_api_key_here")
# Create a sticky session (maintains IP for up to 30 minutes)
session = client.create_session(duration_minutes=30)
# All requests in this session use the same IP
for page in range(1, 6):
    response = session.get(f"https://example.com/products?page={page}")
    print(f"Page {page}: IP {response.headers.get('X-Proxy-IP')}")
# Same IP across all pages:
# Page 1: IP 185.xxx.xxx.42
# Page 2: IP 185.xxx.xxx.42
# Page 3: IP 185.xxx.xxx.42
Geo-Targeting Requests
Need data from a specific country? ProxyHat supports geo-targeting across 195+ locations, which is essential for localized SERP scraping, price monitoring, and content verification.
from proxyhat import ProxyHat
client = ProxyHat(api_key="your_api_key_here")
# Target a specific country
response = client.get(
    "https://www.google.com/search?q=best+restaurants",
    country="US"
)
# Target a specific city
response = client.get(
    "https://www.google.com/search?q=best+restaurants",
    country="US",
    city="New York"
)
# Using a raw proxy URL with geo-targeting
# Format: username-country-US:password@gate.proxyhat.com:8080
import requests
proxies = {
    "http": "http://user-country-DE:pass@gate.proxyhat.com:8080",
    "https": "http://user-country-DE:pass@gate.proxyhat.com:8080",
}
response = requests.get("https://www.google.de", proxies=proxies, timeout=30)
print(f"Accessed from Germany: {response.status_code}")
Error Handling and Retries
Network requests fail. Proxies time out. Targets block you. Robust error handling is what separates production scrapers from toy scripts.
Basic Retry Logic
import time
import requests
from requests.exceptions import ProxyError, Timeout, ConnectionError
def fetch_with_retry(url, proxies, max_retries=3, timeout=30):
    """Fetch a URL with automatic retry on failure."""
    for attempt in range(max_retries):
        try:
            response = requests.get(
                url,
                proxies=proxies,
                timeout=timeout,
                headers={"User-Agent": "Mozilla/5.0"}
            )
            response.raise_for_status()
            return response
        except (ProxyError, Timeout, ConnectionError) as e:
            wait = 2 ** attempt  # Exponential backoff
            print(f"Attempt {attempt + 1} failed: {e}. Retrying in {wait}s...")
            time.sleep(wait)
        except requests.exceptions.HTTPError as e:
            if e.response.status_code == 429:
                wait = 10 * (attempt + 1)
                print(f"Rate limited. Waiting {wait}s...")
                time.sleep(wait)
            elif e.response.status_code >= 500:
                time.sleep(2 ** attempt)
            else:
                raise
    raise Exception(f"Failed to fetch {url} after {max_retries} attempts")
# Usage
proxies = {
    "http": "http://user:pass@gate.proxyhat.com:8080",
    "https": "http://user:pass@gate.proxyhat.com:8080",
}
response = fetch_with_retry("https://example.com/data", proxies)
Using the SDK's Built-in Retries
from proxyhat import ProxyHat
client = ProxyHat(
    api_key="your_api_key_here",
    max_retries=3,
    timeout=30,
    retry_on_status=[429, 500, 502, 503]
)
# The SDK handles retries automatically
response = client.get("https://example.com/data")
print(response.status_code)
Concurrent Scraping with Threads
Sequential requests are slow. For production workloads, use Python's concurrent.futures to run requests through the proxy in parallel.
from concurrent.futures import ThreadPoolExecutor, as_completed
from proxyhat import ProxyHat
client = ProxyHat(api_key="your_api_key_here")
urls = [
    "https://example.com/product/1",
    "https://example.com/product/2",
    "https://example.com/product/3",
    "https://example.com/product/4",
    "https://example.com/product/5",
]
def scrape(url):
    """Scrape a single URL through the proxy."""
    response = client.get(url, proxy_type="residential")
    return {"url": url, "status": response.status_code, "length": len(response.text)}
# Run 5 concurrent requests
results = []
with ThreadPoolExecutor(max_workers=5) as executor:
    futures = {executor.submit(scrape, url): url for url in urls}
    for future in as_completed(futures):
        try:
            result = future.result()
            results.append(result)
            print(f"OK: {result['url']} ({result['length']} bytes)")
        except Exception as e:
            print(f"Error: {futures[future]} - {e}")
print(f"\nCompleted: {len(results)}/{len(urls)}")
Async Scraping with asyncio and httpx
import asyncio
import httpx
async def scrape_urls(urls, proxy_url, max_concurrent=10):
    """Scrape multiple URLs concurrently through an async proxy client."""
    semaphore = asyncio.Semaphore(max_concurrent)
    async def fetch(client, url):
        async with semaphore:
            response = await client.get(url, timeout=30)
            return {"url": url, "status": response.status_code}
    async with httpx.AsyncClient(proxy=proxy_url) as client:
        tasks = [fetch(client, url) for url in urls]
        return await asyncio.gather(*tasks, return_exceptions=True)
# Usage
proxy_url = "http://user:pass@gate.proxyhat.com:8080"
urls = [f"https://example.com/page/{i}" for i in range(1, 51)]
results = asyncio.run(scrape_urls(urls, proxy_url))
successful = [r for r in results if not isinstance(r, Exception)]
print(f"Scraped {len(successful)}/{len(urls)} pages")
Integrating with Popular Python Libraries
Using requests (Session)
import requests
session = requests.Session()
session.proxies = {
    "http": "http://user:pass@gate.proxyhat.com:8080",
    "https": "http://user:pass@gate.proxyhat.com:8080",
}
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
})
# All requests in this session use the proxy
response = session.get("https://example.com/api/products")
print(response.json())
Using httpx
import httpx
proxy_url = "http://user:pass@gate.proxyhat.com:8080"
# Synchronous
with httpx.Client(proxy=proxy_url) as client:
    response = client.get("https://httpbin.org/ip")
    print(response.json())
# Asynchronous (await must run inside an async function)
import asyncio
async def main():
    async with httpx.AsyncClient(proxy=proxy_url) as client:
        response = await client.get("https://httpbin.org/ip")
        print(response.json())
asyncio.run(main())
Using aiohttp
import aiohttp
import asyncio
async def fetch_with_aiohttp():
    proxy_url = "http://user:pass@gate.proxyhat.com:8080"
    async with aiohttp.ClientSession() as session:
        async with session.get(
            "https://httpbin.org/ip",
            proxy=proxy_url,
            timeout=aiohttp.ClientTimeout(total=30)
        ) as response:
            data = await response.json()
            print(f"IP: {data['origin']}")
asyncio.run(fetch_with_aiohttp())
Using Scrapy
Configure the proxy in your Scrapy project's settings.py:
# settings.py
DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 110,
}
HTTP_PROXY = "http://user:pass@gate.proxyhat.com:8080"
# Or set per-request in your spider:
import scrapy
class ProductSpider(scrapy.Spider):
    name = "products"
    start_urls = ["https://example.com/products"]
    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url,
                meta={"proxy": "http://user:pass@gate.proxyhat.com:8080"},
                callback=self.parse
            )
    def parse(self, response):
        for product in response.css(".product-card"):
            yield {
                "name": product.css("h2::text").get(),
                "price": product.css(".price::text").get(),
            }
Production Tips
Connection Pooling and Timeouts
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
session = requests.Session()
# Configure retry strategy
retry_strategy = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504],
)
adapter = HTTPAdapter(
    max_retries=retry_strategy,
    pool_connections=10,
    pool_maxsize=20
)
session.mount("http://", adapter)
session.mount("https://", adapter)
session.proxies = {
    "http": "http://user:pass@gate.proxyhat.com:8080",
    "https": "http://user:pass@gate.proxyhat.com:8080",
}
# Robust, production-ready request
response = session.get("https://example.com/data", timeout=(5, 30))
print(response.status_code)
Logging and Monitoring
import logging
import time
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
logger = logging.getLogger("scraper")
def monitored_request(session, url):
    """Log request timing and status for monitoring."""
    start = time.time()
    try:
        response = session.get(url, timeout=30)
        elapsed = time.time() - start
        logger.info(f"OK {response.status_code} {url} ({elapsed:.2f}s)")
        return response
    except Exception as e:
        elapsed = time.time() - start
        logger.error(f"FAIL {url} ({elapsed:.2f}s): {e}")
        raise
Environment Variables for Credentials
Never hardcode credentials. Use environment variables:
import os
from proxyhat import ProxyHat
client = ProxyHat(
    api_key=os.environ["PROXYHAT_API_KEY"]
)
# Or with a raw proxy URL
proxy_url = os.environ.get(
    "PROXY_URL",
    "http://user:pass@gate.proxyhat.com:8080"
)
For a complete list of available proxy plans and traffic options, visit the pricing page. For advanced use cases and endpoint references, see the API documentation. You can also explore our guide to the best proxies for web scraping in 2026 for a comparison of providers.
Key Takeaways
- Install in one command: pip install proxyhat requests gets you started immediately.
- Use the SDK for simplicity: the ProxyHat Python SDK handles authentication, retries, and rotation automatically.
- Choose the right proxy type: residential for scraping, datacenter for speed, mobile for social platforms.
- Rotating vs. sticky: use rotating proxies for bulk scraping and sticky sessions for multi-step workflows.
- Geo-target when needed: specify countries and cities for localized data collection.
- Handle errors properly: implement exponential backoff and retry logic for production reliability.
- Scale with concurrency: use ThreadPoolExecutor or asyncio to parallelize requests.
- Never hardcode credentials: store API keys in environment variables.
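As a closing sketch, several of these takeaways can be combined into a single session factory: credentials read from the environment, retries with backoff, and a pooled requests.Session. The PROXY_URL variable name and the fallback placeholder URL are assumptions for illustration, not part of the ProxyHat API.

```python
import os

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def build_session() -> requests.Session:
    """Create a proxied session with connection pooling and automatic retries."""
    # PROXY_URL is an assumed variable name; the fallback is a placeholder.
    proxy_url = os.environ.get("PROXY_URL", "http://user:pass@gate.proxyhat.com:8080")
    session = requests.Session()
    session.proxies = {"http": proxy_url, "https": proxy_url}
    session.headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
    retry = Retry(total=3, backoff_factor=1,
                  status_forcelist=[429, 500, 502, 503, 504])
    session.mount("http://", HTTPAdapter(max_retries=retry))
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session

session = build_session()
# session.get("https://example.com/data", timeout=(5, 30))
```

Build the session once at startup and reuse it everywhere; the pool and retry policy then apply to every request.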
Frequently Asked Questions
How do I set a proxy in Python requests?
Pass a proxies dictionary to any requests method: requests.get(url, proxies={"http": "http://user:pass@host:port", "https": "http://user:pass@host:port"}). The ProxyHat SDK simplifies this further by handling proxy configuration internally.
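As a runnable sketch of that one-liner, the dictionary can be built with a small helper; the hostname, port, and credentials below are placeholders:

```python
def make_proxies(username: str, password: str,
                 host: str = "gate.proxyhat.com", port: int = 8080) -> dict:
    """Build the proxies mapping that requests expects for http and https."""
    url = f"http://{username}:{password}@{host}:{port}"
    return {"http": url, "https": url}

proxies = make_proxies("user", "pass")
print(proxies["https"])  # http://user:pass@gate.proxyhat.com:8080
# requests.get("https://httpbin.org/ip", proxies=proxies, timeout=30)
```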
What is the difference between rotating and sticky proxies in Python?
Rotating proxies assign a new IP address to every request, which is ideal for large-scale scraping. Sticky proxies keep the same IP for a set duration (e.g., 10-30 minutes), which is necessary for login sessions, shopping carts, or any multi-step browsing where IP consistency matters.
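Besides the SDK's create_session shown earlier, many providers also expose sticky sessions through a session id embedded in the proxy username. The "-session-" syntax below is a common provider convention, not confirmed ProxyHat syntax, so treat this as a hypothetical sketch and check the API documentation for the exact format:

```python
import uuid

def sticky_proxy_url(username: str, password: str, session_id: str,
                     host: str = "gate.proxyhat.com", port: int = 8080) -> str:
    """Embed a session id in the username so repeated requests share one exit IP.
    The '-session-' username syntax is a common convention and may differ
    for your provider; treat it as a hypothetical example."""
    return f"http://{username}-session-{session_id}:{password}@{host}:{port}"

session_id = uuid.uuid4().hex[:8]  # one id per multi-step workflow
proxy_url = sticky_proxy_url("user", "pass", session_id)
# Reuse proxy_url for every request in the workflow to keep the same IP.
```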
Can I use proxies with asyncio and aiohttp?
Yes. ProxyHat proxies work with any HTTP client that supports proxy configuration, including aiohttp, httpx (async mode), and asyncio-based frameworks. Pass the proxy URL as the proxy parameter in your async client.
How do I handle proxy errors and timeouts in Python?
Wrap your requests in try/except blocks that catch ProxyError, Timeout, and ConnectionError. Implement exponential backoff (doubling the wait time between retries) and set a maximum retry count. The ProxyHat SDK includes built-in retry logic with configurable parameters.
哪个 Python 库最适合用代理来刮网?
对于简单的任务, requests 使用代理Hat SDK是最简单的选项. 用于高通币刮除,使用 httpx 或 aiohttp对于带链接的复杂爬行和数据提取, Scrapy 使用代理中间软件是最强大的选择. 都和ProxyHat代理无缝工作.






