Dlaczego monitoruj wydajność proxy
Infrastruktura Proxy zawodzi po cichu. Twój drapacz może działać godzinami z 40% powodzeniem, zanim ktoś zauważy. Czas reakcji się pogarsza, zwiększa się liczba bloków i pogarsza jakość danych - wszystko to nie powoduje oczywistych błędów. Monitoring zamienia te niewidoczne problemy w alarmy dające się zaskarżyć.
Ten przewodnik pokazuje, jak przystosować swoje żądania proxy, zbierać znaczące wskaźniki, budować deski rozdzielcze i ustawić ostrzeganie, że połowów degradacji zanim wpłynie na gazociąg danych. Wszystkie przykłady ProxyHat 's residential proxy i są gotowe do produkcji.
Jeśli nie mierzysz swojej wydajności proxy, zgadujesz. Zgadywanie na skalę kosztuje pieniądze i produkuje niewiarygodne dane.
Key Metrics to Track
| Metric | Co ci mówi? | Próg ostrzegania |
|---|---|---|
| Wskaźnik sukcesu | Procent wniosków zwracających status 2xx | Poniżej 90% |
| Opóźnienie odpowiedzi (p50 / p95 / p99) | Jak szybko proxied żądania ukończone | p95 powyżej 10 s |
| Wskaźnik błędu według typu | Które błędy dominują (timeout, 403, 429, connection) | Każdy typ powyżej 15% |
| Wnioski na sekundę | Through of your scrating pipes | Poniżej przewidywanej wartości wyjściowej |
| Zastosowanie szerokości pasma | Dane przekazywane za pośrednictwem pośrednika | Limit planu zbliżania się |
| Stopa bloku według docelowej | Które cele blokują cię najbardziej | Powyżej 20% dla każdego celu |
| Stopa zwrotu | Ile żądań potrzebuje powtórzeń | Ponad 30% |
| Efektywność ponownego wykorzystania sesji | Jak długo lepkie sesje przetrwać | Poniżej 5 wniosków średnio |
Python: Instrumented Proxy Client
Klient zawija każdą prośbę w czas, śledzenie stanu i ustrukturyzowane logowanie.
import time
import uuid
import logging
import statistics
from dataclasses import dataclass, field
from collections import defaultdict
from typing import Optional
import requests
logger = logging.getLogger("proxy_monitor")
@dataclass
class ProxyMetrics:
"""Collects and aggregates proxy performance metrics."""
total_requests: int = 0
successful: int = 0
failed: int = 0
retries: int = 0
latencies: list = field(default_factory=list)
status_codes: dict = field(default_factory=lambda: defaultdict(int))
errors_by_type: dict = field(default_factory=lambda: defaultdict(int))
bytes_transferred: int = 0
requests_by_target: dict = field(default_factory=lambda: defaultdict(lambda: {"success": 0, "failed": 0}))
@property
def success_rate(self) -> float:
return (self.successful / self.total_requests * 100) if self.total_requests > 0 else 0.0
@property
def p50_latency(self) -> float:
return statistics.median(self.latencies) if self.latencies else 0.0
@property
def p95_latency(self) -> float:
if not self.latencies:
return 0.0
sorted_lat = sorted(self.latencies)
idx = int(len(sorted_lat) * 0.95)
return sorted_lat[min(idx, len(sorted_lat) - 1)]
@property
def p99_latency(self) -> float:
if not self.latencies:
return 0.0
sorted_lat = sorted(self.latencies)
idx = int(len(sorted_lat) * 0.99)
return sorted_lat[min(idx, len(sorted_lat) - 1)]
def summary(self) -> dict:
return {
"total_requests": self.total_requests,
"success_rate": f"{self.success_rate:.1f}%",
"p50_latency": f"{self.p50_latency:.3f}s",
"p95_latency": f"{self.p95_latency:.3f}s",
"p99_latency": f"{self.p99_latency:.3f}s",
"retries": self.retries,
"bytes_transferred": self.bytes_transferred,
"top_errors": dict(sorted(
self.errors_by_type.items(),
key=lambda x: x[1], reverse=True
)[:5]),
"status_distribution": dict(self.status_codes),
}
class MonitoredProxyClient:
"""HTTP client with built-in proxy monitoring."""
def __init__(self, max_retries: int = 3):
self.metrics = ProxyMetrics()
self.max_retries = max_retries
self._alert_callbacks = []
def on_alert(self, callback):
"""Register a callback for metric alerts."""
self._alert_callbacks.append(callback)
def _check_alerts(self):
if self.metrics.total_requests < 10:
return
alerts = []
if self.metrics.success_rate < 90:
alerts.append(f"Low success rate: {self.metrics.success_rate:.1f}%")
if self.metrics.p95_latency > 10:
alerts.append(f"High p95 latency: {self.metrics.p95_latency:.1f}s")
if self.metrics.retries / max(self.metrics.total_requests, 1) > 0.3:
alerts.append(f"High retry rate: {self.metrics.retries}/{self.metrics.total_requests}")
for alert in alerts:
logger.warning(f"ALERT: {alert}")
for cb in self._alert_callbacks:
cb(alert)
def fetch(self, url: str, country: Optional[str] = None) -> Optional[requests.Response]:
from urllib.parse import urlparse
target_domain = urlparse(url).netloc
for attempt in range(self.max_retries + 1):
session_id = uuid.uuid4().hex[:8]
username = f"USERNAME-session-{session_id}"
if country:
username += f"-country-{country}"
proxy = f"http://{username}:PASSWORD@gate.proxyhat.com:8080"
self.metrics.total_requests += 1
if attempt > 0:
self.metrics.retries += 1
start = time.time()
try:
response = requests.get(
url,
proxies={"http": proxy, "https": proxy},
timeout=30,
)
latency = time.time() - start
self.metrics.latencies.append(latency)
self.metrics.status_codes[response.status_code] += 1
if response.status_code >= 400:
self.metrics.errors_by_type[f"HTTP_{response.status_code}"] += 1
self.metrics.requests_by_target[target_domain]["failed"] += 1
if response.status_code in (403, 429, 503) and attempt < self.max_retries:
time.sleep(2 ** attempt)
continue
self.metrics.failed += 1
else:
self.metrics.successful += 1
self.metrics.bytes_transferred += len(response.content)
self.metrics.requests_by_target[target_domain]["success"] += 1
self._check_alerts()
return response
except requests.exceptions.Timeout:
self.metrics.errors_by_type["timeout"] += 1
self.metrics.latencies.append(time.time() - start)
self.metrics.requests_by_target[target_domain]["failed"] += 1
except requests.exceptions.ConnectionError:
self.metrics.errors_by_type["connection_error"] += 1
self.metrics.latencies.append(time.time() - start)
self.metrics.requests_by_target[target_domain]["failed"] += 1
except Exception as e:
self.metrics.errors_by_type[type(e).__name__] += 1
self.metrics.latencies.append(time.time() - start)
if attempt < self.max_retries:
time.sleep(2 ** attempt)
self.metrics.failed += 1
self._check_alerts()
return None
# Usage
client = MonitoredProxyClient(max_retries=3)
client.on_alert(lambda msg: print(f"[ALERT] {msg}"))
urls = [f"https://example.com/product/{i}" for i in range(100)]
for url in urls:
response = client.fetch(url)
print(client.metrics.summary())
Node.js: Zinstrumentowany klient proxy
const crypto = require('crypto');
const { HttpsProxyAgent } = require('https-proxy-agent');
const { EventEmitter } = require('events');
class ProxyMetrics {
constructor() {
this.totalRequests = 0;
this.successful = 0;
this.failed = 0;
this.retries = 0;
this.latencies = [];
this.statusCodes = {};
this.errorsByType = {};
this.bytesTransferred = 0;
this.requestsByTarget = {};
}
get successRate() {
return this.totalRequests > 0
? ((this.successful / this.totalRequests) * 100).toFixed(1)
: '0.0';
}
percentile(p) {
if (this.latencies.length === 0) return 0;
const sorted = [...this.latencies].sort((a, b) => a - b);
const idx = Math.min(
Math.floor(sorted.length * (p / 100)),
sorted.length - 1
);
return sorted[idx];
}
summary() {
return {
totalRequests: this.totalRequests,
successRate: `${this.successRate}%`,
p50Latency: `${this.percentile(50).toFixed(3)}s`,
p95Latency: `${this.percentile(95).toFixed(3)}s`,
p99Latency: `${this.percentile(99).toFixed(3)}s`,
retries: this.retries,
bytesTransferred: this.bytesTransferred,
statusDistribution: { ...this.statusCodes },
topErrors: Object.entries(this.errorsByType)
.sort(([, a], [, b]) => b - a)
.slice(0, 5)
.reduce((obj, [k, v]) => ({ ...obj, [k]: v }), {}),
};
}
}
class MonitoredProxyClient extends EventEmitter {
constructor({ maxRetries = 3 } = {}) {
super();
this.metrics = new ProxyMetrics();
this.maxRetries = maxRetries;
}
_checkAlerts() {
if (this.metrics.totalRequests < 10) return;
if (parseFloat(this.metrics.successRate) < 90) {
this.emit('alert', `Low success rate: ${this.metrics.successRate}%`);
}
if (this.metrics.percentile(95) > 10) {
this.emit('alert', `High p95 latency: ${this.metrics.percentile(95).toFixed(1)}s`);
}
}
async fetch(url, { country } = {}) {
const targetDomain = new URL(url).hostname;
for (let attempt = 0; attempt <= this.maxRetries; attempt++) {
const sessionId = crypto.randomBytes(4).toString('hex');
let username = `USERNAME-session-${sessionId}`;
if (country) username += `-country-${country}`;
const agent = new HttpsProxyAgent(
`http://${username}:PASSWORD@gate.proxyhat.com:8080`
);
this.metrics.totalRequests++;
if (attempt > 0) this.metrics.retries++;
const startTime = Date.now();
try {
const response = await fetch(url, {
agent,
signal: AbortSignal.timeout(30000),
});
const latency = (Date.now() - startTime) / 1000;
this.metrics.latencies.push(latency);
this.metrics.statusCodes[response.status] =
(this.metrics.statusCodes[response.status] || 0) + 1;
if (response.status >= 400) {
this.metrics.errorsByType[`HTTP_${response.status}`] =
(this.metrics.errorsByType[`HTTP_${response.status}`] || 0) + 1;
if ([403, 429, 503].includes(response.status) && attempt < this.maxRetries) {
await new Promise(r => setTimeout(r, 1000 * Math.pow(2, attempt)));
continue;
}
this.metrics.failed++;
} else {
this.metrics.successful++;
const body = await response.text();
this.metrics.bytesTransferred += body.length;
}
this._checkAlerts();
return response;
} catch (err) {
const latency = (Date.now() - startTime) / 1000;
this.metrics.latencies.push(latency);
this.metrics.errorsByType[err.name] =
(this.metrics.errorsByType[err.name] || 0) + 1;
if (attempt < this.maxRetries) {
await new Promise(r => setTimeout(r, 1000 * Math.pow(2, attempt)));
continue;
}
this.metrics.failed++;
}
}
this._checkAlerts();
return null;
}
}
// Usage
const client = new MonitoredProxyClient({ maxRetries: 3 });
client.on('alert', msg => console.warn(`[ALERT] ${msg}`));
const urls = Array.from({ length: 100 }, (_, i) =>
`https://example.com/product/${i + 1}`
);
for (const url of urls) {
await client.fetch(url);
}
console.log(client.metrics.summary());
Idź: Instrumentowany Klient Proxy
package main
import (
"crypto/rand"
"encoding/hex"
"fmt"
"io"
"math"
"net/http"
"net/url"
"sort"
"sync"
"time"
)
type Metrics struct {
mu sync.Mutex
TotalRequests int
Successful int
Failed int
Retries int
Latencies []float64
StatusCodes map[int]int
ErrorsByType map[string]int
BytesTransferred int64
}
func NewMetrics() *Metrics {
return &Metrics{
StatusCodes: make(map[int]int),
ErrorsByType: make(map[string]int),
}
}
func (m *Metrics) RecordSuccess(latency float64, status int, bytes int) {
m.mu.Lock()
defer m.mu.Unlock()
m.TotalRequests++
m.Successful++
m.Latencies = append(m.Latencies, latency)
m.StatusCodes[status]++
m.BytesTransferred += int64(bytes)
}
func (m *Metrics) RecordFailure(latency float64, errType string) {
m.mu.Lock()
defer m.mu.Unlock()
m.TotalRequests++
m.Failed++
m.Latencies = append(m.Latencies, latency)
m.ErrorsByType[errType]++
}
func (m *Metrics) Percentile(p float64) float64 {
m.mu.Lock()
defer m.mu.Unlock()
if len(m.Latencies) == 0 {
return 0
}
sorted := make([]float64, len(m.Latencies))
copy(sorted, m.Latencies)
sort.Float64s(sorted)
idx := int(math.Min(float64(len(sorted)-1), float64(len(sorted))*p/100))
return sorted[idx]
}
func (m *Metrics) SuccessRate() float64 {
m.mu.Lock()
defer m.mu.Unlock()
if m.TotalRequests == 0 {
return 0
}
return float64(m.Successful) / float64(m.TotalRequests) * 100
}
func (m *Metrics) Summary() string {
return fmt.Sprintf(
"Requests: %d | Success: %.1f%% | p50: %.3fs | p95: %.3fs | p99: %.3fs | Retries: %d",
m.TotalRequests, m.SuccessRate(),
m.Percentile(50), m.Percentile(95), m.Percentile(99),
m.Retries,
)
}
type MonitoredClient struct {
metrics *Metrics
maxRetries int
}
func NewMonitoredClient(maxRetries int) *MonitoredClient {
return &MonitoredClient{
metrics: NewMetrics(),
maxRetries: maxRetries,
}
}
func (c *MonitoredClient) Fetch(target string) (*http.Response, error) {
for attempt := 0; attempt <= c.maxRetries; attempt++ {
b := make([]byte, 4)
rand.Read(b)
sessionID := hex.EncodeToString(b)
proxyStr := fmt.Sprintf(
"http://USERNAME-session-%s:PASSWORD@gate.proxyhat.com:8080",
sessionID,
)
proxyURL, _ := url.Parse(proxyStr)
client := &http.Client{
Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)},
Timeout: 30 * time.Second,
}
if attempt > 0 {
c.metrics.mu.Lock()
c.metrics.Retries++
c.metrics.mu.Unlock()
}
start := time.Now()
resp, err := client.Get(target)
latency := time.Since(start).Seconds()
if err != nil {
c.metrics.RecordFailure(latency, "connection_error")
if attempt < c.maxRetries {
time.Sleep(time.Duration(math.Pow(2, float64(attempt))) * time.Second)
continue
}
return nil, err
}
body, _ := io.ReadAll(resp.Body)
resp.Body.Close()
if resp.StatusCode >= 400 {
c.metrics.RecordFailure(latency, fmt.Sprintf("HTTP_%d", resp.StatusCode))
if attempt < c.maxRetries {
time.Sleep(time.Duration(math.Pow(2, float64(attempt))) * time.Second)
continue
}
} else {
c.metrics.RecordSuccess(latency, resp.StatusCode, len(body))
}
return resp, nil
}
return nil, fmt.Errorf("all retries exhausted for %s", target)
}
func main() {
client := NewMonitoredClient(3)
for i := 0; i < 50; i++ {
url := fmt.Sprintf("https://example.com/product/%d", i+1)
client.Fetch(url)
}
fmt.Println(client.metrics.Summary())
}
Strukturyzowane logowanie dla zamówień proxy
Zorganizowane dzienniki JSONułatwiają agregację i analizę wydajności proxy w rozproszonych scrappers.
import json
import logging
import time
import uuid
import requests
class JSONProxyLogger:
"""Logs every proxy request as structured JSON."""
def __init__(self, log_file: str = "proxy_requests.jsonl"):
self.logger = logging.getLogger("proxy_json")
handler = logging.FileHandler(log_file)
handler.setFormatter(logging.Formatter("%(message)s"))
self.logger.addHandler(handler)
self.logger.setLevel(logging.INFO)
def log_request(self, entry: dict):
self.logger.info(json.dumps(entry))
def fetch(self, url: str, country: str = None) -> requests.Response:
session_id = uuid.uuid4().hex[:8]
username = f"USERNAME-session-{session_id}"
if country:
username += f"-country-{country}"
proxy = f"http://{username}:PASSWORD@gate.proxyhat.com:8080"
start = time.time()
try:
response = requests.get(
url,
proxies={"http": proxy, "https": proxy},
timeout=30,
)
latency = time.time() - start
self.log_request({
"timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
"url": url,
"status": response.status_code,
"latency_ms": round(latency * 1000),
"bytes": len(response.content),
"session_id": session_id,
"country": country,
"success": response.status_code < 400,
})
return response
except Exception as e:
latency = time.time() - start
self.log_request({
"timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
"url": url,
"error": str(e),
"error_type": type(e).__name__,
"latency_ms": round(latency * 1000),
"session_id": session_id,
"country": country,
"success": False,
})
raise
# Usage — logs produce JSONL like:
# {"timestamp":"2026-02-26T10:30:00Z","url":"https://...","status":200,"latency_ms":1234,...}
proxy_logger = JSONProxyLogger("proxy_requests.jsonl")
response = proxy_logger.fetch("https://example.com/data", country="us")
Okresowe sprawozdania zdrowotne
W przypadku długotrwających skrobarek generować okresowe raporty zdrowotne, które podsumowują wydajność nad stałymi oknami.
import time
import threading
from datetime import datetime
class PeriodicReporter:
"""Generates periodic performance reports from proxy metrics."""
def __init__(self, metrics: ProxyMetrics, interval_seconds: int = 60):
self.metrics = metrics
self.interval = interval_seconds
self._running = False
self._thread = None
self._last_snapshot = None
def start(self):
self._running = True
self._last_snapshot = self._snapshot()
self._thread = threading.Thread(target=self._report_loop, daemon=True)
self._thread.start()
def stop(self):
self._running = False
def _snapshot(self) -> dict:
return {
"total": self.metrics.total_requests,
"success": self.metrics.successful,
"failed": self.metrics.failed,
"retries": self.metrics.retries,
"time": time.time(),
}
def _report_loop(self):
while self._running:
time.sleep(self.interval)
current = self._snapshot()
prev = self._last_snapshot
elapsed = current["time"] - prev["time"]
requests_delta = current["total"] - prev["total"]
success_delta = current["success"] - prev["success"]
failed_delta = current["failed"] - prev["failed"]
rps = requests_delta / elapsed if elapsed > 0 else 0
window_success_rate = (
(success_delta / requests_delta * 100)
if requests_delta > 0 else 0
)
report = {
"window": f"{self.interval}s",
"timestamp": datetime.utcnow().isoformat(),
"requests": requests_delta,
"rps": round(rps, 1),
"success_rate": f"{window_success_rate:.1f}%",
"failed": failed_delta,
"cumulative_success_rate": f"{self.metrics.success_rate:.1f}%",
"p95_latency": f"{self.metrics.p95_latency:.3f}s",
}
print(f"[REPORT] {report}")
self._last_snapshot = current
# Usage with MonitoredProxyClient
client = MonitoredProxyClient(max_retries=3)
reporter = PeriodicReporter(client.metrics, interval_seconds=30)
reporter.start()
# Scrape away — reports print every 30 seconds
for url in urls:
client.fetch(url)
reporter.stop()
Zasady i progi ostrzegania
Ustawcie inteligentny system ostrzegania, który zapobiega fałszywym pozytywnym skutkom w okresach rozgrzewania i przemijających zamianach.
| Alarm | Stan | Schładzanie | Działanie |
|---|---|---|---|
| Niski wskaźnik sukcesu | Poniżej 90% przez 5 minut | 10 min | Zbadaj bloki docelowe, sprawdź pulę proxy |
| Wysoka trwałość | p95 powyżej 10 s nad oknem 2-minutowym | 5 min | Ogranicz współzależność, sprawdź docelowy poziom zdrowia |
| Błąd Spike | Pojedynczy typ błędu przekracza 20% wniosków | 5 min | Sprawdź, czy cel się zmienił, obróć położenie geo |
| Szerokość pasma Spike | Szybkość transferu w stosunku do wartości wyjściowej | 15 min | Sprawdź spodziewane zachowanie, sprawdź czy nie ma pętli przekierowania |
| Zero progów | Brak udanych wniosków w ciągu 2 minut | 2 min | Sprawdź łączność proxy, weryfikuj referencje |
Dobrym monitoringiem jest różnica między rurociągiem do rozdrabniania, który działa niezawodnie przez miesiące, a rurociągiem, który po cichu produkuje dane o odpadach. Zainwestuj w oprzyrządowanie z góry - to samo płaci się za pierwszy incydent produkcji można złapać wcześniej.
Aby zbudować zastawy, które zasilają te mierniki, zobacz Building a Proxy Middleware LayerDo optymalizacji przepustowości wraz z monitorowaniem, czytaj Skalowanie żądań proxy za pomocą kontroli konkursowej. Pełny projekt systemu, zobacz Projektowanie niezawodnej architektury skraplania.
Zbadaj Python SDK, Węzeł SDKoraz Go SDK dla integracji proxy, lub sprawdzić Ceny proksyHat oraz dokumentacja Na początek.






