If you build scrapers, integrate third-party APIs, or automate processes in PHP, sooner or later you will run into rate limits, IP blocks, or the need for geo-targeting. HTTP proxies address these problems by letting you route requests through different IPs. This guide shows how to use HTTP proxies in PHP with native cURL, Guzzle, Symfony HTTP Client, and Laravel, with code examples written with production use in mind.
Why Use HTTP Proxies in PHP
When you send repeated HTTP requests to the same endpoint, the target server may block your IP for making too many requests. HTTP proxies work around this by routing traffic through intermediate IPs. For PHP developers, this matters most for:
- Web scraping: avoiding IP blocks while extracting data
- API integrations: staying within per-IP rate limits by spreading requests across several IPs
- Geographic testing: checking localized content from different countries
- Automation: running jobs in parallel without IP conflicts
The choice of proxy type (residential, datacenter, or mobile) depends on the use case. Residential proxies are generally the most reliable for scraping because they use IPs assigned to real devices. Datacenter proxies are faster and cheaper but easier to detect. Mobile proxies suit platforms with advanced anti-bot protections.
Native cURL: Basic Proxy Configuration
cURL is the most direct way to send HTTP requests through a proxy in PHP. The key options are CURLOPT_PROXY for the proxy hostname and CURLOPT_PROXYUSERPWD for the authentication credentials.
<?php
class CurlProxyClient
{
private string $proxyHost = 'gate.proxyhat.com';
private int $proxyPort = 8080;
private string $username;
private string $password;
public function __construct(string $username, string $password)
{
$this->username = $username;
$this->password = $password;
}
public function get(string $url, array $options = []): array
{
$ch = curl_init();
// Basic configuration
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
curl_setopt($ch, CURLOPT_TIMEOUT, $options['timeout'] ?? 30);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $options['connect_timeout'] ?? 10);
// HTTP proxy configuration
curl_setopt($ch, CURLOPT_PROXY, $this->proxyHost);
curl_setopt($ch, CURLOPT_PROXYPORT, $this->proxyPort);
curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_HTTP);
// Proxy authentication
$proxyAuth = "{$this->username}:{$this->password}";
curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxyAuth);
// TLS/SSL: verify the certificate
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
// Custom CA bundle (optional)
if (isset($options['ca_bundle'])) {
curl_setopt($ch, CURLOPT_CAINFO, $options['ca_bundle']);
}
// Custom headers
if (!empty($options['headers'])) {
curl_setopt($ch, CURLOPT_HTTPHEADER, $options['headers']);
}
// Execute the request
$response = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$error = curl_error($ch);
$errno = curl_errno($ch);
curl_close($ch);
return [
'status' => $httpCode,
'body' => $response,
'error' => $error,
'errno' => $errno,
'success' => $errno === 0 && $httpCode >= 200 && $httpCode < 300
];
}
public function post(string $url, array $data, array $options = []): array
{
$options['headers'] = $options['headers'] ?? [];
$options['headers'][] = 'Content-Type: application/json';
$options['body'] = json_encode($data);
return $this->request('POST', $url, $options);
}
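private function request(string $method, string $url, array $options): array
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, $method);
// Same timeouts and TLS checks as get()
curl_setopt($ch, CURLOPT_TIMEOUT, $options['timeout'] ?? 30);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $options['connect_timeout'] ?? 10);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
if (!empty($options['body'])) {
curl_setopt($ch, CURLOPT_POSTFIELDS, $options['body']);
}
curl_setopt($ch, CURLOPT_PROXY, $this->proxyHost);
curl_setopt($ch, CURLOPT_PROXYPORT, $this->proxyPort);
curl_setopt($ch, CURLOPT_PROXYUSERPWD, "{$this->username}:{$this->password}");
if (!empty($options['headers'])) {
curl_setopt($ch, CURLOPT_HTTPHEADER, $options['headers']);
}
$response = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$error = curl_error($ch);
$errno = curl_errno($ch);
curl_close($ch);
// Return the same shape as get() so callers can check 'success'
return [
'status' => $httpCode,
'body' => $response,
'error' => $error,
'errno' => $errno,
'success' => $errno === 0 && $httpCode >= 200 && $httpCode < 300
];
}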
}
// Usage example with geo-targeting
$client = new CurlProxyClient('user-country-US-session-abc123', 'password');
$response = $client->get('https://httpbin.org/ip', [
'timeout' => 20,
'headers' => [
'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
'Accept: application/json'
]
]);
if ($response['success']) {
echo "IP used: " . $response['body'];
} else {
echo "Error: " . $response['error'];
}
This example shows a complete wrapper class that handles authentication, timeouts, custom headers, and geo-targeting. The username format user-country-US-session-abc123 specifies the country and a session ID, which keeps the same IP across multiple requests.
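The username convention can be factored into a small helper so that every client builds it the same way. The sketch below is illustrative only: the function name buildProxyUsername and its validation rules are assumptions, and the field layout is inferred from the examples in this guide, so check your provider's documentation for the exact format.

```php
<?php
declare(strict_types=1);

/**
 * Builds a proxy username in the "user-country-XX-session-ID" style used in this guide.
 * Hypothetical helper: the format is inferred from the examples, not from a provider spec.
 */
function buildProxyUsername(string $baseUser, ?string $country = null, ?string $sessionId = null): string
{
    $parts = [$baseUser];

    if ($country !== null) {
        // Accept two-letter country codes only and normalise them to upper case
        if (!preg_match('/^[A-Za-z]{2}$/', $country)) {
            throw new InvalidArgumentException("Invalid country code: {$country}");
        }
        $parts[] = 'country-' . strtoupper($country);
    }

    if ($sessionId !== null) {
        // Reject hyphens: they separate the fields and would make the username ambiguous
        if (!preg_match('/^[A-Za-z0-9_]+$/', $sessionId)) {
            throw new InvalidArgumentException("Invalid session id: {$sessionId}");
        }
        $parts[] = 'session-' . $sessionId;
    }

    return implode('-', $parts);
}

echo buildProxyUsername('user', 'us', 'abc123'), PHP_EOL; // user-country-US-session-abc123
```

Validating the pieces up front keeps a malformed country code or session ID from silently producing a username the gateway may interpret differently.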
Per-Request IP Rotation with cURL
To avoid blocks during intensive scraping, you need to rotate IPs between requests. With residential proxies that support sticky sessions, you can implement controlled rotation.
<?php
class RotatingProxyClient
{
private string $proxyHost = 'gate.proxyhat.com';
private int $proxyPort = 8080;
private string $baseUsername;
private string $password;
private int $requestCount = 0;
private ?string $currentSessionId = null;
public function __construct(string $baseUsername, string $password)
{
$this->baseUsername = $baseUsername;
$this->password = $password;
}
private function generateSessionId(): string
{
// Generate a unique session ID for each rotation
return 'sess_' . bin2hex(random_bytes(8));
}
public function requestWithRotation(
string $url,
int $rotateEvery = 1,
?string $country = null
): array {
// Start a new session (and therefore a new IP) every $rotateEvery requests
if ($this->currentSessionId === null || $this->requestCount % max(1, $rotateEvery) === 0) {
$this->currentSessionId = $this->generateSessionId();
}
$this->requestCount++;
$sessionId = $this->currentSessionId;
// Build the username with geo-targeting and session
$username = $this->baseUsername;
if ($country !== null) {
$username .= "-country-{$country}";
}
$username .= "-session-{$sessionId}";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_PROXY, $this->proxyHost);
curl_setopt($ch, CURLOPT_PROXYPORT, $this->proxyPort);
curl_setopt($ch, CURLOPT_PROXYUSERPWD, "{$username}:{$this->password}");
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
$response = curl_exec($ch);
$info = curl_getinfo($ch);
$error = curl_error($ch);
curl_close($ch);
return [
'status' => $info['http_code'],
'body' => $response,
'total_time' => $info['total_time'],
'connect_time' => $info['connect_time'],
'session_id' => $sessionId,
'error' => $error
];
}
public function scrapeMultiple(array $urls, ?string $country = null): array
{
$results = [];
foreach ($urls as $index => $url) {
$results[] = $this->requestWithRotation($url, 1, $country);
// Pause between requests to avoid rate limiting
if ($index < count($urls) - 1) {
usleep(500000); // 500ms
}
}
return $results;
}
}
// Usage with automatic IP rotation
$client = new RotatingProxyClient('user', 'your_password');
$urls = [
'https://httpbin.org/ip',
'https://httpbin.org/headers',
'https://httpbin.org/user-agent'
];
$results = $client->scrapeMultiple($urls, 'DE'); // Geo-targeting: Germany
foreach ($results as $result) {
echo "Session: {$result['session_id']}\n";
echo "Status: {$result['status']}\n";
echo "Time: {$result['total_time']}s\n\n";
}
Guzzle HTTP Client: Advanced Proxy Configuration
Guzzle is one of the most widely used HTTP clients in the PHP ecosystem. It offers an object-oriented interface, middleware for automatic retries, and built-in exception handling.
<?php
require 'vendor/autoload.php';
use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use GuzzleHttp\RetryMiddleware;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;
class GuzzleProxyClient
{
private Client $client;
private string $proxyHost = 'gate.proxyhat.com';
private int $proxyPort = 8080;
public function __construct(
string $username,
string $password,
array $config = []
) {
// Proxy configuration as a full URL
$proxyUrl = "http://{$username}:{$password}@{$this->proxyHost}:{$this->proxyPort}";
// Handler stack for custom middleware
$handlerStack = HandlerStack::create();
// Automatic retry middleware with exponential backoff
$handlerStack->push(Middleware::retry(
function (int $retries, RequestInterface $request, ?ResponseInterface $response = null, ?\Throwable $exception = null) use ($config) {
$maxRetries = $config['max_retries'] ?? 3;
// Stop once the retry budget is used up
if ($retries >= $maxRetries) {
return false;
}
// Retry on network-level failures
if ($exception instanceof \GuzzleHttp\Exception\ConnectException) {
return true;
}
// Retry on 5xx responses
if ($response && $response->getStatusCode() >= 500) {
return true;
}
return false;
},
function (int $retries) {
// Guzzle passes the retry number starting at 1, so this yields 1s, 2s, 4s...
return 1000 * (2 ** ($retries - 1));
}
));
// Logging middleware (Guzzle expects its own MessageFormatter, not a Monolog formatter)
$handlerStack->push(Middleware::log(
new \Monolog\Logger('proxy_client'),
new \GuzzleHttp\MessageFormatter('{method} {uri} -> {code} {error}')
));
$this->client = new Client([
'proxy' => [
'http' => $proxyUrl,
'https' => $proxyUrl,
],
'timeout' => $config['timeout'] ?? 30,
'connect_timeout' => $config['connect_timeout'] ?? 10,
'handler' => $handlerStack,
'verify' => $config['verify_ssl'] ?? true,
'headers' => [
'User-Agent' => $config['user_agent'] ?? 'ProxyHat-PHP-Client/1.0',
'Accept' => 'application/json',
]
]);
}
public function get(string $url, array $options = []): array
{
try {
$response = $this->client->get($url, $options);
return [
'success' => true,
'status' => $response->getStatusCode(),
'body' => $response->getBody()->getContents(),
'headers' => $response->getHeaders()
];
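} catch (\GuzzleHttp\Exception\RequestException $e) {
$response = $e->getResponse();
return [
'success' => false,
'status' => $response ? $response->getStatusCode() : 0,
'error' => $e->getMessage(),
'body' => $response ? $response->getBody()->getContents() : ''
];
} catch (\GuzzleHttp\Exception\TransferException $e) {
// In Guzzle 7, ConnectException (proxy unreachable, timeout) does not extend RequestException
return [
'success' => false,
'status' => 0,
'error' => $e->getMessage(),
'body' => ''
];
}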
}
public function post(string $url, array $data, array $options = []): array
{
try {
$options['json'] = $data;
$response = $this->client->post($url, $options);
return [
'success' => true,
'status' => $response->getStatusCode(),
'body' => $response->getBody()->getContents()
];
} catch (\GuzzleHttp\Exception\TransferException $e) {
return [
'success' => false,
'error' => $e->getMessage()
];
}
}
// Override the proxy for a single request (manual rotation)
public function requestWithProxy(
string $method,
string $url,
string $proxyUsername,
string $proxyPassword,
array $options = []
): array {
$proxyUrl = "http://{$proxyUsername}:{$proxyPassword}@{$this->proxyHost}:{$this->proxyPort}";
$options['proxy'] = [
'http' => $proxyUrl,
'https' => $proxyUrl
];
try {
$response = $this->client->request($method, $url, $options);
return [
'success' => true,
'status' => $response->getStatusCode(),
'body' => $response->getBody()->getContents()
];
} catch (\Exception $e) {
return [
'success' => false,
'error' => $e->getMessage()
];
}
}
}
// Example: client with geo-targeting for the US
$client = new GuzzleProxyClient(
'user-country-US',
'your_password',
[
'timeout' => 45,
'max_retries' => 5,
'user_agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
]
);
// GET request through the proxy
$result = $client->get('https://api.ipify.org?format=json');
if ($result['success']) {
$data = json_decode($result['body'], true);
echo "Outbound IP: " . $data['ip'];
}
// Per-request IP rotation
for ($i = 0; $i < 5; $i++) {
$sessionId = 'rot_' . bin2hex(random_bytes(4));
$result = $client->requestWithProxy(
'GET',
'https://httpbin.org/ip',
"user-country-DE-session-{$sessionId}",
'your_password'
);
echo "Request {$i}: " . ($result['body'] ?? $result['error']) . "\n";
}
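The delay callback passed to the retry middleware can be extracted into a standalone function, which makes the backoff schedule easy to cap and to test. What follows is a minimal sketch rather than part of Guzzle: the name backoffDelayMs, the 30-second cap, and the jitter parameter are all assumptions chosen for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Returns the delay in milliseconds before retry number $retry (1-based).
 * The delay doubles from $baseMs and is capped at $maxMs before optional jitter is applied.
 * Hypothetical helper, not a Guzzle API.
 */
function backoffDelayMs(int $retry, int $baseMs = 1000, int $maxMs = 30000, float $jitter = 0.0): int
{
    $retry = max(1, $retry);
    // Cap the exponent so the multiplication cannot overflow
    $delay = min($maxMs, $baseMs * (2 ** min($retry - 1, 20)));

    if ($jitter > 0.0) {
        // Spread the delay by up to +/- $jitter (0.2 means 20%)
        $spread = (int) round($delay * $jitter);
        $delay += random_int(-$spread, $spread);
    }

    return max(0, $delay);
}

foreach ([1, 2, 3, 4] as $retry) {
    echo "retry {$retry}: " . backoffDelayMs($retry) . " ms" . PHP_EOL;
}
```

With jitter left at zero the loop prints 1000, 2000, 4000, and 8000 ms. A closure such as `fn (int $retries): int => backoffDelayMs($retries, 1000, 30000, 0.2)` could then serve as the second argument to Middleware::retry; the jitter helps keep many workers from retrying against the proxy gateway at the same instant.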
Proxy Pools with Guzzle
For applications that need to balance load across several proxy pools, you can implement a round-robin or weight-based selector.
<?php
require 'vendor/autoload.php';
use GuzzleHttp\Client;
class ProxyPool
{
private array $proxies = [];
private int $currentIndex = 0;
public function __construct(array $proxyConfigs)
{
foreach ($proxyConfigs as $config) {
$this->proxies[] = [
'host' => $config['host'] ?? 'gate.proxyhat.com',
'port' => $config['port'] ?? 8080,
'username' => $config['username'],
'password' => $config['password'],
'weight' => $config['weight'] ?? 1,
'failures' => 0,
'last_used' => 0
];
}
}
public function getNext(): array
{
// Weighted random selection with failover (skips proxies with 3 or more failures)
$totalWeight = array_sum(array_column($this->proxies, 'weight'));
$random = mt_rand(1, $totalWeight);
$current = 0;
foreach ($this->proxies as &$proxy) {
$current += $proxy['weight'];
if ($random <= $current && $proxy['failures'] < 3) {
$proxy['last_used'] = time();
return $proxy;
}
}
// Fallback to the first proxy in the list
return reset($this->proxies);
}
public function markFailure(string $username): void
{
foreach ($this->proxies as &$proxy) {
if ($proxy['username'] === $username) {
$proxy['failures']++;
break;
}
}
}
public function markSuccess(string $username): void
{
foreach ($this->proxies as &$proxy) {
if ($proxy['username'] === $username) {
$proxy['failures'] = max(0, $proxy['failures'] - 1);
break;
}
}
}
public function getProxyUrl(array $proxy): string
{
return "http://{$proxy['username']}:{$proxy['password']}@{$proxy['host']}:{$proxy['port']}";
}
}
class PooledGuzzleClient
{
private ProxyPool $pool;
private Client $client;
public function __construct(ProxyPool $pool)
{
$this->pool = $pool;
$this->client = new Client([
'timeout' => 30,
'connect_timeout' => 10,
'verify' => true
]);
}
public function request(string $method, string $url, array $options = []): array
{
$maxAttempts = 3;
$lastError = null;
for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
$proxy = $this->pool->getNext();
$proxyUrl = $this->pool->getProxyUrl($proxy);
$options['proxy'] = [
'http' => $proxyUrl,
'https' => $proxyUrl
];
try {
$response = $this->client->request($method, $url, $options);
$this->pool->markSuccess($proxy['username']);
return [
'success' => true,
'status' => $response->getStatusCode(),
'body' => $response->getBody()->getContents()
];
} catch (\Exception $e) {
$this->pool->markFailure($proxy['username']);
$lastError = $e->getMessage();
// Pause before retrying
usleep(1000000 * ($attempt + 1));
}
}
return [
'success' => false,
'error' => $lastError
];
}
}
// Pool configuration with several locations
$pool = new ProxyPool([
['username' => 'user-country-US', 'password' => 'pass', 'weight' => 3],
['username' => 'user-country-DE', 'password' => 'pass', 'weight' => 2],
['username' => 'user-country-GB', 'password' => 'pass', 'weight' => 1],
]);
$client = new PooledGuzzleClient($pool);
// Requests are distributed according to the weights
for ($i = 0; $i < 10; $i++) {
$result = $client->request('GET', 'https://httpbin.org/ip');
echo "Request {$i}: " . ($result['success'] ? 'OK' : 'FAIL') . "\n";
}
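In the getNext method above, the total weight still includes proxies that have exceeded the failure limit, and the fallback can return a failed proxy. One possible alternative, sketched here as a pure function, filters out unhealthy entries before rolling. The name pickWeighted and the injectable $roll parameter are assumptions added for illustration and testability.

```php
<?php
declare(strict_types=1);

/**
 * Picks one proxy by weight, ignoring entries with too many failures.
 * $roll can be injected to make the selection deterministic in tests.
 * Hypothetical helper, not part of the ProxyPool class above.
 *
 * @param array<int, array{username: string, weight: int, failures: int}> $proxies
 */
function pickWeighted(array $proxies, int $maxFailures = 3, ?int $roll = null): ?array
{
    // Keep only healthy proxies so failed ones do not take up part of the range
    $healthy = array_values(array_filter(
        $proxies,
        fn (array $p): bool => $p['failures'] < $maxFailures && $p['weight'] > 0
    ));

    if ($healthy === []) {
        return null; // The caller decides whether to wait, reset counters, or fail
    }

    $total = array_sum(array_column($healthy, 'weight'));
    $roll ??= random_int(1, $total);

    $cumulative = 0;
    foreach ($healthy as $proxy) {
        $cumulative += $proxy['weight'];
        if ($roll <= $cumulative) {
            return $proxy;
        }
    }

    return $healthy[array_key_last($healthy)]; // Only reached if $roll exceeds $total
}

$proxies = [
    ['username' => 'user-country-US', 'weight' => 3, 'failures' => 0],
    ['username' => 'user-country-DE', 'weight' => 2, 'failures' => 3],
    ['username' => 'user-country-GB', 'weight' => 1, 'failures' => 0],
];

echo pickWeighted($proxies, 3, 4)['username'], PHP_EOL; // user-country-GB
```

Returning null when every proxy is unhealthy makes the failure explicit instead of quietly reusing a proxy that is known to be failing.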
Symfony HTTP Client: Asynchronous Requests and Proxies
Symfony HTTP Client offers a modern API with native support for asynchronous requests, multiplexing, and advanced response handling. It is particularly well suited to high-performance scraping.
<?php
require 'vendor/autoload.php';
use Symfony\Component\HttpClient\HttpClient;
use Symfony\Contracts\HttpClient\HttpClientInterface;
use Symfony\Contracts\HttpClient\ResponseInterface;
class SymfonyProxyClient
{
private HttpClientInterface $client;
private string $proxyHost = 'gate.proxyhat.com';
private int $proxyPort = 8080;
public function __construct(
string $username,
string $password,
array $options = []
) {
$proxyUrl = "http://{$username}:{$password}@{$this->proxyHost}:{$this->proxyPort}";
$this->client = HttpClient::create([
'proxy' => $proxyUrl,
'timeout' => $options['timeout'] ?? 30,
'max_duration' => $options['max_duration'] ?? 60,
'verify_peer' => $options['verify_ssl'] ?? true,
'verify_host' => true,
'headers' => [
'User-Agent' => $options['user_agent'] ?? 'Symfony-Proxy-Client/1.0',
],
'max_redirects' => 5,
]);
}
// Synchronous request
public function get(string $url, array $options = []): array
{
$response = $this->client->request('GET', $url, $options);
try {
$statusCode = $response->getStatusCode();
$content = $response->getContent();
return [
'success' => $statusCode >= 200 && $statusCode < 300,
'status' => $statusCode,
'body' => $content,
'headers' => $response->getHeaders()
];
} catch (\Symfony\Contracts\HttpClient\Exception\TransportExceptionInterface $e) {
return [
'success' => false,
'error' => $e->getMessage()
];
} catch (\Symfony\Contracts\HttpClient\Exception\HttpExceptionInterface $e) {
return [
'success' => false,
'status' => $e->getResponse()->getStatusCode(),
'error' => $e->getMessage()
];
}
}
// Concurrent asynchronous requests
public function fetchConcurrent(array $urls, ?callable $onProgress = null): array
{
$responses = [];
// Start all requests in parallel
foreach ($urls as $key => $url) {
$responses[$key] = $this->client->request('GET', $url);
}
$results = [];
// Iterate over the responses as they arrive
foreach ($this->client->stream($responses) as $response => $chunk) {
$key = array_search($response, $responses, true);
try {
if ($chunk->isTimeout()) {
// Idle timeout: record it; a later chunk may still complete the response
$results[$key] = [
'success' => false,
'error' => 'Timeout'
];
continue;
}
if ($chunk->isFirst() && $onProgress) {
// Headers received; passing false avoids throwing on 3xx/4xx/5xx here
$onProgress('headers', $response->getHeaders(false));
}
if ($chunk->isLast()) {
// Full content received
$results[$key] = [
'success' => true,
'status' => $response->getStatusCode(),
'body' => $response->getContent(),
'headers' => $response->getHeaders()
];
}
} catch (\Symfony\Contracts\HttpClient\Exception\ExceptionInterface $e) {
// Transport errors surface when the chunk is inspected, HTTP errors when the content is read
$results[$key] = [
'success' => false,
'error' => $e->getMessage()
];
}
}
return $results;
}
// Streaming for large responses
public function streamContent(string $url, callable $onChunk): void
{
$response = $this->client->request('GET', $url);
foreach ($this->client->stream($response) as $chunk) {
if ($chunk->isTimeout()) {
continue;
}
$onChunk($chunk->getContent());
}
}
}
// Example: concurrent scraping with Symfony
$client = new SymfonyProxyClient('user-country-US', 'your_password');
$urls = [
'page1' => 'https://httpbin.org/delay/1',
'page2' => 'https://httpbin.org/delay/2',
'page3' => 'https://httpbin.org/delay/1',
];
$start = microtime(true);
$results = $client->fetchConcurrent($urls, function ($event, $data) {
echo "Event: {$event}\n";
});
$elapsed = microtime(true) - $start;
echo "Total time: {$elapsed}s (vs " . count($urls) . " serial requests)\n";
foreach ($results as $key => $result) {
echo "{$key}: " . ($result['success'] ? 'OK' : 'FAIL') . "\n";
}
Laravel Integration: Service Class for a Proxy Pool
In a Laravel application, it is good practice to encapsulate the proxy logic in a dedicated service class, registered as a singleton in the container. This makes it easy to use from jobs, controllers, and other services.
<?php
// app/Services/ResidentialProxyService.php
namespace App\Services;
use Illuminate\Support\Facades\Log;
use Illuminate\Support\Facades\Cache;
use GuzzleHttp\Client;
use GuzzleHttp\Exception\RequestException;
class ResidentialProxyService
{
private Client $client;
private string $proxyHost = 'gate.proxyhat.com';
private int $proxyPort = 8080;
private string $username;
private string $password;
private array $config;
public function __construct(array $config = [])
{
$this->config = array_merge([
'timeout' => 30,
'connect_timeout' => 10,
'max_retries' => 3,
'retry_delay' => 1000,
'verify_ssl' => true,
'default_country' => null,
], $config);
$this->username = config('proxy.username');
$this->password = config('proxy.password');
$this->initializeClient();
}
private function initializeClient(): void
{
$this->client = new Client([
'timeout' => $this->config['timeout'],
'connect_timeout' => $this->config['connect_timeout'],
'verify' => $this->config['verify_ssl'],
]);
}
private function buildProxyUrl(?string $country = null, ?string $sessionId = null): string
{
$username = $this->username;
if ($country) {
$username .= "-country-{$country}";
} elseif ($this->config['default_country']) {
$username .= "-country-{$this->config['default_country']}";
}
if ($sessionId) {
$username .= "-session-{$sessionId}";
}
return "http://{$username}:{$this->password}@{$this->proxyHost}:{$this->proxyPort}";
}
public function request(
string $method,
string $url,
array $options = [],
?string $country = null,
?string $sessionId = null
): array {
$proxyUrl = $this->buildProxyUrl($country, $sessionId);
$options['proxy'] = $proxyUrl;
$attempt = 0;
$lastException = null;
while ($attempt < $this->config['max_retries']) {
$attempt++;
try {
$response = $this->client->request($method, $url, $options);
$this->logSuccess($url, $attempt);
return [
'success' => true,
'status' => $response->getStatusCode(),
'body' => $response->getBody()->getContents(),
'headers' => $this->formatHeaders($response->getHeaders()),
];
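} catch (\GuzzleHttp\Exception\ConnectException $e) {
// Network-level failure (proxy unreachable, timeout): always worth a retry.
// In Guzzle 7 this exception does not extend RequestException.
$lastException = $e;
Log::warning("Proxy connection failed (attempt {$attempt})", [
'url' => $url,
'error' => $e->getMessage(),
]);
usleep($this->config['retry_delay'] * 1000 * $attempt);
} catch (RequestException $e) {
$lastException = $e;
$this->logFailure($url, $e, $attempt);
if ($this->shouldRetry($e)) {
usleep($this->config['retry_delay'] * 1000 * $attempt);
continue;
}
break;
}
}
return [
'success' => false,
'error' => $lastException?->getMessage(),
'status' => $lastException instanceof RequestException
? $lastException->getResponse()?->getStatusCode()
: null,
];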
}
public function get(string $url, array $options = [], ?string $country = null): array
{
return $this->request('GET', $url, $options, $country);
}
public function post(string $url, array $data, array $options = [], ?string $country = null): array
{
$options['json'] = $data;
return $this->request('POST', $url, $options, $country);
}
// Sticky session: keeps the same IP across several requests
public function createSession(?string $country = null): ProxySession
{
$sessionId = 'laravel_' . bin2hex(random_bytes(8));
Cache::put("proxy_session:{$sessionId}", [
'country' => $country ?? $this->config['default_country'],
'created_at' => now(),
'request_count' => 0,
], now()->addMinutes(30));
return new ProxySession($sessionId, $this, $country);
}
public function getSession(string $sessionId): ?array
{
return Cache::get("proxy_session:{$sessionId}");
}
public function incrementSessionCount(string $sessionId): void
{
$session = $this->getSession($sessionId);
if ($session) {
$session['request_count']++;
Cache::put("proxy_session:{$sessionId}", $session, now()->addMinutes(30));
}
}
private function shouldRetry(RequestException $e): bool
{
$statusCode = $e->getResponse()?->getStatusCode();
// Retry on server errors or rate limiting
return $statusCode === null || $statusCode >= 500 || $statusCode === 429;
}
private function logSuccess(string $url, int $attempts): void
{
if ($attempts > 1) {
Log::info("Proxy request succeeded after {$attempts} attempts", ['url' => $url]);
}
}
private function logFailure(string $url, \Exception $e, int $attempt): void
{
Log::warning("Proxy request failed (attempt {$attempt})", [
'url' => $url,
'error' => $e->getMessage(),
'status' => $e->getResponse()?->getStatusCode(),
]);
}
private function formatHeaders(array $headers): array
{
$formatted = [];
foreach ($headers as $name => $values) {
$formatted[$name] = implode(', ', $values);
}
return $formatted;
}
}
// app/Services/ProxySession.php
namespace App\Services;
class ProxySession
{
private string $sessionId;
private ResidentialProxyService $service;
private ?string $country;
public function __construct(
string $sessionId,
ResidentialProxyService $service,
?string $country
) {
$this->sessionId = $sessionId;
$this->service = $service;
$this->country = $country;
}
public function request(string $method, string $url, array $options = []): array
{
$this->service->incrementSessionCount($this->sessionId);
return $this->service->request($method, $url, $options, $this->country, $this->sessionId);
}
public function get(string $url, array $options = []): array
{
return $this->request('GET', $url, $options);
}
public function getId(): string
{
return $this->sessionId;
}
}
// app/Providers/AppServiceProvider.php
namespace App\Providers;
use App\Services\ResidentialProxyService;
use Illuminate\Support\ServiceProvider;
class AppServiceProvider extends ServiceProvider
{
public function register(): void
{
$this->app->singleton(ResidentialProxyService::class, function ($app) {
return new ResidentialProxyService([
'timeout' => config('proxy.timeout', 30),
'max_retries' => config('proxy.max_retries', 3),
'default_country' => config('proxy.default_country'),
]);
});
}
}
// config/proxy.php
return [
'username' => env('PROXY_USERNAME', 'user'),
'password' => env('PROXY_PASSWORD', 'password'),
'timeout' => env('PROXY_TIMEOUT', 30),
'max_retries' => env('PROXY_MAX_RETRIES', 3),
'default_country' => env('PROXY_DEFAULT_COUNTRY', 'US'),
];
Using the Service in Laravel Jobs
<?php
// app/Jobs/ScrapeProductPrices.php
namespace App\Jobs;
use App\Services\ResidentialProxyService;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Log;
class ScrapeProductPrices implements ShouldQueue
{
use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;
public int $tries = 3;
public int $backoff = 60;
private array $products;
private ?string $country;
public function __construct(array $products, ?string $country = null)
{
$this->products = $products;
$this->country = $country;
}
public function handle(ResidentialProxyService $proxy): void
{
// Create a sticky session to keep the same IP
$session = $proxy->createSession($this->country);
Log::info("Starting scrape job", [
'session_id' => $session->getId(),
'products_count' => count($this->products),
]);
foreach ($this->products as $product) {
$result = $session->get($product['url'], [
'headers' => [
'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
'Accept' => 'text/html,application/xhtml+xml',
],
]);
if ($result['success']) {
$price = $this->extractPrice($result['body'], $product['selector']);
$this->savePrice($product['id'], $price);
} else {
Log::warning("Failed to scrape product", [
'product_id' => $product['id'],
'error' => $result['error'],
]);
}
// Pause between requests to limit the request rate
usleep(200000); // 200ms
}
Log::info("Scrape job completed", ['session_id' => $session->getId()]);
}
private function extractPrice(string $html, string $selector): ?float
{
// Parse the HTML with DOMDocument and XPath
$dom = new \DOMDocument();
@$dom->loadHTML($html);
$xpath = new \DOMXPath($dom);
$nodes = $xpath->query($selector);
if ($nodes->length > 0) {
$text = trim($nodes->item(0)->textContent);
return (float) preg_replace('/[^0-9.]/', '', $text);
}
return null;
}
private function savePrice(int $productId, ?float $price): void
{
if ($price !== null) {
\App\Models\ProductPrice::updateOrCreate(
['product_id' => $productId],
['price' => $price, 'scraped_at' => now()]
);
}
}
}
// Dispatching the job
use App\Jobs\ScrapeProductPrices;
ScrapeProductPrices::dispatch(
[
['id' => 1, 'url' => 'https://example.com/product/1', 'selector' => '//span[@class="price"]'],
['id' => 2, 'url' => 'https://example.com/product/2', 'selector' => '//span[@class="price"]'],
],
'DE' // Geo-targeting: Germany
)->onQueue('scraping');
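The extractPrice method above keeps only digits and dots, so a German-formatted price such as "1.299,00 €" would be read as 1.299 rather than 1299. Since the job is dispatched with geo-targeting for Germany, a separator-aware parser may be worth considering. The function below is a heuristic sketch under a stated assumption: a final "." or "," followed by one or two digits is treated as the decimal separator. The name parsePrice is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Parses a displayed price such as "1.299,00 €" or "$1,299.00" into a float.
 * Heuristic: the last "." or "," followed by one or two digits is the decimal separator.
 * Hypothetical helper; three-decimal prices would need different handling.
 */
function parsePrice(string $text): ?float
{
    // Keep digits and separators only
    $clean = preg_replace('/[^0-9.,]/', '', $text);
    if ($clean === null || !preg_match('/\d/', $clean)) {
        return null;
    }

    if (preg_match('/^(.*)[.,](\d{1,2})$/', $clean, $m)) {
        // Decimal part found: drop grouping separators from the integer part
        $integer = preg_replace('/[.,]/', '', $m[1]);
        return (float) (($integer === '' ? '0' : $integer) . '.' . $m[2]);
    }

    // No decimal part: every separator is a thousands separator
    return (float) preg_replace('/[.,]/', '', $clean);
}

var_dump(parsePrice('1.299,00 €')); // float(1299)
var_dump(parsePrice('$1,299.00'));  // float(1299)
var_dump(parsePrice('19,9'));       // float(19.9)
```

The heuristic cannot distinguish "1.299" as a thousands-grouped integer from a three-decimal value, so it treats it as 1299. If the target sites are known in advance, passing the expected locale explicitly would be more robust.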
Multi-cURL for Concurrent Requests
For high-throughput scraping, the curl_multi_* functions can run dozens of requests in parallel, which cuts total execution time considerably.
<?php
class ConcurrentCurlProxyClient
{
private string $proxyHost = 'gate.proxyhat.com';
private int $proxyPort = 8080;
private string $username;
private string $password;
private int $maxConcurrent;
public function __construct(
string $username,
string $password,
int $maxConcurrent = 10
) {
$this->username = $username;
$this->password = $password;
$this->maxConcurrent = $maxConcurrent;
}
/**
* Runs concurrent requests with automatic IP rotation
*
* @param array $urls Array of URLs to process
* @param array $options Options applied to every request
* @return array Results indexed by the keys of the original array
*/
public function fetchAll(array $urls, array $options = []): array
{
$results = [];
$handles = [];
$multiHandle = curl_multi_init();
// Shared configuration
$commonOptions = [
CURLOPT_RETURNTRANSFER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_MAXREDIRS => 5,
CURLOPT_TIMEOUT => $options['timeout'] ?? 30,
CURLOPT_CONNECTTIMEOUT => $options['connect_timeout'] ?? 10,
CURLOPT_SSL_VERIFYPEER => true,
CURLOPT_SSL_VERIFYHOST => 2,
CURLOPT_PROXY => $this->proxyHost,
CURLOPT_PROXYPORT => $this->proxyPort,
];
// Initialize all handles
foreach ($urls as $key => $url) {
$ch = curl_init($url);
// Generate a unique session for each request (IP rotation)
$sessionId = 'multi_' . bin2hex(random_bytes(6));
$country = $options['country'] ?? null;
$username = $this->username;
if ($country) {
$username .= "-country-{$country}";
}
$username .= "-session-{$sessionId}";
$proxyAuth = "{$username}:{$this->password}";
$handleOptions = $commonOptions + [
CURLOPT_PROXYUSERPWD => $proxyAuth,
];
// Custom headers
if (!empty($options['headers'])) {
$handleOptions[CURLOPT_HTTPHEADER] = $options['headers'];
}
curl_setopt_array($ch, $handleOptions);
$handles[$key] = $ch;
}
// Add every handle to the multi handle
$active = null;
foreach ($handles as $ch) {
curl_multi_add_handle($multiHandle, $ch);
}
// Run the requests
do {
$status = curl_multi_exec($multiHandle, $active);
if ($status === CURLM_CALL_MULTI_PERFORM) {
continue;
}
if ($status !== CURLM_OK) {
break;
}
// Wait for activity on at least one connection
curl_multi_select($multiHandle, 1.0);
} while ($active > 0);
// Collect the results
foreach ($handles as $key => $ch) {
$results[$key] = [
'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
'body' => curl_multi_getcontent($ch),
'error' => curl_error($ch),
'total_time' => curl_getinfo($ch, CURLINFO_TOTAL_TIME),
'connect_time' => curl_getinfo($ch, CURLINFO_CONNECT_TIME),
'size_download' => curl_getinfo($ch, CURLINFO_SIZE_DOWNLOAD),
'success' => curl_errno($ch) === 0,
];
curl_multi_remove_handle($multiHandle, $ch);
curl_close($ch);
}
curl_multi_close($multiHandle);
return $results;
}
/**
* Processes URLs in batches, invoking a callback for each result
*/
public function processBatch(
array $urls,
callable $onSuccess,
?callable $onError = null,
array $options = []
): array {
$batchSize = $this->maxConcurrent;
$batches = array_chunk($urls, $batchSize, true);
$stats = [
'total' => count($urls),
'success' => 0,
'failed' => 0,
'total_time' => 0,
];
foreach ($batches as $batch) {
$results = $this->fetchAll($batch, $options);
foreach ($results as $key => $result) {
if ($result['success'] && $result['status'] >= 200 && $result['status'] < 300) {
$onSuccess($key, $result);
$stats['success']++;
} else {
// Count the failure even when no error callback was supplied
$stats['failed']++;
if ($onError) {
$onError($key, $result);
}
}
$stats['total_time'] += $result['total_time'];
}
}
return $stats;
}
}
// Example: concurrent scraping of 50 URLs
$client = new ConcurrentCurlProxyClient('user-country-US', 'your_password', 20);
$urls = [];
for ($i = 1; $i <= 50; $i++) {
$urls["page_{$i}"] = "https://httpbin.org/delay/1?page={$i}";
}
$start = microtime(true);
$stats = $client->processBatch(
$urls,
function ($key, $result) {
echo "OK: {$key} ({$result['total_time']}s)\n";
},
function ($key, $result) {
echo "FAIL: {$key} - {$result['error']}\n";
},
['country' => 'US', 'timeout' => 15]
);
$elapsed = microtime(true) - $start;
echo "\n=== Statistics ===\n";
echo "Total: {$stats['total']}\n";
echo "Succeeded: {$stats['success']}\n";
echo "Failed: {$stats['failed']}\n";
echo "Total time: {$elapsed}s\n";
echo "Average time per request: " . ($stats['total_time'] / $stats['total']) . "s\n";
TLS/SSL and CA Bundle Management
When you send HTTPS requests through a proxy, certificate verification must be configured correctly to prevent man-in-the-middle attacks. PHP and cURL both rely on an up-to-date CA bundle for this.
<?php
class SecureProxyClient
{
private string $proxyHost = 'gate.proxyhat.com';
private int $proxyPort = 8080;
private ?string $caBundlePath;
private array $tlsOptions;
public function __construct(
string $username,
string $password,
array $tlsConfig = []
) {
$this->caBundlePath = $this->resolveCaBundle($tlsConfig['ca_bundle'] ?? null);
$this->tlsOptions = [
'verify_peer' => $tlsConfig['verify_peer'] ?? true,
'verify_peer_name' => $tlsConfig['verify_peer_name'] ?? true,
'verify_host' => $tlsConfig['verify_host'] ?? 2,
'ssl_version' => $tlsConfig['ssl_version'] ?? CURL_SSLVERSION_TLSv1_2,
];
}
/**
* Resolves the path to the CA bundle
*/
private function resolveCaBundle(?string $customPath): ?string
{
if ($customPath && file_exists($customPath)) {
return $customPath;
}
// Common CA bundle locations
$commonPaths = [
// Composer CA bundle (recommended); assumes this file sits next to vendor/
__DIR__ . '/vendor/composer/ca-bundle/res/cacert.pem',
// System paths
'/etc/ssl/certs/ca-certificates.crt', // Debian/Ubuntu
'/etc/pki/tls/certs/ca-bundle.crt', // RHEL/CentOS
'/usr/local/etc/openssl/cert.pem', // macOS Homebrew
'/usr/share/curl/curl-ca-bundle.crt', // Legacy cURL location on some Unix systems
];
foreach ($commonPaths as $path) {
if (file_exists($path)) {
return $path;
}
}
// Fallback: download the Mozilla bundle
return $this->downloadCaBundle();
}
/**
* Downloads the Mozilla CA bundle as a fallback
*/
private function downloadCaBundle(): ?string
{
$cachePath = sys_get_temp_dir() . '/proxy_client_ca_bundle.pem';
// Reuse the cached copy if it is less than a week old
if (file_exists($cachePath) && filemtime($cachePath) > strtotime('-1 week')) {
return $cachePath;
}
// file_get_contents() returns false (and emits a warning) on failure
$bundle = @file_get_contents('https://curl.se/ca/cacert.pem');
if ($bundle !== false && $bundle !== '' && file_put_contents($cachePath, $bundle) !== false) {
return $cachePath;
}
return null;
}
/**
* Configures cURL with the proxy and secure TLS settings
*/
private function configureTls($ch, string $proxyUsername, string $proxyPassword): void
{
// Proxy configuration
curl_setopt($ch, CURLOPT_PROXY, $this->proxyHost);
curl_setopt($ch, CURLOPT_PROXYPORT, $this->proxyPort);
curl_setopt($ch, CURLOPT_PROXYUSERPWD, "{$proxyUsername}:{$proxyPassword}");
// TLS/SSL verification
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, $this->tlsOptions['verify_peer']);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, $this->tlsOptions['verify_host']);
curl_setopt($ch, CURLOPT_SSLVERSION, $this->tlsOptions['ssl_version']);
// Custom CA bundle. CURLOPT_CAINFO takes a single bundle file; CURLOPT_CAPATH
// expects a directory of hashed certificates, so it is not set here.
if ($this->caBundlePath) {
curl_setopt($ch, CURLOPT_CAINFO, $this->caBundlePath);
}
// Client certificate (if required)
// curl_setopt($ch, CURLOPT_SSLCERT, '/path/to/client.crt');
// curl_setopt($ch, CURLOPT_SSLKEY, '/path/to/client.key');
// TLS 1.3 cipher suites (optional); TLS 1.2 and earlier use CURLOPT_SSL_CIPHER_LIST
// curl_setopt($ch, CURLOPT_TLS13_CIPHERS, 'TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256');
}
/**
* Tests the TLS connection (made directly, without going through the proxy)
*/
public function testTlsConnection(string $url = 'https://www.google.com'): array
{
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
// Required for curl_getinfo() to populate 'certinfo'
curl_setopt($ch, CURLOPT_CERTINFO, true);
if ($this->caBundlePath) {
curl_setopt($ch, CURLOPT_CAINFO, $this->caBundlePath);
}
$result = curl_exec($ch);
$info = curl_getinfo($ch);
$error = curl_error($ch);
curl_close($ch);
return [
'success' => $error === '',
'ssl_verify_result' => $info['ssl_verify_result'],
'certinfo' => $info['certinfo'] ?? [],
'error' => $error,
'ca_bundle' => $this->caBundlePath,
];
}
/**
* HTTPS request through the proxy with full certificate verification
*/
public function secureGet(
string $url,
string $proxyUsername,
string $proxyPassword,
array $options = []
): array {
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_TIMEOUT, $options['timeout'] ?? 30);
$this->configureTls($ch, $proxyUsername, $proxyPassword);
if (!empty($options['headers'])) {
curl_setopt($ch, CURLOPT_HTTPHEADER, $options['headers']);
}
$response = curl_exec($ch);
$info = curl_getinfo($ch);
$error = curl_error($ch);
curl_close($ch);
return [
'success' => $error === '' && $info['http_code'] >= 200 && $info['http_code'] < 300,
'status' => $info['http_code'],
'body' => $response,
'ssl_verify_result' => $info['ssl_verify_result'],
'error' => $error,
];
}
}
// Install the CA bundle via Composer:
// composer require composer/ca-bundle
// Usage (the constructor does not store the credentials; secureGet() takes them per call)
$client = new SecureProxyClient('user-country-US', 'your_password');
// Test the TLS connection
$test = $client->testTlsConnection();
if (!$test['success']) {
echo "TLS error: " . $test['error'] . "\n";
echo "CA bundle: " . $test['ca_bundle'] . "\n";
}
// Secure request
$result = $client->secureGet(
'https://api.example.com/data',
'user-country-DE-session-abc123',
'your_password',
['timeout' => 20]
);
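if ($result['success']) {
echo "Response received: " . strlen($result['body']) . " bytes\n";
} else {
// curl_error() is empty for HTTP-level failures, so fall back to the status code
echo "Error: " . ($result['error'] !== '' ? $result['error'] : "HTTP {$result['status']}") . "\n";
}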
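Before falling back to hard-coded paths or a download, it can help to ask PHP where it is already configured to look for CA certificates. The following is a minimal sketch under that assumption; the function name findConfiguredCaBundle is hypothetical, while ini_get() and openssl_get_cert_locations() are standard PHP functions (the latter requires the OpenSSL extension).

```php
<?php
// Hypothetical helper: checks where PHP itself is configured to look for CA
// certificates before any hard-coded path list is consulted.
function findConfiguredCaBundle(): ?string
{
    $candidates = [
        ini_get('curl.cainfo'),    // php.ini default for the cURL extension
        ini_get('openssl.cafile'), // php.ini default for stream wrappers
    ];

    // Default bundle file compiled into the OpenSSL library, if available
    if (function_exists('openssl_get_cert_locations')) {
        $locations = openssl_get_cert_locations();
        $candidates[] = $locations['default_cert_file'] ?? null;
    }

    foreach ($candidates as $path) {
        // ini_get() returns false for unknown settings and '' for unset ones
        if (is_string($path) && $path !== '' && is_file($path) && is_readable($path)) {
            return $path;
        }
    }

    return null;
}

$bundle = findConfiguredCaBundle();
echo $bundle !== null ? "CA bundle: {$bundle}\n" : "No configured CA bundle found\n";
```

A result from this helper could be passed as the ca_bundle entry of the TLS configuration array, keeping the client aligned with the server's own php.ini settings.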
Comparing PHP HTTP Clients with Proxy Support
| Feature | Native cURL | Guzzle | Symfony HTTP | Laravel HTTP |
|---|---|---|---|---|
| Proxy Configuration | CURLOPT_PROXY | 'proxy' array | 'proxy' option | withProxy() method |
| Asynchronous Requests | curl_multi_* | Promise/Middleware | Native streaming | Promise (async) |
| Automatic Retry | Manual | Built-in middleware | Manual | Middleware |
| Logging | Manual | PSR-3 middleware | PSR-3 logger | Built-in Log facade |
| Learning Curve | Low | Medium | Medium | Low |
| Performance | Highest | High | High | High |
| Laravel Integration | None | Facade available | Bridge available | Native |
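As the Proxy Configuration row suggests, the clients differ mainly in where the proxy setting goes, not in what it contains. The sketch below builds one proxy URL and shows it placed under the 'proxy' option key that Guzzle and Symfony HTTP Client accept. The helper name buildProxyUrl and the sample password are illustrative assumptions; the host and port are the ones used throughout this guide.

```php
<?php
// Hypothetical helper: builds a proxy URL, percent-encoding the credentials so
// characters such as "@" or ":" in a password cannot break the URL.
function buildProxyUrl(
    string $username,
    string $password,
    string $host = 'gate.proxyhat.com',
    int $port = 8080,
    string $scheme = 'http'
): string {
    return sprintf(
        '%s://%s:%s@%s:%d',
        $scheme,
        rawurlencode($username),
        rawurlencode($password),
        $host,
        $port
    );
}

// Sample credentials for illustration only
$proxyUrl = buildProxyUrl('user-country-US', 'p@ss:word');

// Guzzle and Symfony HTTP Client both accept the URL under a 'proxy' option key.
$guzzleOptions  = ['proxy' => $proxyUrl, 'timeout' => 15];
$symfonyOptions = ['proxy' => $proxyUrl, 'timeout' => 15];

// Native cURL can take the same URL in a single option:
// curl_setopt($ch, CURLOPT_PROXY, $proxyUrl);

echo $proxyUrl . "\n";
```

Centralizing URL construction in one function keeps the credential encoding consistent if the project later switches from one client to another.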
Key Takeaways
- Proxy configuration: Always use the format http://username:password@gate.proxyhat.com:8080 for HTTP and socks5://username:password@gate.proxyhat.com:1080 for SOCKS5. Geo-targeting and sessions are configured in the username, not the password.
- IP rotation: For intensive scraping, rotate per request using unique session IDs. ProxyHat's residential proxies let you keep sticky sessions when needed.
- TLS/SSL: Never disable verify_peer in production. Use an up-to-date CA bundle from composer/ca-bundle or the system bundle.
- Concurrency: For high-volume scraping, use curl_multi_* or Symfony HTTP Client's streaming to maximize throughput.
- Error handling: Always implement retries with exponential backoff and a circuit breaker to handle temporary failures of the proxy or the target server.
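The retry advice above can be sketched roughly as follows. This is a minimal illustration of exponential backoff with jitter only (no circuit breaker); the function name retryWithBackoff, its parameters, and the simulated failure are all hypothetical, and the session suffix assumes the username format used in this guide's examples.

```php
<?php
// Hypothetical helper: retries $operation with exponential backoff and jitter.
// $operation receives the 1-based attempt number and should throw on failure.
function retryWithBackoff(
    callable $operation,
    int $maxAttempts = 4,
    int $baseDelayMs = 200,
    int $maxDelayMs = 5000,
    ?callable $sleeper = null
) {
    // The sleeper is injectable so delays can be skipped in tests or demos.
    $sleeper = $sleeper ?? static function (int $ms): void {
        usleep($ms * 1000);
    };

    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation($attempt);
        } catch (Throwable $e) {
            if ($attempt >= $maxAttempts) {
                throw $e; // Out of attempts: surface the last error
            }
            // Delay doubles with each attempt (base, 2x, 4x, ...), capped at $maxDelayMs
            $delay = min($maxDelayMs, $baseDelayMs * (2 ** ($attempt - 1)));
            // Full jitter: a random wait in [0, $delay] avoids synchronized retries
            $sleeper(random_int(0, $delay));
        }
    }
}

// Usage: each attempt builds a username with a fresh session ID, which
// (assuming this guide's username format) requests a different exit IP.
$attemptsSeen = [];
$result = retryWithBackoff(
    function (int $attempt) use (&$attemptsSeen) {
        $attemptsSeen[] = 'user-country-US-session-' . bin2hex(random_bytes(4));
        if ($attempt < 3) {
            throw new RuntimeException("Simulated proxy failure on attempt {$attempt}");
        }
        return "succeeded on attempt {$attempt}";
    },
    4,
    200,
    5000,
    static function (int $ms): void {
        // No real sleep in this demo
    }
);

echo $result . "\n";
```

In real use the operation would wrap a request made through the proxy and throw on cURL errors or retryable status codes; a circuit breaker would sit one level above, tracking consecutive failures across calls.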
Conclusion
Configuring HTTP proxies in PHP takes attention to detail, but with the right tools (native cURL, Guzzle, or Symfony HTTP Client) you can build robust, high-performance scraping systems. For Laravel projects, a dedicated service class such as ResidentialProxyService centralizes the logic and simplifies use from jobs and controllers.
To get started with ProxyHat's residential proxies, set your credentials in Laravel's .env file or in your PHP script, and use the gateway gate.proxyhat.com:8080 for your requests. Geo-targeting and sticky sessions are configured directly in the username, giving you fine-grained control without changing your code.
For more advanced use cases such as large-scale web scraping or SERP tracking, see our pricing page to choose the plan that best fits your request volume.






