Complete Guide to HTTP Proxies in Java: HttpClient, OkHttp, Jsoup, and Apache HttpClient

Learn how to configure and use HTTP proxies in Java 17+ with practical examples for HttpClient, OkHttp, Jsoup, and Apache HttpClient. Covers IP rotation strategies, parallelism, and TLS configuration.


As a Java developer building scrapers, API clients, or automation tools, sooner or later you will need to route HTTP requests through a proxy. Whether the goal is to avoid rate limits, reach geo-restricted content, or spread requests across multiple IPs, knowing how to use proxies in the Java ecosystem is essential.

This guide covers the main libraries of the modern Java ecosystem: the Java 11+ HttpClient, OkHttp, Jsoup, and Apache HttpClient. Each section includes working examples with timeouts, authentication, and error handling that you can adapt for production.

Java 11+ HttpClient with ProxySelector

The HTTP client built into Java 11+ (module java.net.http) offers a modern API with both synchronous and asynchronous calls. To use a proxy, configure a ProxySelector on the client.

Basic configuration with ProxySelector

The cleanest approach is to build an HttpClient with a ProxySelector that sends all traffic to your proxy:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;

public class JavaHttpClientProxy {

    private static final String PROXY_HOST = "gate.proxyhat.com";
    private static final int PROXY_PORT = 8080;

    public static void main(String[] args) throws Exception {
        // Create a ProxySelector that uses the proxy for every connection
        ProxySelector proxySelector = new ProxySelector() {
            @Override
            public List<Proxy> select(URI uri) {
                return List.of(new Proxy(Proxy.Type.HTTP, 
                    new InetSocketAddress(PROXY_HOST, PROXY_PORT)));
            }

            @Override
            public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
                System.err.println("Proxy connection failed: " + ioe.getMessage());
            }
        };

        HttpClient client = HttpClient.newBuilder()
            .proxy(proxySelector)
            .connectTimeout(Duration.ofSeconds(10))
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://httpbin.org/ip"))
            .timeout(Duration.ofSeconds(30))
            .GET()
            .build();

        HttpResponse<String> response = client.send(request, 
            HttpResponse.BodyHandlers.ofString());

        System.out.println("Status: " + response.statusCode());
        System.out.println("Body: " + response.body());
    }
}
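
A selector that sends everything to the proxy is not always what you want: health checks or internal services are often better reached directly. The following is a minimal sketch of a selector with a bypass list. The class name BypassingProxySelector and the idea of a host-based bypass set are illustrative assumptions, not part of the original example.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Routes traffic through one HTTP proxy, except for hosts on a bypass list (hypothetical helper). */
final class BypassingProxySelector extends ProxySelector {

    private final Proxy proxy;
    private final Set<String> directHosts; // expected in lower case

    BypassingProxySelector(String proxyHost, int proxyPort, Set<String> directHosts) {
        // createUnresolved defers DNS resolution of the proxy host until connect time
        this.proxy = new Proxy(Proxy.Type.HTTP,
            InetSocketAddress.createUnresolved(proxyHost, proxyPort));
        this.directHosts = Set.copyOf(directHosts);
    }

    @Override
    public List<Proxy> select(URI uri) {
        String host = uri.getHost();
        if (host == null || directHosts.contains(host.toLowerCase(Locale.ROOT))) {
            return List.of(Proxy.NO_PROXY); // connect directly, no proxy
        }
        return List.of(proxy);
    }

    @Override
    public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
        System.err.println("Proxy connection failed for " + uri + ": " + ioe.getMessage());
    }
}
```

You would pass an instance to HttpClient.newBuilder().proxy(...) in place of the anonymous selector, for example new BypassingProxySelector("gate.proxyhat.com", 8080, Set.of("localhost")).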

Proxy authentication with Authenticator

Residential and datacenter proxies usually require authentication. The Java HttpClient handles this through an Authenticator set on the client builder:

import java.net.Authenticator;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.ProxySelector;
import java.net.http.HttpClient;
import java.time.Duration;

public class JavaHttpClientAuthProxy {

    private static final String USERNAME = "user-country-US";
    private static final String PASSWORD = "your_password";

    public static HttpClient createAuthenticatedClient() {
        Authenticator auth = new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                // Only answer proxy challenges (not challenges from the target server)
                if (getRequestorType() == RequestorType.PROXY) {
                    return new PasswordAuthentication(USERNAME, PASSWORD.toCharArray());
                }
                return null;
            }
        };

        ProxySelector proxySelector = ProxySelector.of(
            new InetSocketAddress("gate.proxyhat.com", 8080)
        );

        return HttpClient.newBuilder()
            .proxy(proxySelector)
            .authenticator(auth)
            .connectTimeout(Duration.ofSeconds(15))
            .build();
    }

    // Usage with a specific geolocation
    public static HttpClient createGeoTargetedClient(String country, String city) {
        String username = String.format("user-country-%s-city-%s", country, city);

        Authenticator auth = new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                if (getRequestorType() == RequestorType.PROXY) {
                    return new PasswordAuthentication(username, PASSWORD.toCharArray());
                }
                return null;
            }
        };

        return HttpClient.newBuilder()
            .proxy(ProxySelector.of(new InetSocketAddress("gate.proxyhat.com", 8080)))
            .authenticator(auth)
            .connectTimeout(Duration.ofSeconds(20))
            .build();
    }
}
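
One caveat when the target URL is HTTPS: recent JDK builds typically ship with jdk.http.auth.tunneling.disabledSchemes=Basic in net.properties, which can cause Basic proxy credentials to be withheld for CONNECT tunnels, so requests fail with 407 even though the Authenticator is configured. The sketch below clears that property programmatically. It assumes your JDK uses this default and that the property is read when the HTTP client classes are first loaded, so call it at startup; the class and method names are hypothetical.

```java
/** Startup helper for Basic proxy authentication over HTTPS tunnels (hypothetical name). */
final class ProxyBasicAuthSetup {

    // JDK networking property that lists auth schemes disabled for CONNECT tunnels
    static final String TUNNELING_PROP = "jdk.http.auth.tunneling.disabledSchemes";

    /** Clears the disabled-schemes list so Basic auth is allowed again for tunnels. */
    static void enableBasicAuthForTunnels() {
        // Assumption: must run before the first HttpClient or HttpURLConnection is created
        System.setProperty(TUNNELING_PROP, "");
    }
}
```

The equivalent command-line form is -Djdk.http.auth.tunneling.disabledSchemes="". Basic credentials sent to the proxy are not encrypted on a plain HTTP proxy connection, so treat this as a trade-off rather than a default.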

Sticky sessions with HttpClient

To keep the same IP across multiple requests (useful for user sessions or multi-step flows), add the session flag to the username:

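import java.net.Authenticator;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class StickySessionClient {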
    
    private final HttpClient client;
    private final String sessionId;

    public StickySessionClient(String sessionId, String country) {
        this.sessionId = sessionId;
        String username = String.format("user-country-%s-session-%s", country, sessionId);
        
        Authenticator auth = new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                if (getRequestorType() == RequestorType.PROXY) {
                    return new PasswordAuthentication(username, "PASSWORD".toCharArray());
                }
                return null;
            }
        };

        this.client = HttpClient.newBuilder()
            .proxy(ProxySelector.of(new InetSocketAddress("gate.proxyhat.com", 8080)))
            .authenticator(auth)
            .connectTimeout(Duration.ofSeconds(15))
            .cookieHandler(new java.net.CookieManager())
            .build();
    }

    public HttpResponse<String> get(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(url))
            .GET()
            .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString());
    }
}
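
A sticky session usually should not live forever. One way to renew it is to regenerate the session ID once a time-to-live has elapsed. The sketch below assumes the user-country-X-session-Y username format shown above; the class name RotatingSession, the 12-character hex ID, and the injectable Clock are illustrative choices.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.UUID;

/** Builds sticky-session proxy usernames and renews the session ID after a TTL (hypothetical helper). */
final class RotatingSession {

    private final String country;
    private final Duration ttl;
    private final Clock clock; // injectable so the TTL logic can be tested without waiting
    private String sessionId;
    private Instant createdAt;

    RotatingSession(String country, Duration ttl, Clock clock) {
        this.country = country;
        this.ttl = ttl;
        this.clock = clock;
        renew();
    }

    private void renew() {
        // Assumed ID format: 12 hex characters taken from a random UUID
        sessionId = UUID.randomUUID().toString().replace("-", "").substring(0, 12);
        createdAt = clock.instant();
    }

    /** Returns the current username, starting a new session once the TTL has elapsed. */
    synchronized String username() {
        if (Duration.between(createdAt, clock.instant()).compareTo(ttl) >= 0) {
            renew();
        }
        return String.format("user-country-%s-session-%s", country, sessionId);
    }
}
```

To use it, have the Authenticator call session.username() inside getPasswordAuthentication() instead of capturing a fixed string, for example with new RotatingSession("US", Duration.ofMinutes(10), Clock.systemUTC()).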

OkHttp with Proxy and Authenticator

Square's OkHttp is one of the most popular HTTP libraries on Android and in backend Java. Its fluent API makes proxy configuration straightforward.

Basic OkHttp proxy configuration

import okhttp3.Authenticator;
import okhttp3.Credentials;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import okhttp3.Route;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.time.Duration;
import java.util.concurrent.TimeUnit;

public class OkHttpProxyExample {

    private static final String PROXY_HOST = "gate.proxyhat.com";
    private static final int PROXY_PORT = 8080;
    private static final String USERNAME = "user-country-DE";
    private static final String PASSWORD = "your_password";

    public static OkHttpClient createProxyClient() {
        Proxy proxy = new Proxy(Proxy.Type.HTTP, 
            new InetSocketAddress(PROXY_HOST, PROXY_PORT));

        Authenticator proxyAuthenticator = new Authenticator() {
            @Override
            public Request authenticate(Route route, Response response) {
                if (response.request().header("Proxy-Authorization") != null) {
                    return null; // Already tried to authenticate and it failed
                }
                String credential = Credentials.basic(USERNAME, PASSWORD);
                return response.request().newBuilder()
                    .header("Proxy-Authorization", credential)
                    .build();
            }
        };

        return new OkHttpClient.Builder()
            .proxy(proxy)
            .proxyAuthenticator(proxyAuthenticator)
            .connectTimeout(15, TimeUnit.SECONDS)
            .readTimeout(30, TimeUnit.SECONDS)
            .writeTimeout(30, TimeUnit.SECONDS)
            .retryOnConnectionFailure(true)
            .build();
    }

    public static void main(String[] args) throws Exception {
        OkHttpClient client = createProxyClient();
        
        Request request = new Request.Builder()
            .url("https://httpbin.org/ip")
            .build();

        try (Response response = client.newCall(request).execute()) {
            System.out.println("Status: " + response.code());
            System.out.println("Body: " + response.body().string());
        }
    }
}
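
It helps to know what the Proxy-Authorization header actually contains. For the Basic scheme it is the word Basic followed by the Base64 encoding of username:password. The JDK-only sketch below builds that value. It uses ISO-8859-1, which is believed to match the default charset of OkHttp's Credentials.basic, so for ASCII credentials the two should produce the same string; the class name BasicCredential is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Builds Basic-scheme header values without any third-party library (hypothetical helper). */
final class BasicCredential {

    /** Returns the value for a Proxy-Authorization (or Authorization) header. */
    static String of(String username, String password) {
        String pair = username + ":" + password;
        // Assumption: ISO-8859-1, the conventional default for Basic auth
        byte[] bytes = pair.getBytes(StandardCharsets.ISO_8859_1);
        return "Basic " + Base64.getEncoder().encodeToString(bytes);
    }
}
```

Because Base64 is an encoding and not encryption, anyone who can read the header can recover the password, which is one reason to avoid logging request headers in the interceptor.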

Connection pooling and tuning in OkHttp

OkHttp manages a connection pool automatically. For heavy use with proxies, tune these parameters:

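import okhttp3.Authenticator;
import okhttp3.ConnectionPool;
import okhttp3.Credentials;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.concurrent.TimeUnit;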

public class OptimizedOkHttpProxy {

    public static OkHttpClient createOptimizedClient(String country) {
        Proxy proxy = new Proxy(Proxy.Type.HTTP, 
            new InetSocketAddress("gate.proxyhat.com", 8080));

        String username = "user-country-" + country;
        Authenticator proxyAuth = (route, response) -> {
            if (response.request().header("Proxy-Authorization") != null) {
                return null; // Credentials were already sent and rejected; stop retrying
            }
            String credential = Credentials.basic(username, "PASSWORD");
            return response.request().newBuilder()
                .header("Proxy-Authorization", credential)
                .build();
        };

        // Connection pool: up to 50 idle connections, 5-minute keep-alive
        ConnectionPool connectionPool = new ConnectionPool(50, 5, TimeUnit.MINUTES);

        return new OkHttpClient.Builder()
            .proxy(proxy)
            .proxyAuthenticator(proxyAuth)
            .connectionPool(connectionPool)
            .connectTimeout(10, TimeUnit.SECONDS)
            .readTimeout(60, TimeUnit.SECONDS)
            .writeTimeout(30, TimeUnit.SECONDS)
            .retryOnConnectionFailure(true)
            // Logging interceptor
            .addInterceptor(chain -> {
                long start = System.nanoTime();
                Request request = chain.request();
                Response response = chain.proceed(request);
                long elapsed = System.nanoTime() - start;
                System.out.printf("[%s] %s - %dms%n", 
                    request.method(), request.url(), elapsed / 1_000_000);
                return response;
            })
            .build();
    }
}

Jsoup with Proxy Support for HTML Parsing

Jsoup is the de facto standard library for HTML parsing in Java. Its main API is synchronous and simple, but authenticated proxies need some extra configuration.

Jsoup with a basic proxy

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.net.Authenticator;
import java.net.PasswordAuthentication;
import java.net.Proxy;
import java.net.InetSocketAddress;

public class JsoupProxyExample {

    private static final String PROXY_HOST = "gate.proxyhat.com";
    private static final int PROXY_PORT = 8080;

    public static void main(String[] args) throws Exception {
        // Set the global (JVM-wide) authenticator for the proxy
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                if (getRequestorType() == RequestorType.PROXY) {
                    return new PasswordAuthentication(
                        "user-country-FR", 
                        "PASSWORD".toCharArray()
                    );
                }
                return null;
            }
        });

        Proxy proxy = new Proxy(Proxy.Type.HTTP, 
            new InetSocketAddress(PROXY_HOST, PROXY_PORT));

        // Jsoup uses the JDK's underlying HTTP connection
        Document doc = Jsoup.connect("https://example.com")
            .proxy(proxy)
            .userAgent("Mozilla/5.0 (compatible; ProxyHatBot/1.0)")
            .timeout(15000)
            .followRedirects(true)
            .get();

        System.out.println("Title: " + doc.title());
        doc.select("a[href]").forEach(link -> {
            System.out.println("Link: " + link.attr("href"));
        });
    }
}

Jsoup with a reusable wrapper

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import java.net.*;
import java.util.Map;
import java.util.HashMap;

public class JsoupProxyClient {

    private final Proxy proxy;
    private final String username;
    private final String password;
    private final Map<String, String> defaultHeaders;

    public JsoupProxyClient(String country) {
        this.username = "user-country-" + country;
        this.password = "PASSWORD";
        this.proxy = new Proxy(Proxy.Type.HTTP, 
            new InetSocketAddress("gate.proxyhat.com", 8080));
        
        this.defaultHeaders = new HashMap<>();
        defaultHeaders.put("User-Agent", 
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36");
        defaultHeaders.put("Accept", "text/html,application/xhtml+xml");
        defaultHeaders.put("Accept-Language", "en-US,en;q=0.9");

        // Set the global authenticator (JVM-wide: the last instance created wins)
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                if (getRequestorType() == RequestorType.PROXY) {
                    return new PasswordAuthentication(username, password.toCharArray());
                }
                return null;
            }
        });
    }

    public Document fetchDocument(String url) throws Exception {
        org.jsoup.Connection conn = Jsoup.connect(url)
            .proxy(proxy)
            .timeout(20000)
            .followRedirects(true)
            .ignoreHttpErrors(true)
            .maxBodySize(10 * 1024 * 1024); // 10MB max

        for (Map.Entry<String, String> header : defaultHeaders.entrySet()) {
            conn.header(header.getKey(), header.getValue());
        }

        return conn.get();
    }

    public Document fetchWithRetry(String url, int maxRetries) {
        Exception lastException = null;
        
        for (int i = 0; i < maxRetries; i++) {
            try {
                return fetchDocument(url);
            } catch (Exception e) {
                lastException = e;
                System.err.printf("Attempt %d failed: %s%n", i + 1, e.getMessage());
                
                // Exponential backoff
                try {
                    Thread.sleep((long) Math.pow(2, i) * 1000);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        throw new RuntimeException("All retries failed", lastException);
    }

    // Usage example: scraping products
    public void scrapeProducts(String baseUrl) throws Exception {
        Document doc = fetchWithRetry(baseUrl, 3);
        
        for (Element product : doc.select(".product-item")) {
            String name = product.select(".product-name").text();
            String price = product.select(".price").text();
            String link = product.select("a").attr("abs:href");
            
            System.out.printf("Product: %s | Price: %s | Link: %s%n", 
                name, price, link);
        }
    }
}
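
The backoff in fetchWithRetry grows without limit and every client waits the same amount, which can synchronize retries. A common refinement is to cap the delay and add random jitter. The sketch below shows both as pure functions; the class name Backoff and the full-jitter variant are assumptions layered on top of the original example.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Capped exponential backoff with optional jitter (hypothetical helper). */
final class Backoff {

    /** Delay before retry number attempt (0-based): baseMillis * 2^attempt, capped at maxMillis. */
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        if (attempt < 0 || baseMillis < 0 || maxMillis < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        // Cap the exponent so the shift stays small; assumes baseMillis is a realistic delay
        int exponent = Math.min(attempt, 30);
        long delay = baseMillis * (1L << exponent);
        return Math.min(delay, maxMillis);
    }

    /** Same delay with full jitter: a uniformly random value in [0, delay]. */
    static long jitteredDelayMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = delayMillis(attempt, baseMillis, maxMillis);
        return ThreadLocalRandom.current().nextLong(delay + 1);
    }
}
```

In fetchWithRetry you could replace the Math.pow expression with Thread.sleep(Backoff.jitteredDelayMillis(i, 1000, 30_000)).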

Apache HttpClient (Legacy Ecosystem)

Apache HttpClient is still widely used in enterprise systems. Its proxy configuration is more verbose but very flexible.

import org.apache.hc.client5.http.auth.AuthScope;
import org.apache.hc.client5.http.auth.UsernamePasswordCredentials;
import org.apache.hc.client5.http.classic.methods.HttpGet;
import org.apache.hc.client5.http.config.RequestConfig;
import org.apache.hc.client5.http.impl.auth.BasicCredentialsProvider;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.HttpHost;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.util.Timeout;

public class ApacheHttpClientProxy {

    public static CloseableHttpClient createProxyClient(String country) {
        HttpHost proxy = new HttpHost("gate.proxyhat.com", 8080);
        
        BasicCredentialsProvider credsProvider = new BasicCredentialsProvider();
        credsProvider.setCredentials(
            new AuthScope(proxy),
            new UsernamePasswordCredentials("user-country-" + country, "PASSWORD".toCharArray())
        );

        RequestConfig config = RequestConfig.custom()
            .setProxy(proxy)
            .setConnectTimeout(Timeout.ofSeconds(15))
            .setResponseTimeout(Timeout.ofSeconds(60))
            .build();

        return HttpClients.custom()
            .setDefaultCredentialsProvider(credsProvider)
            .setDefaultRequestConfig(config)
            .build();
    }

    public static void main(String[] args) throws Exception {
        try (CloseableHttpClient client = createProxyClient("US")) {
            HttpGet request = new HttpGet("https://httpbin.org/ip");
            
            client.execute(request, response -> {
                System.out.println("Status: " + response.getCode());
                System.out.println("Body: " + EntityUtils.toString(response.getEntity()));
                return null;
            });
        }
    }
}

Parallel Scraping with ExecutorService and a Proxy Pool

To scrape at scale, you need parallelism with IP rotation. This example rotates across a pool of countries on a residential proxy gateway, using an ExecutorService:

import java.net.*;
import java.net.http.*;
import java.time.Duration;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelProxyScraper {

    private static final String PROXY_HOST = "gate.proxyhat.com";
    private static final int PROXY_PORT = 8080;
    private static final String PASSWORD = "your_password";
    
    // Pool of countries for rotation
    private static final List<String> COUNTRIES = List.of(
        "US", "GB", "DE", "FR", "ES", "IT", "CA", "AU"
    );

    private final ExecutorService executor;
    private final AtomicInteger countryIndex = new AtomicInteger(0);

    public ParallelProxyScraper(int threadCount) {
        this.executor = Executors.newFixedThreadPool(threadCount);
    }

    private HttpClient createClientForCountry(String country) {
        String username = "user-country-" + country;
        
        Authenticator auth = new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                if (getRequestorType() == RequestorType.PROXY) {
                    return new PasswordAuthentication(username, PASSWORD.toCharArray());
                }
                return null;
            }
        };

        return HttpClient.newBuilder()
            .proxy(ProxySelector.of(new InetSocketAddress(PROXY_HOST, PROXY_PORT)))
            .authenticator(auth)
            .connectTimeout(Duration.ofSeconds(15))
            .executor(executor) // Reuse the same executor
            .build();
    }

    private String getNextCountry() {
        // floorMod keeps the index non-negative even after the counter overflows
        int idx = Math.floorMod(countryIndex.getAndIncrement(), COUNTRIES.size());
        return COUNTRIES.get(idx);
    }

    public CompletableFuture<String> fetchAsync(String url) {
        String country = getNextCountry();
        HttpClient client = createClientForCountry(country);
        
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(url))
            .timeout(Duration.ofSeconds(30))
            .GET()
            .build();

        return client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
            .thenApply(response -> {
                if (response.statusCode() == 200) {
                    return response.body();
                } else {
                    throw new RuntimeException("HTTP " + response.statusCode());
                }
            })
            .exceptionally(e -> {
                System.err.printf("[%s] Error: %s%n", country, e.getMessage());
                return null;
            });
    }

    public CompletableFuture<List<String>> scrapeUrls(List<String> urls) {
        List<CompletableFuture<String>> futures = urls.stream()
            .map(this::fetchAsync)
            .toList();

        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
            .thenApply(v -> futures.stream()
                .map(CompletableFuture::join)
                .filter(Objects::nonNull)
                .toList());
    }

    public void shutdown() {
        executor.shutdown();
        try {
            if (!executor.awaitTermination(60, TimeUnit.SECONDS)) {
                executor.shutdownNow();
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        ParallelProxyScraper scraper = new ParallelProxyScraper(10);
        
        List<String> urls = List.of(
            "https://httpbin.org/ip",
            "https://httpbin.org/headers",
            "https://httpbin.org/user-agent",
            "https://httpbin.org/status/200"
        );

        try {
            List<String> results = scraper.scrapeUrls(urls).join();
            System.out.printf("Fetched %d pages successfully%n", results.size());
            results.forEach(r -> System.out.println(r.substring(0, Math.min(100, r.length())) + "..."));
        } finally {
            scraper.shutdown();
        }
    }
}
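
Note that the thread pool size does not necessarily cap how many requests are in flight: sendAsync returns immediately, so scrapeUrls can start every request at once. If you need a hard limit, one option is a Semaphore around each asynchronous call. The following is a minimal sketch under that assumption; the class name BoundedAsync and its methods are hypothetical and not part of the original scraper.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Caps the number of asynchronous requests that are in flight at once (hypothetical helper). */
final class BoundedAsync {

    private final Semaphore permits;

    BoundedAsync(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /** Waits for a free permit, starts the task, and frees the permit when the task completes. */
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> task)
            throws InterruptedException {
        permits.acquire(); // blocks the caller while the limit is reached
        try {
            return task.get().whenComplete((result, error) -> permits.release());
        } catch (RuntimeException e) {
            permits.release(); // the task never started, so give the permit back
            throw e;
        }
    }

    /** Number of requests that could start right now without blocking. */
    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Inside fetchAsync this would look like limiter.submit(() -> client.sendAsync(request, HttpResponse.BodyHandlers.ofString())). The calling thread blocks while waiting for a permit, which provides simple backpressure on the producer.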

TLS and SSLContext Considerations

When you work with proxies, the TLS handshake may need special configuration, especially if the proxy performs SSL inspection or the target servers use non-standard certificates.

Custom SSLContext

import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;
import java.net.Authenticator;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.http.HttpClient;
import java.security.cert.X509Certificate;
import java.time.Duration;
import okhttp3.Credentials;
import okhttp3.OkHttpClient;

public class TlsProxyClient {

    // WARNING: development/testing only. In production, use valid certificates.
    // Shared trust-all manager so both the JDK and OkHttp variants can reference it.
    private static final X509TrustManager TRUST_ALL = new X509TrustManager() {
        public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
        public void checkClientTrusted(X509Certificate[] certs, String t) { }
        public void checkServerTrusted(X509Certificate[] certs, String t) { }
    };

    public static SSLContext createTrustAllContext() throws Exception {
        SSLContext sslContext = SSLContext.getInstance("TLS");
        sslContext.init(null, new TrustManager[] { TRUST_ALL }, new java.security.SecureRandom());
        return sslContext;
    }

    public static HttpClient createTlsClient() throws Exception {
        Authenticator auth = new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                if (getRequestorType() == RequestorType.PROXY) {
                    return new PasswordAuthentication(
                        "user-country-US", "PASSWORD".toCharArray());
                }
                return null;
            }
        };

        return HttpClient.newBuilder()
            .proxy(ProxySelector.of(new InetSocketAddress("gate.proxyhat.com", 8080)))
            .authenticator(auth)
            .sslContext(createTrustAllContext())
            .connectTimeout(Duration.ofSeconds(15))
            .build();
    }

    // OkHttp version
    public static OkHttpClient createOkHttpTlsClient() throws Exception {
        SSLContext sslContext = createTrustAllContext();

        return new OkHttpClient.Builder()
            .proxy(new Proxy(Proxy.Type.HTTP,
                new InetSocketAddress("gate.proxyhat.com", 8080)))
            .proxyAuthenticator((route, response) -> {
                if (response.request().header("Proxy-Authorization") != null) {
                    return null; // Credentials were already sent and rejected
                }
                String cred = Credentials.basic("user-country-US", "PASSWORD");
                return response.request().newBuilder()
                    .header("Proxy-Authorization", cred)
                    .build();
            })
            .sslSocketFactory(sslContext.getSocketFactory(), TRUST_ALL)
            .hostnameVerifier((hostname, session) -> true) // Testing only
            .build();
    }
}
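
If the proxy performs SSL inspection with its own certificate authority, a safer alternative to trusting everything is to trust only that CA. The sketch below loads a single certificate file into an in-memory trust store and builds an SSLContext from it. The class name CustomCaContext, the alias proxy-ca, and the idea that your provider gives you a CA file are all assumptions.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

/** Builds an SSLContext that trusts a specific CA instead of every certificate (hypothetical helper). */
final class CustomCaContext {

    /** Loads one X.509 certificate (PEM or DER) into an in-memory trust store. */
    static KeyStore trustStoreFor(Path caCertFile) throws Exception {
        KeyStore store = KeyStore.getInstance(KeyStore.getDefaultType());
        store.load(null, null); // start with an empty store
        try (InputStream in = Files.newInputStream(caCertFile)) {
            Certificate ca = CertificateFactory.getInstance("X.509").generateCertificate(in);
            store.setCertificateEntry("proxy-ca", ca); // alias is arbitrary
        }
        return store;
    }

    /** Builds an SSLContext from the trust store; null selects the JDK's default trust store. */
    static SSLContext contextFor(KeyStore trustStore) throws Exception {
        TrustManagerFactory tmf =
            TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, tmf.getTrustManagers(), null);
        return context;
    }
}
```

You would pass the result to HttpClient.newBuilder().sslContext(...). Keep in mind that a context built from a custom store trusts only the certificates in that store, so public sites signed by other CAs will fail validation unless you add those roots as well.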

Comparison of Java HTTP Libraries

Feature | Java 11+ HttpClient | OkHttp | Jsoup | Apache HttpClient
HTTP/2 support | Native | Yes | Depends on connection | Yes (v5)
Reactive API | Yes (CompletableFuture) | Yes (Call) | No | Partial
Connection pool | Automatic | Configurable | No | Configurable
Proxy authentication | Authenticator | proxyAuthenticator | Global Authenticator | CredentialsProvider
HTML parsing | No | No | Yes | No
Maintenance | Active (JDK) | Active | Active | Active
Best for | Modern APIs | Android/Backend | Web scraping | Legacy enterprise

Retry and Circuit Breaker Strategies

For robust scraping, implement retry policies with exponential backoff and a circuit breaker:

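import java.net.Authenticator;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.ProxySelector;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;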

public class ResilientProxyClient {

    private final HttpClient baseClient;
    private final int maxRetries;
    private final Duration initialBackoff;
    private final Duration maxBackoff;
    
    // Circuit breaker state
    private final AtomicInteger failures = new AtomicInteger(0);
    private final AtomicInteger successes = new AtomicInteger(0);
    private volatile long lastFailureTime = 0;
    private static final int FAILURE_THRESHOLD = 5;
    private static final long RECOVERY_TIMEOUT_MS = 30000;

    public ResilientProxyClient(String country, int maxRetries) {
        this.maxRetries = maxRetries;
        this.initialBackoff = Duration.ofMillis(500);
        this.maxBackoff = Duration.ofSeconds(30);
        
        Authenticator auth = new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                if (getRequestorType() == RequestorType.PROXY) {
                    return new PasswordAuthentication(
                        "user-country-" + country, "PASSWORD".toCharArray());
                }
                return null;
            }
        };

        this.baseClient = HttpClient.newBuilder()
            .proxy(ProxySelector.of(new InetSocketAddress("gate.proxyhat.com", 8080)))
            .authenticator(auth)
            .connectTimeout(Duration.ofSeconds(15))
            .build();
    }

    private boolean isCircuitOpen() {
        if (failures.get() < FAILURE_THRESHOLD) {
            return false;
        }
        // Check if recovery timeout has passed
        return System.currentTimeMillis() - lastFailureTime < RECOVERY_TIMEOUT_MS;
    }

    private void recordSuccess() {
        successes.incrementAndGet();
        failures.set(0); // Reset on success
    }

    private void recordFailure() {
        failures.incrementAndGet();
        lastFailureTime = System.currentTimeMillis();
    }

    public HttpResponse<String> executeWithRetry(HttpRequest request) throws Exception {
        if (isCircuitOpen()) {
            throw new RuntimeException("Circuit breaker is open - too many recent failures");
        }

        Exception lastException = null;
        
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            try {
                HttpResponse<String> response = baseClient.send(
                    request, HttpResponse.BodyHandlers.ofString());
                
                if (response.statusCode() >= 500) {
                    throw new RuntimeException("Server error: " + response.statusCode());
                }
                
                if (response.statusCode() == 429) {
                    // Rate limited - wait longer
                    String retryAfter = response.headers()
                        .firstValue("Retry-After").orElse("60");
                    long waitSeconds = Long.parseLong(retryAfter);
                    Thread.sleep(waitSeconds * 1000);
                    continue;
                }
                
                recordSuccess();
                return response;
                
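            } catch (InterruptedException e) {
                // Preserve the interrupt instead of treating it as a retryable failure
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {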
                lastException = e;
                recordFailure();
                
                // Exponential backoff
                long backoffMs = (long) (initialBackoff.toMillis() * Math.pow(2, attempt));
                backoffMs = Math.min(backoffMs, maxBackoff.toMillis());
                
                System.err.printf("Attempt %d failed, retrying in %dms: %s%n",
                    attempt + 1, backoffMs, e.getMessage());
                
                Thread.sleep(backoffMs);
            }
        }
        
        throw new RuntimeException("All retries exhausted", lastException);
    }

    public CompletableFuture<HttpResponse<String>> executeAsync(HttpRequest request) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return executeWithRetry(request);
            } catch (Exception e) {
                throw new CompletionException(e);
            }
        });
    }
}
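
The breaker logic above is interleaved with HTTP calls, which makes it hard to test. One way to isolate it is a small class with an injectable time source. The sketch below mirrors the same rule (open after N consecutive failures, allow a trial request after a recovery window); the class name SimpleCircuitBreaker and the LongSupplier clock are illustrative assumptions.

```java
import java.util.function.LongSupplier;

/** Consecutive-failure circuit breaker with an injectable millisecond clock (hypothetical helper). */
final class SimpleCircuitBreaker {

    private final int failureThreshold;
    private final long recoveryMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis in production
    private int consecutiveFailures;
    private long lastFailureAt;

    SimpleCircuitBreaker(int failureThreshold, long recoveryMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.recoveryMillis = recoveryMillis;
        this.clock = clock;
    }

    /** True when a request may be sent: breaker closed, or recovery window elapsed. */
    synchronized boolean allowRequest() {
        if (consecutiveFailures < failureThreshold) {
            return true; // closed
        }
        // Open until the recovery window passes, then let trial requests through
        return clock.getAsLong() - lastFailureAt >= recoveryMillis;
    }

    synchronized void recordSuccess() {
        consecutiveFailures = 0; // any success closes the breaker
    }

    synchronized void recordFailure() {
        consecutiveFailures++;
        lastFailureAt = clock.getAsLong();
    }
}
```

With new SimpleCircuitBreaker(5, 30_000, System::currentTimeMillis) the behavior should match the FAILURE_THRESHOLD and RECOVERY_TIMEOUT_MS constants used in ResilientProxyClient.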

Best Practices for Proxies in Java

  • Reuse HTTP clients: Creating a new HttpClient per request wastes resources. Use a singleton or a pool.
  • Set appropriate timeouts: Connections through a proxy can be slower. Raise timeouts to 15-30 seconds.
  • Implement retry with backoff: Residential proxy IPs can fail. Always include retry logic.
  • Use sticky sessions when needed: For multi-step flows (login, navigation), keep the same IP.
  • Rotate geographically: Spread requests across countries to avoid detectable patterns.
  • Monitor metrics: Record success rate, latency, and response codes per IP/country.
  • Handle 429 gracefully: Respect Retry-After headers and slow down when you are rate limited.
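
On the last point, keep in mind that Retry-After can carry either a number of seconds or an HTTP date, so a plain Long.parseLong can throw. The sketch below handles both forms and falls back to a default when the value is missing or malformed. The class name RetryAfter and the fallback parameter are assumptions for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

/** Parses Retry-After header values in either delta-seconds or HTTP-date form (hypothetical helper). */
final class RetryAfter {

    /** Returns how long to wait from now; uses fallback when the value cannot be parsed. */
    static Duration parse(String value, Instant now, Duration fallback) {
        if (value == null || value.isBlank()) {
            return fallback;
        }
        String trimmed = value.trim();
        try {
            // First form: a non-negative number of seconds
            return Duration.ofSeconds(Math.max(0, Long.parseLong(trimmed)));
        } catch (NumberFormatException notSeconds) {
            // Fall through and try the HTTP-date form
        }
        try {
            Instant when = ZonedDateTime
                .parse(trimmed, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant();
            Duration wait = Duration.between(now, when);
            return wait.isNegative() ? Duration.ZERO : wait; // date already passed
        } catch (DateTimeParseException notDate) {
            return fallback;
        }
    }
}
```

In a retry loop you might call RetryAfter.parse(header, Instant.now(), Duration.ofSeconds(60)) and cap the result at a maximum you are willing to wait.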

ProxyHat tip: For high-volume scraping, combine IP rotation with sticky sessions of 10-15 minutes. This mimics real user behavior and lowers the chance of blocks. Use the user-country-X-session-Y pattern in your username to keep the session consistent.

Conclusions

The Java ecosystem offers solid options for working with HTTP proxies. The Java 11+ HttpClient is ideal for new projects thanks to its native asynchronous API. OkHttp remains the most flexible option, with excellent connection pooling support. Jsoup is indispensable when you need integrated HTML parsing. And Apache HttpClient remains relevant in legacy enterprise environments.

Success in scraping at scale comes from combining smart IP rotation, robust retry policies, and constant metric monitoring. With the examples in this guide, you have the building blocks for a resilient and efficient scraping system.

Key Takeaways

  • The Java 11+ HttpClient uses ProxySelector and Authenticator to configure authenticated proxies.
  • OkHttp offers proxyAuthenticator for proxy credentials and ConnectionPool for connection tuning.
  • Jsoup requires a global Authenticator.setDefault() for proxy authentication.
  • Always set more generous timeouts (connect, read, write) when using proxies.
  • Implement retry with exponential backoff and a circuit breaker for robust systems.
  • Use ExecutorService with CompletableFuture for efficient parallel scraping.
  • Sticky sessions (the session parameter in the username) are essential for multi-step flows.
