The Complete Guide to HTTP Proxies in Java: HttpClient, OkHttp, and Jsoup in Practice

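An in-depth look at HTTP proxy configuration across the Java ecosystem: proxy authentication, connection pooling, retry strategies, and TLS configuration for the Java 11+ HttpClient, OkHttp, Jsoup, and Apache HttpClient, with complete code examples.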

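Java developers building scrapers, API clients, or automation tools almost always run into IP rate limiting, geo-blocking, or anti-bot detection. HTTP proxies are the core infrastructure for dealing with these problems. This guide is code-first: it covers proxy configuration for the mainstream HTTP clients in the Java ecosystem, from the native Java 11+ HttpClient to OkHttp and Jsoup, and on to production-grade connection pooling and concurrent scraping patterns.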

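Java 11+ HttpClient: ProxySelector and Authenticator

The modern java.net.http.HttpClient introduced in Java 11 supports proxies natively. A ProxySelector defines the proxy routing rules, and an Authenticator handles proxy authentication. This design decouples proxy logic from request logic, which suits scenarios that need to switch proxies dynamically.

Basic Proxy Configuration

The simplest approach is to create a fixed proxy selector with ProxySelector.of():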

import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class JavaHttpClientProxyExample {

    public static void main(String[] args) throws Exception {
        // Create a proxy selector that points at the ProxyHat gateway
        ProxySelector proxySelector = ProxySelector.of(
            new InetSocketAddress("gate.proxyhat.com", 8080)
        );

        // Build the HttpClient with the proxy configured
        HttpClient client = HttpClient.newBuilder()
            .proxy(proxySelector)
            .connectTimeout(java.time.Duration.ofSeconds(10))
            .build();

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://httpbin.org/ip"))
            .GET()
            .build();

        HttpResponse<String> response = client.send(
            request,
            HttpResponse.BodyHandlers.ofString()
        );

        System.out.println("Status: " + response.statusCode());
        System.out.println("Body: " + response.body());
    }
}

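The code above sends the request through the proxy gateway but does not handle authentication. Most commercial proxy services (ProxyHat included) require username and password authentication.

Proxy Configuration with Authentication

Use an Authenticator to supply the proxy credentials: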

import java.net.Authenticator;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class JavaHttpClientAuthProxy {

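    public static void main(String[] args) throws Exception {
        // The JDK disables Basic auth for HTTPS CONNECT tunnels by default.
        // Clear the property before the first request, or pass
        // -Djdk.http.auth.tunneling.disabledSchemes="" on the command line.
        System.setProperty("jdk.http.auth.tunneling.disabledSchemes", "");

        // Proxy authenticator
        Authenticator proxyAuth = new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                // Answer proxy challenges only, never origin servers
                if (getRequestorType() != RequestorType.PROXY) {
                    return null;
                }
                // ProxyHat usernames can carry geo-targeting:
                // user-country-US requests a US exit IP
                return new PasswordAuthentication(
                    "user-country-US",
                    "PASSWORD".toCharArray()
                );
            }
        };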

        ProxySelector proxySelector = ProxySelector.of(
            new InetSocketAddress("gate.proxyhat.com", 8080)
        );

        HttpClient client = HttpClient.newBuilder()
            .proxy(proxySelector)
            .authenticator(proxyAuth)
            .connectTimeout(Duration.ofSeconds(15))
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://httpbin.org/ip"))
            .timeout(Duration.ofSeconds(30))
            .header("User-Agent", "ProxyHat-Java-Client/1.0")
            .GET()
            .build();

        HttpResponse<String> response = client.send(
            request,
            HttpResponse.BodyHandlers.ofString()
        );

        System.out.println("Response: " + response.body());
    }
}
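The examples in this guide encode targeting in the proxy username (user-country-US, user-country-DE-city-berlin, user-session-xxx). A minimal sketch of a helper that assembles such usernames is shown here. The class and method names are hypothetical, and the grammar is inferred only from the usernames that appear in this article, so check it against the provider's documentation before relying on it.

```java
import java.util.Locale;

/** Hypothetical helper; the username grammar is inferred from this article's examples. */
final class ProxyUsernames {
    private ProxyUsernames() {}

    /** Builds "base-country-XX" or "base-country-XX-city-name". The city may be null. */
    static String geo(String base, String countryCode, String city) {
        if (countryCode == null || countryCode.length() != 2) {
            throw new IllegalArgumentException("expected a two-letter country code");
        }
        StringBuilder sb = new StringBuilder(base)
            .append("-country-")
            .append(countryCode.toUpperCase(Locale.ROOT));
        if (city != null && !city.isBlank()) {
            sb.append("-city-").append(city.trim().toLowerCase(Locale.ROOT));
        }
        return sb.toString();
    }

    /** Builds "base-session-id" for sticky sessions. */
    static String session(String base, String sessionId) {
        if (sessionId == null || sessionId.isBlank()) {
            throw new IllegalArgumentException("session id must not be blank");
        }
        return base + "-session-" + sessionId;
    }
}
```

The result can be passed straight into the PasswordAuthentication constructor, for example new PasswordAuthentication(ProxyUsernames.geo("user", "US", null), password).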

Dynamic Proxy Selector

For scenarios that need to switch proxies per request (such as rotating through an IP pool), implement a custom ProxySelector:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RotatingProxySelector extends ProxySelector {

    private final List<String> proxyHosts;
    private final int port;
    private final AtomicInteger counter = new AtomicInteger(0);

    public RotatingProxySelector(List<String> proxyHosts, int port) {
        this.proxyHosts = List.copyOf(proxyHosts);
        this.port = port;
    }

    @Override
    public List<Proxy> select(URI uri) {
        // floorMod keeps the index non-negative after the counter overflows
        int idx = Math.floorMod(counter.getAndIncrement(), proxyHosts.size());
        String host = proxyHosts.get(idx);
        Proxy proxy = new Proxy(
            Proxy.Type.HTTP,
            new InetSocketAddress(host, port)
        );
        return List.of(proxy);
    }

    // ProxySelector declares this parameter as SocketAddress, not InetSocketAddress
    @Override
    public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
        System.err.println("Proxy connection failed: " + sa + " - " + ioe.getMessage());
    }
}

// Usage example
RotatingProxySelector selector = new RotatingProxySelector(
    List.of("gate.proxyhat.com"),
    8080
);
HttpClient client = HttpClient.newBuilder()
    .proxy(selector)
    .build();
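A ProxySelector can also express routing rules rather than rotation. The sketch below sends most traffic through the proxy but connects directly to a small allow-list of hosts (for example local health checks). The class name and the direct-host list are illustrative assumptions, not part of any library.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical selector: proxy everything except an allow-list of direct hosts. */
final class BypassingProxySelector extends ProxySelector {
    private final Proxy proxy;
    private final Set<String> directHosts; // expected in lower case

    BypassingProxySelector(String proxyHost, int proxyPort, Set<String> directHosts) {
        // createUnresolved defers the DNS lookup of the proxy host until connect time
        this.proxy = new Proxy(
            Proxy.Type.HTTP,
            InetSocketAddress.createUnresolved(proxyHost, proxyPort)
        );
        this.directHosts = Set.copyOf(directHosts);
    }

    @Override
    public List<Proxy> select(URI uri) {
        if (uri == null) {
            throw new IllegalArgumentException("uri must not be null");
        }
        String host = uri.getHost();
        if (host != null && directHosts.contains(host.toLowerCase(Locale.ROOT))) {
            return List.of(Proxy.NO_PROXY); // connect directly
        }
        return List.of(proxy);
    }

    @Override
    public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
        System.err.println("Proxy connection failed: " + sa + " - " + ioe.getMessage());
    }
}
```

It plugs into HttpClient.newBuilder().proxy(...) in the same way as the rotating selector.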

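OkHttp: Proxy and Authenticator Configuration

OkHttp is an HTTP client widely used on Android and on the server side. Proxy configuration goes through the proxy() and proxyAuthenticator() methods of OkHttpClient.Builder. OkHttp's strengths are its built-in connection pool, transparent GZIP handling, and response caching.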

import okhttp3.Authenticator;
import okhttp3.ConnectionPool;
import okhttp3.Credentials;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import okhttp3.Route;

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.concurrent.TimeUnit;

public class OkHttpProxyExample {

    public static void main(String[] args) throws IOException {
        // Define the proxy
        Proxy proxy = new Proxy(
            Proxy.Type.HTTP,
            new InetSocketAddress("gate.proxyhat.com", 8080)
        );

        // Proxy authenticator
        Authenticator proxyAuthenticator = new Authenticator() {
            @Override
            public Request authenticate(Route route, Response response) throws IOException {
                if (responseCount(response) >= 3) {
                    return null; // give up to avoid retrying forever
                }
                String credential = Credentials.basic(
                    "user-country-DE-city-berlin", // exit in Berlin, Germany
                    "PASSWORD"
                );
                return response.request().newBuilder()
                    .header("Proxy-Authorization", credential)
                    .build();
            }
        };

        // Build the client
        OkHttpClient client = new OkHttpClient.Builder()
            .proxy(proxy)
            .proxyAuthenticator(proxyAuthenticator)
            .connectTimeout(15, TimeUnit.SECONDS)
            .readTimeout(30, TimeUnit.SECONDS)
            .writeTimeout(30, TimeUnit.SECONDS)
            .retryOnConnectionFailure(true)
            .connectionPool(new ConnectionPool(10, 5, TimeUnit.MINUTES))
            .build();

        Request request = new Request.Builder()
            .url("https://httpbin.org/headers")
            .header("User-Agent", "ProxyHat-OkHttp/1.0")
            .build();

        try (Response response = client.newCall(request).execute()) {
            System.out.println("Status: " + response.code());
            System.out.println("Body: " + response.body().string());
        }
    }

    // Response has no built-in attempt counter, so walk the priorResponse() chain
    private static int responseCount(Response response) {
        int count = 1;
        while ((response = response.priorResponse()) != null) {
            count++;
        }
        return count;
    }
}

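OkHttp Connection Pool and Retry Strategy

OkHttp's ConnectionPool keeps up to 5 idle connections by default, each for up to 5 minutes. For high-frequency proxy traffic, tune it to your concurrency:

ConnectionPool pool = new ConnectionPool(
    20,     // maximum number of idle connections
    2,      // keep-alive duration for idle connections
    TimeUnit.MINUTES
);

OkHttpClient client = new OkHttpClient.Builder()
    .connectionPool(pool)
    .retryOnConnectionFailure(true)
    .addInterceptor(new RetryInterceptor(3)) // custom retry interceptor
    .build();

An example custom retry interceptor: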

import okhttp3.Interceptor;
import okhttp3.Request;
import okhttp3.Response;
import java.io.IOException;

public class RetryInterceptor implements Interceptor {
    private final int maxRetries;

    public RetryInterceptor(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    @Override
    public Response intercept(Chain chain) throws IOException {
        Request request = chain.request();
        IOException lastException = null;

        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                Response response = chain.proceed(request);
                // Return on success, on 4xx (not retryable), or when out of attempts
                boolean clientError = response.code() >= 400 && response.code() < 500;
                if (response.isSuccessful() || clientError || attempt == maxRetries) {
                    return response;
                }
                response.close(); // release the connection before retrying
            } catch (IOException e) {
                lastException = e;
                System.err.println("Attempt " + attempt + " failed: " + e.getMessage());
            }
        }

        throw lastException != null ? lastException
            : new IOException("Max retries exceeded");
    }
}
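The interceptor above retries immediately. A common refinement is to wait between attempts with exponential backoff plus random jitter. The following is a minimal sketch of the delay calculation only, using the "full jitter" variant; the class name, method names, and the base and cap values are illustrative assumptions.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper that computes retry delays; it does not sleep by itself. */
final class Backoff {
    private Backoff() {}

    /** Largest delay allowed before retry number `attempt` (0-based): min(cap, base * 2^attempt). */
    static long ceilingMillis(int attempt, long baseMillis, long capMillis) {
        if (attempt < 0 || baseMillis <= 0 || capMillis <= 0) {
            throw new IllegalArgumentException("attempt must be >= 0, base and cap must be > 0");
        }
        long delay = baseMillis;
        // Stop doubling once the cap is reached so the value cannot grow unbounded
        for (int i = 0; i < attempt && delay < capMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, capMillis);
    }

    /** Full jitter: a uniformly random delay in [0, ceiling]. */
    static long jitteredMillis(int attempt, long baseMillis, long capMillis) {
        long ceiling = ceilingMillis(attempt, baseMillis, capMillis);
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }
}
```

Inside a retry loop, a call such as Thread.sleep(Backoff.jitteredMillis(attempt, 200, 5_000)) before the next attempt spreads retries out instead of hitting the proxy in lockstep.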

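Jsoup: HTML Parsing and Proxy Support

Jsoup is one of the most popular HTML parsing libraries in the Java ecosystem and is widely used for scraping. Jsoup.connect() supports proxy configuration and uses HttpURLConnection underneath. For complex scenarios, fetch the HTML with OkHttp or HttpClient first and hand it to Jsoup for parsing.

Native Jsoup Proxy Configuration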

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.net.InetSocketAddress;
import java.net.Proxy;

public class JsoupProxyExample {

    public static void main(String[] args) throws Exception {
        Proxy proxy = new Proxy(
            Proxy.Type.HTTP,
            new InetSocketAddress("gate.proxyhat.com", 8080)
        );

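        // Caveat: HttpURLConnection generally does not copy request headers onto
        // the CONNECT request, so a manual Proxy-Authorization header is likely
        // to work for plain-HTTP targets only. HTTPS targets typically need a
        // java.net.Authenticator instead.
        Document doc = Jsoup.connect("http://example.com")
            .proxy(proxy)
            .header("Proxy-Authorization", "Basic " +
                java.util.Base64.getEncoder().encodeToString(
                    "user-country-US:PASSWORD".getBytes(
                        java.nio.charset.StandardCharsets.UTF_8)
                )
            )
            .userAgent("ProxyHat-Jsoup/1.0")
            .timeout(30000)
            .followRedirects(true)
            .get();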

        System.out.println("Title: " + doc.title());
        System.out.println("Links: " + doc.select("a[href]").size());
    }
}
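Because Jsoup sits on top of HttpURLConnection, proxy credentials for HTTPS targets are usually supplied through the JVM-wide java.net.Authenticator rather than a request header. The sketch below is one way to do that, under two assumptions: the proxy uses Basic authentication, and Basic auth for HTTPS tunnels has to be re-enabled through the jdk.http.auth.tunneling.disabledSchemes system property. The class name is hypothetical.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

/** Hypothetical authenticator that answers proxy challenges only. */
final class ProxyOnlyAuthenticator extends Authenticator {
    private final String user;
    private final char[] password;

    ProxyOnlyAuthenticator(String user, String password) {
        this.user = user;
        this.password = password.toCharArray();
    }

    @Override
    protected PasswordAuthentication getPasswordAuthentication() {
        // Returning null for origin servers avoids leaking proxy credentials to them
        if (getRequestorType() != RequestorType.PROXY) {
            return null;
        }
        return new PasswordAuthentication(user, password);
    }

    /** Registers the authenticator for the whole JVM. Call once, before the first request. */
    static void install(String user, String password) {
        // Assumption: Basic is in the default disabled list for HTTPS tunnels;
        // an empty value re-enables it.
        System.setProperty("jdk.http.auth.tunneling.disabledSchemes", "");
        Authenticator.setDefault(new ProxyOnlyAuthenticator(user, password));
    }
}
```

After ProxyOnlyAuthenticator.install("user-country-US", "PASSWORD"), a Jsoup.connect(...).proxy(proxy) call no longer needs the manual header. The default authenticator is global state, so it affects every HttpURLConnection in the process.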

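Combining OkHttp and Jsoup

For scenarios that need finer control (custom retries, connection pooling, interceptors), let OkHttp fetch the response body and Jsoup do the parsing: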

import okhttp3.Authenticator;
import okhttp3.Credentials;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.net.InetSocketAddress;
import java.net.Proxy;

public class OkHttpJsoupExample {

    private final OkHttpClient httpClient;

    public OkHttpJsoupExample() {
        Proxy proxy = new Proxy(
            Proxy.Type.HTTP,
            new InetSocketAddress("gate.proxyhat.com", 8080)
        );

        Authenticator proxyAuth = (route, response) -> {
            // If credentials were already sent and rejected, stop retrying
            if (response.request().header("Proxy-Authorization") != null) {
                return null;
            }
            String cred = Credentials.basic("user-country-JP", "PASSWORD");
            return response.request().newBuilder()
                .header("Proxy-Authorization", cred)
                .build();
        };

        this.httpClient = new OkHttpClient.Builder()
            .proxy(proxy)
            .proxyAuthenticator(proxyAuth)
            .build();
    }

    public Document fetchAndParse(String url) throws Exception {
        Request request = new Request.Builder()
            .url(url)
            .build();

        try (Response response = httpClient.newCall(request).execute()) {
            if (!response.isSuccessful()) {
                throw new RuntimeException("HTTP " + response.code());
            }
            String html = response.body().string();
            return Jsoup.parse(html, url); // url becomes the base URI for relative links
        }
    }

    public static void main(String[] args) throws Exception {
        OkHttpJsoupExample scraper = new OkHttpJsoupExample();
        Document doc = scraper.fetchAndParse("https://news.ycombinator.com");
        doc.select(".titleline > a").forEach(e ->
            System.out.println(e.text() + " -> " + e.attr("href"))
        );
    }
}

Apache HttpClient: Legacy System Compatibility

Apache HttpClient 4.x is still used in many enterprise systems. Proxy configuration goes through HttpHost and CredentialsProvider:

import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class ApacheHttpClientProxy {

    public static void main(String[] args) throws Exception {
        HttpHost proxy = new HttpHost("gate.proxyhat.com", 8080);

        CredentialsProvider credsProvider = new BasicCredentialsProvider();
        credsProvider.setCredentials(
            new AuthScope(proxy),
            new UsernamePasswordCredentials("user-country-GB", "PASSWORD")
        );

        RequestConfig config = RequestConfig.custom()
            .setProxy(proxy)
            .setConnectTimeout(15000)
            .setSocketTimeout(30000)
            .build();

        try (CloseableHttpClient client = HttpClients.custom()
            .setDefaultCredentialsProvider(credsProvider)
            .setDefaultRequestConfig(config)
            .build()) {

            HttpGet request = new HttpGet("https://httpbin.org/ip");
            // Read the entity inside the handler, before the connection is released
            String body = client.execute(request, response ->
                org.apache.http.util.EntityUtils.toString(response.getEntity())
            );
            System.out.println("Body: " + body);
        }
    }
}

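Concurrent Scraping: ExecutorService + Residential Proxy Pool

Large-scale data scraping needs concurrent requests plus IP rotation. The following example uses virtual threads (Java 21+) and a residential proxy pool: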

import java.net.Authenticator;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.stream.Collectors;

public class ParallelScraper {

    private final HttpClient httpClient;
    private final ExecutorService executor;
    private final Semaphore rateLimiter;

    public ParallelScraper(int maxConcurrent) {
        // Sticky session: one session ID per scraper keeps the same exit IP.
        // The authenticator is invoked on the client's executor threads rather
        // than the caller's, so a per-thread ID would not identify a caller.
        String sessionId = UUID.randomUUID().toString().substring(0, 8);

        Authenticator proxyAuth = new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication(
                    "user-session-" + sessionId,
                    "PASSWORD".toCharArray()
                );
            }
        };

        ProxySelector proxy = ProxySelector.of(
            new InetSocketAddress("gate.proxyhat.com", 8080)
        );

        this.executor = Executors.newVirtualThreadPerTaskExecutor();
        this.httpClient = HttpClient.newBuilder()
            .proxy(proxy)
            .authenticator(proxyAuth)
            .connectTimeout(Duration.ofSeconds(15))
            .executor(executor)
            .build();

        this.rateLimiter = new Semaphore(maxConcurrent);
    }

    public CompletableFuture<String> fetchAsync(String url) {
        // Run on the virtual-thread executor so blocking on the semaphore is cheap
        return CompletableFuture.supplyAsync(() -> {
            try {
                rateLimiter.acquire();
                try {
                    HttpRequest request = HttpRequest.newBuilder()
                        .uri(URI.create(url))
                        .timeout(Duration.ofSeconds(30))
                        .GET()
                        .build();

                    HttpResponse<String> response = httpClient.send(
                        request,
                        HttpResponse.BodyHandlers.ofString()
                    );
                    return response.body();
                } finally {
                    rateLimiter.release();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                throw new RuntimeException("Interrupted: " + url, e);
            } catch (Exception e) {
                throw new RuntimeException("Failed: " + url, e);
            }
        }, executor);
    }

    public static void main(String[] args) {
        // Basic auth for HTTPS tunnels is disabled by default in the JDK
        System.setProperty("jdk.http.auth.tunneling.disabledSchemes", "");

        ParallelScraper scraper = new ParallelScraper(50);

        List<String> urls = List.of(
            "https://httpbin.org/ip",
            "https://httpbin.org/headers",
            "https://httpbin.org/user-agent"
        );

        List<CompletableFuture<String>> futures = urls.stream()
            .map(scraper::fetchAsync)
            .collect(Collectors.toList());

        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
            .join();

        futures.forEach(f -> System.out.println(f.join()));
    }
}
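The Semaphore pattern in the scraper can be checked without any network access. The sketch below replaces the proxied request with a short sleep and records the highest number of tasks that were in flight at once. It uses a fixed pool of platform threads so that it also runs on Java 11; the class name and the 20 ms sleep are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo: verifies that a Semaphore caps the number of in-flight tasks. */
final class BoundedConcurrencyDemo {
    private BoundedConcurrencyDemo() {}

    /** Runs `tasks` fake fetches with at most `limit` in flight; returns the observed peak. */
    static int run(int tasks, int limit) {
        Semaphore permits = new Semaphore(limit);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks));
        try {
            List<Future<Object>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                futures.add(pool.submit(() -> {
                    permits.acquire();
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(20); // stands in for the proxied request
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release();
                    }
                    return null;
                }));
            }
            for (Future<Object> f : futures) {
                f.get(); // wait for completion and surface any failure
            }
        } catch (Exception e) {
            throw new IllegalStateException("demo task failed", e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }
}
```

Calling BoundedConcurrencyDemo.run(20, 4) starts 20 tasks but should never report a peak above 4, which is the behavior the scraper relies on to keep pressure on the proxy pool bounded.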

TLS/SSL Configuration: JSSE and Custom SSLContext

When the proxy upstream uses a self-signed certificate or a non-standard TLS setup, you need a custom SSLContext:

import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;
import java.net.http.HttpClient;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;

public class TlsConfig {

    // Warning: for testing only; production should rely on a proper CA
    public static SSLContext trustAllContext() throws Exception {
        TrustManager[] trustAll = new TrustManager[] {
            new X509TrustManager() {
                public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
                public void checkClientTrusted(X509Certificate[] certs, String t) {}
                public void checkServerTrusted(X509Certificate[] certs, String t) {}
            }
        };

        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, trustAll, new SecureRandom());
        return ctx;
    }

    public static HttpClient createTlsClient() throws Exception {
        return HttpClient.newBuilder()
            .sslContext(trustAllContext())
            .connectTimeout(java.time.Duration.ofSeconds(15))
            .build();
    }
}

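For OkHttp, configure this through OkHttpClient.Builder.sslSocketFactory(). SSLContext does not expose its trust managers, so keep a reference to the trust manager and pass it alongside the socket factory:

// Test only: a trust manager that accepts every certificate
X509TrustManager trustAll = new X509TrustManager() {
    public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
    public void checkClientTrusted(X509Certificate[] certs, String t) {}
    public void checkServerTrusted(X509Certificate[] certs, String t) {}
};
SSLContext ctx = SSLContext.getInstance("TLS");
ctx.init(null, new TrustManager[] { trustAll }, new SecureRandom());

OkHttpClient client = new OkHttpClient.Builder()
    .sslSocketFactory(ctx.getSocketFactory(), trustAll)
    .hostnameVerifier((hostname, session) -> true) // test only
    .build();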
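For production, a safer alternative to trusting every certificate is to trust only a specific CA. The following sketch builds an SSLContext from an in-memory trust store that holds just the certificates you supply. The class and method names are hypothetical, and how the CA file is obtained is left to the reader.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import java.util.Collection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

/** Hypothetical helper: an SSLContext that trusts only the given CA certificates. */
final class PinnedCaContext {
    private PinnedCaContext() {}

    /** Reads one or more X.509 certificates (PEM or DER) from the stream. */
    static Collection<? extends Certificate> readCertificates(InputStream in) {
        try {
            return CertificateFactory.getInstance("X.509").generateCertificates(in);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("could not parse certificates", e);
        }
    }

    /** Builds a TLS context whose trust store contains only `caCerts`. */
    static SSLContext trusting(Collection<? extends Certificate> caCerts) {
        try {
            KeyStore store = KeyStore.getInstance(KeyStore.getDefaultType());
            store.load(null, null); // start from an empty in-memory store
            int i = 0;
            for (Certificate cert : caCerts) {
                store.setCertificateEntry("ca-" + i++, cert);
            }
            TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmf.init(store);

            SSLContext ctx = SSLContext.getInstance("TLS");
            ctx.init(null, tmf.getTrustManagers(), null);
            return ctx;
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException("could not build SSLContext", e);
        }
    }
}
```

The resulting context can be passed to HttpClient.newBuilder().sslContext(...). Hostname verification stays enabled, unlike in the trust-all variant.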

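Production Recommendations

  • Connection pool size: residential proxies have high latency (200-800 ms), so the pool should not be oversized. A reasonable starting point is concurrency = target QPS × average latency (seconds) × 1.5 headroom factor.
  • Timeouts: for residential proxies, use a connect timeout of 15-30 s and a read timeout of 30-60 s. Datacenter proxies can be more aggressive.
  • Retry strategy: distinguish retryable errors (5xx, timeouts, connection resets) from non-retryable ones (4xx). Use exponential backoff to avoid retry storms.
  • Session stickiness: when several requests must come from the same IP, use the user-session-xxx username format to hold the session.
  • Logging and monitoring: record the proxy IP, response time, and error rate for later tuning and cost accounting.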
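The sizing rule from the first bullet can be worked through as a small calculation. This is only a sketch of the arithmetic: the class name is hypothetical, and the 1.5 headroom factor is the rule of thumb given above rather than a measured value.

```java
/** Hypothetical helper for the rule: concurrency = QPS x latency (s) x headroom. */
final class ConcurrencySizing {
    private ConcurrencySizing() {}

    static int recommended(double targetQps, double avgLatencySeconds, double headroom) {
        if (targetQps <= 0 || avgLatencySeconds <= 0 || headroom < 1.0) {
            throw new IllegalArgumentException(
                "qps and latency must be positive, headroom must be >= 1");
        }
        // QPS x latency is the average number of requests in flight;
        // the headroom factor leaves room for latency spikes.
        return (int) Math.ceil(targetQps * avgLatencySeconds * headroom);
    }
}
```

For example, a target of 20 requests per second at an average latency of 0.5 s gives 20 × 0.5 × 1.5 = 15 concurrent requests, which is also a sensible value for the Semaphore limit and the connection pool size.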

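Key Takeaways

  • The Java 11+ HttpClient supports proxies natively through ProxySelector and Authenticator, which suits modern Java applications.
  • OkHttp offers a richer interceptor mechanism and connection pool configuration, which suits complex scenarios.
  • Jsoup's native proxy support is limited; combine it with OkHttp or HttpClient.
  • For concurrent scraping, use virtual threads with a Semaphore for throttling; residential proxies require a controlled level of concurrency.
  • Custom TLS configuration is necessary when the proxy upstream uses non-standard certificates, but be careful with trust-all settings in production.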

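Need stable residential proxies with a high success rate? ProxyHat offers a global residential IP pool with country- and city-level targeting. Visit the pricing page for details, or see the web scraping use cases for more practical guides.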
