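Java developers who build crawlers, API clients, or automation tools almost always run into IP rate limiting, geo-blocking, or anti-bot detection. HTTP proxies are the core infrastructure for working around these problems. This article is code-first and covers proxy configuration for the mainstream HTTP clients in the Java ecosystem, from the built-in HttpClient in Java 11+ to OkHttp and Jsoup, and on to production-grade connection pooling and concurrent scraping patterns.
Java 11+ HttpClient: ProxySelector and Authenticator
The modern java.net.http.HttpClient introduced in Java 11 supports proxies natively. A ProxySelector defines the proxy routing rules, and an Authenticator handles proxy authentication. This design decouples proxy logic from request logic, which suits scenarios that need to switch proxies dynamically.
Basic proxy configuration
The simplest approach is to create a fixed proxy selector with ProxySelector.of():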
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class JavaHttpClientProxyExample {
    public static void main(String[] args) throws Exception {
        // Create a proxy selector that points at the ProxyHat gateway
        ProxySelector proxySelector = ProxySelector.of(
            new InetSocketAddress("gate.proxyhat.com", 8080)
        );
        // Build the HttpClient with the proxy configured
        HttpClient client = HttpClient.newBuilder()
            .proxy(proxySelector)
            .connectTimeout(Duration.ofSeconds(10))
            .build();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://httpbin.org/ip"))
            .GET()
            .build();
        HttpResponse<String> response = client.send(
            request,
            HttpResponse.BodyHandlers.ofString()
        );
        System.out.println("Status: " + response.statusCode());
        System.out.println("Body: " + response.body());
    }
}
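The code above sends the request through the proxy gateway but does not handle authentication. Most commercial proxy services, ProxyHat included, require username/password authentication.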
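To see what a fixed selector actually decides, the following sketch queries it directly without opening any connection. It reuses the gateway host and port from the example above; the class name FixedSelectorDemo and the route helper are made up for illustration, and createUnresolved is used only so the example does not trigger a DNS lookup.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

public class FixedSelectorDemo {
    // Asks the selector which proxies it would use for the given URL.
    static List<Proxy> route(ProxySelector selector, String url) {
        return selector.select(URI.create(url));
    }

    public static void main(String[] args) {
        ProxySelector selector = ProxySelector.of(
            InetSocketAddress.createUnresolved("gate.proxyhat.com", 8080));

        // Every http/https request is routed to the same HTTP proxy.
        for (Proxy proxy : route(selector, "https://httpbin.org/ip")) {
            System.out.println(proxy.type() + " via " + proxy.address());
        }
    }
}
```

Because the selector is just an object with a select method, routing rules can be unit-tested like this before any real traffic is sent.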
Proxy configuration with authentication
Use an Authenticator to supply the proxy credentials:
import java.net.Authenticator;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class JavaHttpClientAuthProxy {
    public static void main(String[] args) throws Exception {
        // The JDK disables Basic auth for HTTPS CONNECT tunnels by default.
        // Clear the list before the first HttpClient is created, or pass
        // -Djdk.http.auth.tunneling.disabledSchemes= on the command line.
        System.setProperty("jdk.http.auth.tunneling.disabledSchemes", "");

        // Proxy authenticator
        Authenticator proxyAuth = new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                // Answer proxy challenges only, never origin-server ones
                if (getRequestorType() != RequestorType.PROXY) {
                    return null;
                }
                // The ProxyHat username format supports geo-targeting:
                // user-country-US selects a US exit IP
                return new PasswordAuthentication(
                    "user-country-US",
                    "PASSWORD".toCharArray()
                );
            }
        };
        ProxySelector proxySelector = ProxySelector.of(
            new InetSocketAddress("gate.proxyhat.com", 8080)
        );
        HttpClient client = HttpClient.newBuilder()
            .proxy(proxySelector)
            .authenticator(proxyAuth)
            .connectTimeout(Duration.ofSeconds(15))
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://httpbin.org/ip"))
            .timeout(Duration.ofSeconds(30))
            .header("User-Agent", "ProxyHat-Java-Client/1.0")
            .GET()
            .build();
        HttpResponse<String> response = client.send(
            request,
            HttpResponse.BodyHandlers.ofString()
        );
        System.out.println("Response: " + response.body());
    }
}
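An Authenticator is consulted for both proxy (407) and origin-server (401) challenges, so it is worth making sure proxy credentials are only handed to the proxy. The sketch below is one way to do that and to check the behavior offline. The class name ProxyOnlyAuthenticator and the answers helper are illustrative, and the host names passed to the simulated challenges are placeholders.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

public class ProxyOnlyAuthenticator extends Authenticator {
    private final String username;
    private final char[] password;

    public ProxyOnlyAuthenticator(String username, String password) {
        this.username = username;
        this.password = password.toCharArray();
    }

    @Override
    protected PasswordAuthentication getPasswordAuthentication() {
        // Returning null for server challenges keeps the proxy password
        // away from the sites being requested.
        if (getRequestorType() != RequestorType.PROXY) {
            return null;
        }
        return new PasswordAuthentication(username, password.clone());
    }

    // Simulates a challenge of the given type and reports whether
    // the authenticator would supply credentials for it.
    static boolean answers(Authenticator auth, Authenticator.RequestorType type) {
        PasswordAuthentication result = auth.requestPasswordAuthenticationInstance(
            "gate.proxyhat.com", null, 8080, "http", "challenge", "basic", null, type);
        return result != null;
    }

    public static void main(String[] args) {
        Authenticator auth = new ProxyOnlyAuthenticator("user-country-US", "PASSWORD");
        System.out.println("proxy challenge answered: "
            + answers(auth, Authenticator.RequestorType.PROXY));
        System.out.println("server challenge answered: "
            + answers(auth, Authenticator.RequestorType.SERVER));
    }
}
```

An instance of this class can be passed to HttpClient.Builder.authenticator() in place of the anonymous class.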
Dynamic proxy selector
For scenarios that switch proxies per request, such as rotating through an IP pool, implement a custom ProxySelector:
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RotatingProxySelector extends ProxySelector {
    private final List<String> proxyHosts;
    private final int port;
    private final AtomicInteger counter = new AtomicInteger(0);

    public RotatingProxySelector(List<String> proxyHosts, int port) {
        this.proxyHosts = List.copyOf(proxyHosts);
        this.port = port;
    }

    @Override
    public List<Proxy> select(URI uri) {
        // floorMod keeps the index non-negative after the counter overflows
        int idx = Math.floorMod(counter.getAndIncrement(), proxyHosts.size());
        String host = proxyHosts.get(idx);
        Proxy proxy = new Proxy(
            Proxy.Type.HTTP,
            new InetSocketAddress(host, port)
        );
        return List.of(proxy);
    }

    // The abstract method takes a SocketAddress, not an InetSocketAddress
    @Override
    public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
        System.err.println("Proxy connection failed: " + sa + " - " + ioe.getMessage());
    }
}

// Usage example
RotatingProxySelector selector = new RotatingProxySelector(
    List.of("gate.proxyhat.com"),
    8080
);
HttpClient client = HttpClient.newBuilder()
    .proxy(selector)
    .build();
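The connectFailed callback can do more than log. As a rough sketch, the selector below rotates round-robin but skips gateways that have been reported as failed. It assumes failures are reported through connectFailed (by the client or by your own error handling); the class name FailoverProxySelector and the proxy-a/proxy-b.example.com hosts are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class FailoverProxySelector extends ProxySelector {
    private final List<InetSocketAddress> gateways;
    private final Set<SocketAddress> failed = ConcurrentHashMap.newKeySet();
    private final AtomicInteger counter = new AtomicInteger();

    public FailoverProxySelector(List<InetSocketAddress> gateways) {
        this.gateways = List.copyOf(gateways);
    }

    @Override
    public List<Proxy> select(URI uri) {
        // Round-robin over the gateways, skipping those marked as failed.
        for (int i = 0; i < gateways.size(); i++) {
            int idx = Math.floorMod(counter.getAndIncrement(), gateways.size());
            InetSocketAddress candidate = gateways.get(idx);
            if (!failed.contains(candidate)) {
                return List.of(new Proxy(Proxy.Type.HTTP, candidate));
            }
        }
        // Every gateway is marked as failed: forget the marks and start over.
        failed.clear();
        return List.of(new Proxy(Proxy.Type.HTTP, gateways.get(0)));
    }

    @Override
    public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
        failed.add(sa);
    }

    public static void main(String[] args) {
        InetSocketAddress a = InetSocketAddress.createUnresolved("proxy-a.example.com", 8080);
        InetSocketAddress b = InetSocketAddress.createUnresolved("proxy-b.example.com", 8080);
        FailoverProxySelector selector = new FailoverProxySelector(List.of(a, b));
        URI target = URI.create("https://httpbin.org/ip");

        selector.connectFailed(target, a, new IOException("connection refused"));
        // With gateway a marked as failed, only b is handed out.
        System.out.println(selector.select(target));
        System.out.println(selector.select(target));
    }
}
```

A production version would likely expire the failure marks after a cooldown instead of keeping them until every gateway has failed.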
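OkHttp: Proxy and Authenticator configuration
OkHttp is an HTTP client widely used on Android and on the server side. Proxies are configured through the proxy() and proxyAuthenticator() methods of OkHttpClient.Builder. OkHttp's strengths are its built-in connection pool, transparent GZIP handling, and response caching.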
import okhttp3.Authenticator;
import okhttp3.ConnectionPool;
import okhttp3.Credentials;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import okhttp3.Route;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.concurrent.TimeUnit;

public class OkHttpProxyExample {
    public static void main(String[] args) throws IOException {
        // Define the proxy
        Proxy proxy = new Proxy(
            Proxy.Type.HTTP,
            new InetSocketAddress("gate.proxyhat.com", 8080)
        );
        // Proxy authenticator
        Authenticator proxyAuthenticator = new Authenticator() {
            @Override
            public Request authenticate(Route route, Response response) throws IOException {
                if (responseCount(response) >= 3) {
                    return null; // give up to avoid retrying forever
                }
                String credential = Credentials.basic(
                    "user-country-DE-city-berlin", // exit in Berlin, Germany
                    "PASSWORD"
                );
                return response.request().newBuilder()
                    .header("Proxy-Authorization", credential)
                    .build();
            }
        };
        // Build the client
        OkHttpClient client = new OkHttpClient.Builder()
            .proxy(proxy)
            .proxyAuthenticator(proxyAuthenticator)
            .connectTimeout(15, TimeUnit.SECONDS)
            .readTimeout(30, TimeUnit.SECONDS)
            .writeTimeout(30, TimeUnit.SECONDS)
            .retryOnConnectionFailure(true)
            .connectionPool(new ConnectionPool(10, 5, TimeUnit.MINUTES))
            .build();
        Request request = new Request.Builder()
            .url("https://httpbin.org/headers")
            .header("User-Agent", "ProxyHat-OkHttp/1.0")
            .build();
        try (Response response = client.newCall(request).execute()) {
            System.out.println("Status: " + response.code());
            System.out.println("Body: " + response.body().string());
        }
    }

    // Response has no responseCount() method; count this response plus
    // the chain of prior responses from earlier attempts instead
    private static int responseCount(Response response) {
        int count = 1;
        while ((response = response.priorResponse()) != null) {
            count++;
        }
        return count;
    }
}
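OkHttp connection pool and retry strategy
By default, OkHttp's ConnectionPool keeps up to 5 idle connections alive for 5 minutes. For high-frequency proxy traffic, tune it to match your concurrency: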
ConnectionPool pool = new ConnectionPool(
    20,               // maximum number of idle connections
    2,                // keep-alive duration for idle connections
    TimeUnit.MINUTES
);
OkHttpClient client = new OkHttpClient.Builder()
    .connectionPool(pool)
    .retryOnConnectionFailure(true)
    .addInterceptor(new RetryInterceptor(3)) // custom retry interceptor
    .build();
A custom retry interceptor example:
import okhttp3.Interceptor;
import okhttp3.Request;
import okhttp3.Response;
import java.io.IOException;

public class RetryInterceptor implements Interceptor {
    private final int maxRetries;

    public RetryInterceptor(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    @Override
    public Response intercept(Chain chain) throws IOException {
        Request request = chain.request();
        IOException lastException = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                Response response = chain.proceed(request);
                // Successful and 4xx responses are returned as-is (4xx is not retried).
                // The final attempt returns whatever it received, so the caller
                // still sees the 5xx status instead of a generic exception.
                if (response.code() < 500 || attempt == maxRetries) {
                    return response;
                }
                response.close(); // release the connection before retrying
            } catch (IOException e) {
                lastException = e;
                System.err.println("Attempt " + (attempt + 1) + " failed: " + e.getMessage());
            }
        }
        throw lastException != null ? lastException
            : new IOException("Max retries exceeded");
    }
}
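The interceptor above retries immediately, which can hammer a proxy that is already struggling. A delay that doubles on each attempt is a common remedy. The sketch below computes such delays; the random jitter, the 500 ms base, and the 10 s cap are illustrative choices rather than values from this article, and the class name Backoff is made up.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

public class Backoff {
    // Upper bound of the delay before retry number `attempt` (0-based):
    // base doubled `attempt` times, capped at max.
    static Duration ceiling(int attempt, Duration base, Duration max) {
        long millis = base.toMillis();
        for (int i = 0; i < attempt && millis < max.toMillis(); i++) {
            millis *= 2;
        }
        return Duration.ofMillis(Math.min(millis, max.toMillis()));
    }

    // Picks a random delay between zero and the ceiling, so that many
    // clients failing at the same moment do not retry in lockstep.
    static Duration withJitter(int attempt, Duration base, Duration max) {
        long cap = ceiling(attempt, base, max).toMillis();
        return Duration.ofMillis(ThreadLocalRandom.current().nextLong(cap + 1));
    }

    public static void main(String[] args) {
        Duration base = Duration.ofMillis(500);
        Duration max = Duration.ofSeconds(10);
        for (int attempt = 0; attempt < 6; attempt++) {
            System.out.println("attempt " + attempt
                + ": ceiling=" + ceiling(attempt, base, max).toMillis() + " ms"
                + ", jittered=" + withJitter(attempt, base, max).toMillis() + " ms");
        }
    }
}
```

Inside an interceptor, the computed delay would be applied with Thread.sleep before the next chain.proceed call.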
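Jsoup: HTML parsing with proxy support
Jsoup is the most popular HTML parsing library in the Java ecosystem and is widely used for scraping. Jsoup.connect() supports proxy configuration and uses HttpURLConnection under the hood. For complex scenarios, fetch the HTML with OkHttp or HttpClient first and hand it to Jsoup for parsing.
Native Jsoup proxy configuration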
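import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class JsoupProxyExample {
    public static void main(String[] args) throws Exception {
        Proxy proxy = new Proxy(
            Proxy.Type.HTTP,
            new InetSocketAddress("gate.proxyhat.com", 8080)
        );
        // Basic credentials for the proxy, encoded with an explicit charset.
        // Note: for HTTPS targets the JDK may not forward this header on the
        // CONNECT request; if the proxy answers 407, use the OkHttp + Jsoup
        // combination instead.
        String credentials = Base64.getEncoder().encodeToString(
            "user-country-US:PASSWORD".getBytes(StandardCharsets.UTF_8)
        );
        Document doc = Jsoup.connect("https://example.com")
            .proxy(proxy)
            .header("Proxy-Authorization", "Basic " + credentials)
            .userAgent("ProxyHat-Jsoup/1.0")
            .timeout(30000)
            .followRedirects(true)
            .get();
        System.out.println("Title: " + doc.title());
        System.out.println("Links: " + doc.select("a[href]").size());
    }
}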
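The username strings and the Basic header used in these examples are easy to get subtly wrong when assembled by hand. The helper below is a small sketch that builds both. The "-country-" and "-city-" suffix layout is inferred only from the usernames shown in this article, so treat it as an assumption and check the provider's documentation; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ProxyCredentials {
    // Builds a username such as "user-country-DE-city-berlin".
    // Assumption: the suffix layout follows the examples in this article.
    static String geoUsername(String baseUser, String country, String city) {
        StringBuilder sb = new StringBuilder(baseUser);
        if (country != null) {
            sb.append("-country-").append(country);
        }
        if (city != null) {
            sb.append("-city-").append(city);
        }
        return sb.toString();
    }

    // Basic scheme: "Basic " followed by base64(username ":" password).
    static String basicHeader(String username, String password) {
        byte[] raw = (username + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        String user = geoUsername("user", "DE", "berlin");
        System.out.println(user);
        System.out.println(basicHeader(user, "PASSWORD"));
    }
}
```

The resulting string is what goes into the Proxy-Authorization header in the Jsoup example above.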
OkHttp + Jsoup combined
When you need finer control, such as custom retries, connection pooling, or interceptors, let OkHttp fetch the response body and Jsoup do the parsing:
import okhttp3.Authenticator;
import okhttp3.Credentials;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.net.InetSocketAddress;
import java.net.Proxy;

public class OkHttpJsoupExample {
    private final OkHttpClient httpClient;

    public OkHttpJsoupExample() {
        Proxy proxy = new Proxy(
            Proxy.Type.HTTP,
            new InetSocketAddress("gate.proxyhat.com", 8080)
        );
        Authenticator proxyAuth = (route, response) -> {
            // Credentials were already sent and rejected: stop retrying
            if (response.request().header("Proxy-Authorization") != null) {
                return null;
            }
            String cred = Credentials.basic("user-country-JP", "PASSWORD");
            return response.request().newBuilder()
                .header("Proxy-Authorization", cred)
                .build();
        };
        this.httpClient = new OkHttpClient.Builder()
            .proxy(proxy)
            .proxyAuthenticator(proxyAuth)
            .build();
    }

    public Document fetchAndParse(String url) throws Exception {
        Request request = new Request.Builder()
            .url(url)
            .build();
        try (Response response = httpClient.newCall(request).execute()) {
            if (!response.isSuccessful()) {
                throw new RuntimeException("HTTP " + response.code());
            }
            String html = response.body().string();
            return Jsoup.parse(html, url); // url becomes the base URI for relative links
        }
    }

    public static void main(String[] args) throws Exception {
        OkHttpJsoupExample scraper = new OkHttpJsoupExample();
        Document doc = scraper.fetchAndParse("https://news.ycombinator.com");
        doc.select(".titleline > a").forEach(e ->
            System.out.println(e.text() + " -> " + e.attr("href"))
        );
    }
}
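Passing the page URL as the base URI matters because many href values are relative. In Jsoup, absUrl("href") or attr("abs:href") resolves them against that base. The sketch below shows the same resolution using only java.net.URI, so it runs without Jsoup on the classpath; the class name BaseUriDemo and the sample links are illustrative.

```java
import java.net.URI;

public class BaseUriDemo {
    // Resolves an href found in a page against the URL the page came from.
    static String absolute(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "https://news.ycombinator.com/news";
        // A relative link, a root-relative link, and an already absolute link.
        System.out.println(absolute(page, "item?id=1"));
        System.out.println(absolute(page, "/newest"));
        System.out.println(absolute(page, "https://example.com/a"));
    }
}
```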
Apache HttpClient: legacy system compatibility
Apache HttpClient 4.x is still used in many enterprise systems. Proxies are configured through HttpHost and CredentialsProvider:
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class ApacheHttpClientProxy {
    public static void main(String[] args) throws Exception {
        HttpHost proxy = new HttpHost("gate.proxyhat.com", 8080);
        // Credentials scoped to the proxy host only
        CredentialsProvider credsProvider = new BasicCredentialsProvider();
        credsProvider.setCredentials(
            new AuthScope(proxy),
            new UsernamePasswordCredentials("user-country-GB", "PASSWORD")
        );
        RequestConfig config = RequestConfig.custom()
            .setProxy(proxy)
            .setConnectTimeout(15000)
            .setSocketTimeout(30000)
            .build();
        try (CloseableHttpClient client = HttpClients.custom()
                .setDefaultCredentialsProvider(credsProvider)
                .setDefaultRequestConfig(config)
                .build()) {
            HttpGet request = new HttpGet("https://httpbin.org/ip");
            // Read the entity into a String; the handler also releases the connection
            String body = client.execute(request,
                response -> EntityUtils.toString(response.getEntity()));
            System.out.println("Body: " + body);
        }
    }
}
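Concurrent scraping: ExecutorService + residential proxy pool
Large-scale scraping needs concurrent requests plus IP rotation. The following example uses virtual threads (Java 21+) and a residential proxy pool: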
import java.net.Authenticator;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.stream.Collectors;

public class ParallelScraper {
    private final HttpClient httpClient;
    private final ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
    private final Semaphore rateLimiter;

    public ParallelScraper(int maxConcurrent) {
        // Sticky session: one session ID per scraper keeps the same exit IP.
        // A thread ID is not a reliable key here, because the authenticator
        // may run on HttpClient's own threads rather than the caller's.
        String sessionId = UUID.randomUUID().toString().substring(0, 8);
        Authenticator proxyAuth = new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication(
                    "user-session-" + sessionId,
                    "PASSWORD".toCharArray()
                );
            }
        };
        ProxySelector proxy = ProxySelector.of(
            new InetSocketAddress("gate.proxyhat.com", 8080)
        );
        this.httpClient = HttpClient.newBuilder()
            .proxy(proxy)
            .authenticator(proxyAuth)
            .connectTimeout(Duration.ofSeconds(15))
            .executor(executor)
            .build();
        this.rateLimiter = new Semaphore(maxConcurrent);
    }

    public CompletableFuture<String> fetchAsync(String url) {
        // Run on a virtual thread so that blocking on the semaphore and on
        // send() does not tie up the common ForkJoinPool
        return CompletableFuture.supplyAsync(() -> {
            try {
                rateLimiter.acquire();
                try {
                    HttpRequest request = HttpRequest.newBuilder()
                        .uri(URI.create(url))
                        .timeout(Duration.ofSeconds(30))
                        .GET()
                        .build();
                    HttpResponse<String> response = httpClient.send(
                        request,
                        HttpResponse.BodyHandlers.ofString()
                    );
                    return response.body();
                } finally {
                    rateLimiter.release();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                throw new RuntimeException("Interrupted: " + url, e);
            } catch (Exception e) {
                throw new RuntimeException("Failed: " + url, e);
            }
        }, executor);
    }

    public static void main(String[] args) {
        // Allow Basic proxy auth on HTTPS tunnels (disabled by default in the JDK)
        System.setProperty("jdk.http.auth.tunneling.disabledSchemes", "");
        ParallelScraper scraper = new ParallelScraper(50);
        List<String> urls = List.of(
            "https://httpbin.org/ip",
            "https://httpbin.org/headers",
            "https://httpbin.org/user-agent"
        );
        List<CompletableFuture<String>> futures = urls.stream()
            .map(scraper::fetchAsync)
            .collect(Collectors.toList());
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
            .join();
        futures.forEach(f -> System.out.println(f.join()));
    }
}
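To check that a Semaphore really caps the number of requests in flight, the pattern can be exercised without any network access. In the sketch below a short sleep stands in for the proxied HTTP call, and a cached thread pool is used so it also runs on JDKs older than 21; on Java 21+ Executors.newVirtualThreadPerTaskExecutor() can be swapped in. The class name BoundedConcurrencyDemo is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedConcurrencyDemo {
    // Runs `tasks` simulated requests with at most `limit` in flight and
    // returns the highest number observed running at the same time.
    static int peakConcurrency(int tasks, int limit) {
        Semaphore permits = new Semaphore(limit);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                futures.add(CompletableFuture.runAsync(() -> {
                    try {
                        permits.acquire();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(20); // stands in for a proxied HTTP call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release();
                    }
                }, executor));
            }
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            return peak.get();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("peak in-flight requests: " + peakConcurrency(40, 5));
    }
}
```

The reported peak never exceeds the permit count, which is the property the scraper relies on to stay within a proxy plan's concurrency limit.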
TLS/SSL configuration: JSSE and a custom SSLContext
When the proxy upstream uses a self-signed certificate or a non-standard TLS setup, you need a custom SSLContext:
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;
import java.net.http.HttpClient;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import java.time.Duration;

public class TlsConfig {
    // WARNING: for testing only. Production code should rely on certificates
    // issued by a proper CA.
    public static SSLContext trustAllContext() throws Exception {
        TrustManager[] trustAll = new TrustManager[] {
            new X509TrustManager() {
                @Override
                public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
                @Override
                public void checkClientTrusted(X509Certificate[] certs, String authType) {}
                @Override
                public void checkServerTrusted(X509Certificate[] certs, String authType) {}
            }
        };
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, trustAll, new SecureRandom());
        return ctx;
    }

    public static HttpClient createTlsClient() throws Exception {
        return HttpClient.newBuilder()
            .sslContext(trustAllContext())
            .connectTimeout(Duration.ofSeconds(15))
            .build();
    }
}
For OkHttp, configure it through OkHttpClient.Builder.sslSocketFactory():
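// For testing only: SSLContext does not expose its trust managers, so keep a reference
X509TrustManager trustAll = new X509TrustManager() {
    @Override public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
    @Override public void checkClientTrusted(X509Certificate[] certs, String authType) {}
    @Override public void checkServerTrusted(X509Certificate[] certs, String authType) {}
};
SSLContext ctx = SSLContext.getInstance("TLS");
ctx.init(null, new TrustManager[] { trustAll }, new SecureRandom());
OkHttpClient client = new OkHttpClient.Builder()
    .sslSocketFactory(ctx.getSocketFactory(), trustAll)
    .hostnameVerifier((hostname, session) -> true) // for testing only
    .build();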
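A safer alternative to trusting every certificate is to trust only a specific CA. The sketch below loads certificates from a file into an in-memory trust store and builds an SSLContext from it. It assumes the proxy's CA certificate is available as a PEM or DER file whose path is passed on the command line; the class name TrustStoreContext and its method names are illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class TrustStoreContext {
    // Loads the X.509 certificates in certFile into an in-memory trust store.
    static KeyStore loadCa(Path certFile) {
        try (InputStream in = Files.newInputStream(certFile)) {
            KeyStore store = KeyStore.getInstance(KeyStore.getDefaultType());
            store.load(null, null); // start with an empty store
            int index = 0;
            for (Certificate cert
                    : CertificateFactory.getInstance("X.509").generateCertificates(in)) {
                store.setCertificateEntry("ca-" + index++, cert);
            }
            return store;
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException("Could not load " + certFile, e);
        }
    }

    // Builds an SSLContext that trusts the certificates in trustStore.
    // Passing null falls back to the JDK's default trust store.
    static SSLContext fromTrustStore(KeyStore trustStore) {
        try {
            TrustManagerFactory tmf = TrustManagerFactory.getInstance(
                TrustManagerFactory.getDefaultAlgorithm());
            tmf.init(trustStore);
            SSLContext ctx = SSLContext.getInstance("TLS");
            ctx.init(null, tmf.getTrustManagers(), null);
            return ctx;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not build SSLContext", e);
        }
    }

    public static void main(String[] args) {
        KeyStore store = args.length > 0 ? loadCa(Path.of(args[0])) : null;
        SSLContext ctx = fromTrustStore(store);
        System.out.println("Built SSLContext for protocol " + ctx.getProtocol());
    }
}
```

The resulting context can be passed to HttpClient.Builder.sslContext() in the same way as the trust-all context, with hostname verification left enabled.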
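Production recommendations
- Connection pool size: residential proxies have relatively high latency (200-800 ms), so the pool should not be oversized. A reasonable target is concurrency = target QPS × average latency (seconds) × a 1.5 safety factor.
- Timeouts: for residential proxies, use a connect timeout of 15-30 s and a read timeout of 30-60 s. Datacenter proxies can use more aggressive values.
- Retry strategy: distinguish retryable errors (5xx, timeouts, connection resets) from non-retryable ones (4xx). Use exponential backoff to avoid retry storms.
- Session stickiness: when several requests must come from the same IP, use the user-session-xxx username format to hold the session.
- Logging and monitoring: record the proxy IP, response time, and error rate for later optimization and cost accounting.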
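As a worked example of the sizing rule in the list above, the sketch below evaluates concurrency = target QPS × average latency × safety factor and rounds up. The input numbers are illustrative, and the class name PoolSizing is hypothetical.

```java
public class PoolSizing {
    // concurrency = target QPS x average latency (seconds) x safety factor,
    // rounded up to a whole number of connections.
    static int requiredConcurrency(double targetQps, double avgLatencySeconds,
                                   double safetyFactor) {
        return (int) Math.ceil(targetQps * avgLatencySeconds * safetyFactor);
    }

    public static void main(String[] args) {
        // 20 requests per second at 500 ms average latency with the 1.5 factor:
        // 20 x 0.5 x 1.5 = 15 concurrent connections.
        System.out.println(requiredConcurrency(20, 0.5, 1.5));
    }
}
```

The same number is a reasonable starting point for both the Semaphore permit count and the maximum idle connections of the pool.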
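Summary
Key takeaways
- Java 11+ HttpClient supports proxies natively through ProxySelector and Authenticator, which suits modern Java applications.
- OkHttp offers richer interceptor and connection pool configuration, which suits complex scenarios.
- Jsoup's native proxy support is limited; combine it with OkHttp or HttpClient.
- For concurrent scraping, use virtual threads with a Semaphore for throttling; residential proxies need controlled concurrency.
- Custom TLS configuration is essential when the proxy upstream uses non-standard certificates, but be careful with trust-all configurations in production.
Need stable residential proxies with a high success rate? ProxyHat offers a global residential IP pool with country- and city-level targeting. See the pricing page for details, or the web scraping use cases for more practical guides.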






