The Complete Guide to HTTP Proxies in Ruby: Net::HTTP, Typhoeus, and the ProxyHat SDK in Practice

An in-depth look at using HTTP proxies in Ruby, covering the Net::HTTP standard library, concurrent requests with Typhoeus, rotation and geotargeting with the ProxyHat SDK, and Rails integration tips. Includes production-grade code examples.

Introduction: Why Ruby Developers Need to Understand Proxies

When you build data pipelines, price-monitoring systems, or SERP scrapers, a proxy server stops being optional and becomes a requirement. High-frequency requests from a single IP trigger rate limits, CAPTCHA challenges, and eventually outright IP bans. The Ruby ecosystem offers several ways to integrate proxies, from the Net::HTTP standard library to the high-performance Typhoeus client and the SDKs of dedicated proxy services.

This guide starts at the lowest level and works up to a production-grade, proxy-aware HTTP client, covering authentication, error handling, concurrent requests, TLS configuration, and Rails integration.

Net::HTTP: Proxy Basics with the Standard Library

Net::HTTP ships with Ruby's standard library, so it needs no extra dependencies. It supports HTTP proxies natively, but configuring it correctly requires understanding how its proxy class constructor works.

Basic Proxy Configuration

The following code shows how to send a request through a ProxyHat residential proxy:

require 'net/http'
require 'uri'

# ProxyHat connection parameters
PROXY_HOST = 'gate.proxyhat.com'
PROXY_PORT = 8080
PROXY_USER = 'your_username'
PROXY_PASS = 'your_password'

def fetch_with_proxy(url, proxy_user: PROXY_USER, proxy_pass: PROXY_PASS)
  uri = URI.parse(url)
  
  # Create a proxy-aware connection
  proxy = Net::HTTP::Proxy(PROXY_HOST, PROXY_PORT, proxy_user, proxy_pass)
  http = proxy.new(uri.host, uri.port)
  
  # Configure timeouts and TLS
  http.use_ssl = (uri.scheme == 'https')
  http.open_timeout = 15
  http.read_timeout = 30
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER
  
  request = Net::HTTP::Get.new(uri.request_uri)
  request['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
  request['Accept'] = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
  
  response = http.request(request)
  
  {
    status: response.code.to_i,
    headers: response.each_header.to_h,
    body: response.body
  }
rescue Net::OpenTimeout => e
  { error: 'connection_timeout', message: e.message }
rescue Net::ReadTimeout => e
  { error: 'read_timeout', message: e.message }
rescue Net::HTTPBadResponse => e
  { error: 'invalid_response', message: e.message }
rescue Errno::ECONNREFUSED => e
  { error: 'connection_refused', message: e.message }
rescue SocketError => e
  { error: 'dns_resolution_failed', message: e.message }
end

# Usage example
result = fetch_with_proxy('https://httpbin.org/ip')
puts result[:body] if result[:status] == 200

Net::HTTP::Proxy returns a proxy-aware class; instances created with its new method route every request through the specified proxy. The username and password are sent automatically via the Proxy-Authorization header.
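
If you prefer to skip the proxy class constructor, the same proxy parameters can also be passed straight to Net::HTTP.start. A minimal sketch, reusing the PROXY_* constants defined above (httpbin.org is just a test target):

require 'net/http'
require 'uri'

uri = URI.parse('https://httpbin.org/ip')

# Net::HTTP.start accepts the proxy address, port, and credentials directly,
# so no intermediate proxy class is needed.
Net::HTTP.start(uri.host, uri.port,
                PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASS,
                use_ssl: true, open_timeout: 15, read_timeout: 30) do |http|
  response = http.get(uri.request_uri)
  puts response.body # should report the proxy's exit IP, not your own
end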

A Robust Version with Retries

Production code needs retry logic to cope with transient failures:

require 'net/http'
require 'uri'

class RobustProxyClient
  MAX_RETRIES = 3
  RETRY_DELAY = 2 # seconds
  
  attr_reader :proxy_host, :proxy_port, :proxy_user, :proxy_pass
  
  def initialize(proxy_host: 'gate.proxyhat.com', 
                 proxy_port: 8080,
                 proxy_user: ENV['PROXYHAT_USER'],
                 proxy_pass: ENV['PROXYHAT_PASS'])
    @proxy_host = proxy_host
    @proxy_port = proxy_port
    @proxy_user = proxy_user
    @proxy_pass = proxy_pass
  end
  
  def get(url, headers: {}, timeout: 30)
    retries = 0
    
    loop do
      result = execute_request(url, headers, timeout)
      # Return immediately on success or on a non-retryable error;
      # otherwise retry up to MAX_RETRIES times with a growing delay.
      return result unless result[:retryable]
      return result if retries >= MAX_RETRIES
      
      retries += 1
      sleep(RETRY_DELAY * retries) # linearly increasing back-off
    end
  end
  
  private
  
  def execute_request(url, headers, timeout)
    uri = URI.parse(url)
    proxy_class = Net::HTTP::Proxy(proxy_host, proxy_port, proxy_user, proxy_pass)
    http = proxy_class.new(uri.host, uri.port)
    
    http.use_ssl = (uri.scheme == 'https')
    http.open_timeout = 15
    http.read_timeout = timeout
    http.verify_mode = OpenSSL::SSL::VERIFY_PEER
    http.min_version = OpenSSL::SSL::TLS1_2_VERSION
    
    request = Net::HTTP::Get.new(uri.request_uri)
    headers.each { |k, v| request[k] = v }
    request['User-Agent'] ||= 'RubyProxyClient/1.0'
    
    response = http.request(request)
    
    # Treat 5xx responses as transient upstream failures worth retrying
    if response.is_a?(Net::HTTPServerError)
      { status: response.code.to_i, error: 'server_error', retryable: true }
    else
      { status: response.code.to_i, body: response.body, headers: response.each_header.to_h }
    end
    
  rescue Net::OpenTimeout, Net::ReadTimeout => e
    { error: 'timeout', message: e.message, retryable: true }
  rescue Errno::ECONNRESET, Errno::EPIPE => e
    { error: 'connection_reset', message: e.message, retryable: true }
  rescue => e
    { error: 'unexpected_error', message: e.message, retryable: false }
  end
end

# Usage example
client = RobustProxyClient.new(
  proxy_user: 'user-country-US',  # US residential proxy
  proxy_pass: 'your_password'
)

response = client.get('https://httpbin.org/ip', headers: { 'Accept' => 'application/json' })
puts response[:body] if response[:status] == 200

Typhoeus: High-Performance Concurrent Requests Backed by libcurl

Typhoeus is a Ruby HTTP client built on libcurl that supports genuinely parallel requests. Its Hydra interface can fire dozens or even hundreds of requests at once, which makes it a great fit for large-scale scraping.

Single-Request Proxy Configuration

require 'typhoeus'

def typhoeus_proxy_request(url, proxy_url: nil)
  # Proxy URL format: http://user:pass@host:port
  proxy_url ||= "http://#{ENV['PROXYHAT_USER']}:#{ENV['PROXYHAT_PASS']}@gate.proxyhat.com:8080"
  
  request = Typhoeus::Request.new(
    url,
    method: :get,
    proxy: proxy_url,
    proxyauth: :any,  # auto-detect the authentication scheme
    headers: {
      'User-Agent' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)',
      'Accept' => 'text/html,application/xhtml+xml'
    },
    timeout: 30,
    connecttimeout: 15,
    followlocation: true,
    ssl_verifypeer: true,
    ssl_verifyhost: 2
  )
  
  response = request.run
  
  if response.success?
    {
      status: response.code,
      body: response.body,
      headers: response.response_headers,
      total_time: response.total_time
    }
  elsif response.timed_out?
    { error: 'timeout' }
  elsif response.code == 0
    { error: 'network_failure', message: response.return_message }
  else
    { status: response.code, error: 'http_error' }
  end
end

# Using a geotargeted proxy
proxy_url = 'http://user-country-DE-city-berlin:password@gate.proxyhat.com:8080'
result = typhoeus_proxy_request('https://httpbin.org/ip', proxy_url: proxy_url)
puts result[:body]

Hydra: The Concurrent Request Engine

Hydra is Typhoeus's core advantage: it executes many requests concurrently while managing the maximum concurrency and timeouts for you:

require 'typhoeus'

class ConcurrentScraper
  MAX_CONCURRENCY = 50
  
  def initialize(proxy_user:, proxy_pass:, country: nil, city: nil)
    @proxy_user = build_proxy_user(proxy_user, country, city)
    @proxy_pass = proxy_pass
    @hydra = Typhoeus::Hydra.new(max_concurrency: MAX_CONCURRENCY)
    @results = Queue.new
  end
  
  def scrape_urls(urls)
    requests = urls.map { |url| build_request(url) }
    requests.each { |req| @hydra.queue(req) }
    
    @hydra.run
    
    results = []
    until @results.empty?
      results << @results.pop
    end
    
    results
  end
  
  private
  
  def build_proxy_user(base_user, country, city)
    user = base_user
    user += "-country-#{country}" if country
    user += "-city-#{city}" if city
    user
  end
  
  def build_request(url)
    proxy_url = "http://#{@proxy_user}:#{@proxy_pass}@gate.proxyhat.com:8080"
    
    request = Typhoeus::Request.new(
      url,
      method: :get,
      proxy: proxy_url,
      proxyauth: :any,
      headers: {
        'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
        'Accept-Language' => 'en-US,en;q=0.9'
      },
      timeout: 25,
      connecttimeout: 10,
      followlocation: true,
      ssl_verifypeer: true
    )
    
    # on_complete fires for every finished request, successful or not,
    # so collect all results here (also registering on_failure would push
    # failed requests into the queue twice)
    request.on_complete do |response|
      @results << {
        url: url,
        status: response.code,
        body: response.body,
        time: response.total_time,
        success: response.success?,
        error: response.success? ? nil : response.return_message
      }
    end
    
    request
  end
end

# Usage example: fetch 100 URLs concurrently
urls = (1..100).map { "https://httpbin.org/delay/#{rand(1..3)}" }

scraper = ConcurrentScraper.new(
  proxy_user: 'your_username',
  proxy_pass: 'your_password',
  country: 'US'
)

results = scraper.scrape_urls(urls)
successful = results.count { |r| r[:success] }
puts "成功: #{successful}/#{urls.size}"
puts "平均耗时: #{results.sum { |r| r[:time] || 0 } / results.size}秒"

ProxyHat Ruby SDK: Rotation and Geotargeting

ProxyHat provides a dedicated Ruby SDK that wraps proxy rotation, session management, and geotargeting, making residential proxies easier to work with:

require 'proxyhat_sdk'  # gem install proxyhat_sdk
require 'securerandom'  # used for the rotating session ids below

class ProxyHatClient
  attr_reader :config
  
  def initialize(username:, password:, default_country: nil)
    @config = {
      gateway: 'gate.proxyhat.com',
      http_port: 8080,
      socks5_port: 1080,
      username: username,
      password: password,
      default_country: default_country
    }
  end
  
  # Rotating proxy: a fresh IP on every request
  def rotating_proxy_url(country: nil, city: nil)
    user = build_username(rotate: true, country: country, city: city)
    "http://#{user}:#{@config[:password]}@#{@config[:gateway]}:#{@config[:http_port]}"
  end
  
  # Sticky session: keep the same IP
  def sticky_session_url(session_id:, country: nil, city: nil)
    user = build_username(session: session_id, country: country, city: city)
    "http://#{user}:#{@config[:password]}@#{@config[:gateway]}:#{@config[:http_port]}"
  end
  
  # SOCKS5 proxy
  def socks5_proxy_url(country: nil)
    user = build_username(rotate: true, country: country)
    "socks5://#{user}:#{@config[:password]}@#{@config[:gateway]}:#{@config[:socks5_port]}"
  end
  
  private
  
  def build_username(rotate: false, session: nil, country: nil, city: nil)
    parts = [@config[:username]]
    
    if session
      parts << "session-#{session}"
    elsif rotate
      parts << "rotate-#{SecureRandom.hex(8)}"
    end
    
    parts << "country-#{country || @config[:default_country]}" if country || @config[:default_country]
    parts << "city-#{city}" if city
    
    parts.join('-')
  end
end

# Usage example
client = ProxyHatClient.new(
  username: 'your_username',
  password: 'your_password',
  default_country: 'US'
)

# Rotating proxy request
rotating_url = client.rotating_proxy_url(country: 'DE', city: 'berlin')
puts "Rotating proxy: #{rotating_url}"

# Sticky session (the same IP is kept for 10 minutes)
sticky_url = client.sticky_session_url(session_id: 'order_12345', country: 'GB')
puts "Sticky session: #{sticky_url}"

In Practice: Concurrently Scraping 1,000 URLs

Below is a complete, production-oriented example that uses rotating residential proxies to scrape 1,000 URLs concurrently:

require 'typhoeus'
require 'json'
require 'logger'
require 'securerandom'

class ProductionScraper
  BATCH_SIZE = 100
  MAX_CONCURRENCY = 50
  RETRY_COUNT = 2
  
  def initialize(proxy_user:, proxy_pass:, country: 'US')
    @proxy_user = proxy_user
    @proxy_pass = proxy_pass
    @country = country
    @logger = Logger.new(STDOUT)
    @logger.level = Logger::INFO
  end
  
  def scrape(urls)
    @logger.info "开始抓取 #{urls.size} 个 URL"
    start_time = Time.now
    
    results = { success: [], failed: [], retried: [] }
    
    urls.each_slice(BATCH_SIZE).with_index do |batch, batch_idx|
      @logger.info "处理批次 #{batch_idx + 1}/#{(urls.size.to_f / BATCH_SIZE).ceil}"
      
      batch_results = process_batch(batch)
      
      batch_results.each do |result|
        if result[:success]
          results[:success] << result
        else
          results[:failed] << result
        end
      end
      
      # Short pause between batches to avoid tripping rate limits
      sleep(0.5) unless batch_idx == (urls.size.to_f / BATCH_SIZE).ceil - 1
    end
    
    # Retry the failures
    if results[:failed].any?
      @logger.info "Retrying #{results[:failed].size} failed requests"
      retry_results = retry_failed(results[:failed])
      results[:success] += retry_results[:success]
      results[:retried] = retry_results[:success]
      results[:failed] = retry_results[:failed]
    end
    
    elapsed = Time.now - start_time
    @logger.info "完成! 成功: #{results[:success].size}, 失败: #{results[:failed].size}, 耗时: #{elapsed.round(2)}秒"
    
    results
  end
  
  private
  
  def process_batch(urls)
    hydra = Typhoeus::Hydra.new(max_concurrency: MAX_CONCURRENCY)
    results = []
    mutex = Mutex.new
    
    urls.each do |url|
      request = build_request(url)
      
      request.on_complete do |response|
        mutex.synchronize do
          results << parse_response(url, response)
        end
      end
      
      hydra.queue(request)
    end
    
    hydra.run
    results
  end
  
  def build_request(url)
    # Each request uses a fresh rotating IP via a unique session id
    session_id = SecureRandom.hex(8)
    proxy_user = "#{@proxy_user}-session-#{session_id}-country-#{@country}"
    proxy_url = "http://#{proxy_user}:#{@proxy_pass}@gate.proxyhat.com:8080"
    
    Typhoeus::Request.new(
      url,
      method: :get,
      proxy: proxy_url,
      proxyauth: :any,
      headers: random_headers,
      timeout: 30,
      connecttimeout: 15,
      followlocation: true,
      ssl_verifypeer: true,
      maxredirs: 3
    )
  end
  
  def random_headers
    user_agents = [
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15',
      'Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0'
    ]
    
    {
      'User-Agent' => user_agents.sample,
      'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
      'Accept-Language' => 'en-US,en;q=0.9',
      'Accept-Encoding' => 'gzip, deflate, br',
      'DNT' => '1',
      'Connection' => 'keep-alive',
      'Upgrade-Insecure-Requests' => '1'
    }
  end
  
  def parse_response(url, response)
    {
      url: url,
      status: response.code,
      body: response.body,
      time: response.total_time,
      success: response.success? && response.code.between?(200, 299),
      size: response.body&.bytesize || 0
    }
  end
  
  def retry_failed(failed_results)
    success = []
    still_failed = []
    
    failed_results.each do |result|
      RETRY_COUNT.times do |attempt|
        sleep(2 ** attempt)  # exponential backoff
        
        request = build_request(result[:url])
        response = request.run
        
        if response.success? && response.code.between?(200, 299)
          success << parse_response(result[:url], response)
          break
        elsif attempt == RETRY_COUNT - 1
          still_failed << result
        end
      end
    end
    
    { success: success, failed: still_failed }
  end
end

# Run the scraper
scraper = ProductionScraper.new(
  proxy_user: ENV['PROXYHAT_USER'],
  proxy_pass: ENV['PROXYHAT_PASS'],
  country: 'US'
)

# Generate 1,000 test URLs
urls = (1..1000).map { "https://httpbin.org/uuid" }

results = scraper.scrape(urls)

# Print statistics
puts "\n=== Scraping statistics ==="
puts "Success rate: #{(results[:success].size.to_f / urls.size * 100).round(2)}%"
puts "Total bytes downloaded: #{results[:success].sum { |r| r[:size] }}"
puts "Average response time: #{(results[:success].sum { |r| r[:time] } / results[:success].size).round(2)} s" if results[:success].any?

TLS/SSL Configuration and Certificate Handling

TLS configuration deserves extra attention when a proxy is involved. Some upstream servers use self-signed certificates or incomplete certificate chains, so you need flexible handling:

require 'net/http'
require 'openssl'

class TLSAwareProxyClient
  def initialize(proxy_host: 'gate.proxyhat.com', proxy_port: 8080)
    @proxy_host = proxy_host
    @proxy_port = proxy_port
  end
  
  # Strict verification (recommended for production)
  def fetch_strict_tls(url, proxy_user:, proxy_pass:)
    uri = URI.parse(url)
    proxy = Net::HTTP::Proxy(@proxy_host, @proxy_port, proxy_user, proxy_pass)
    http = proxy.new(uri.host, uri.port)
    
    http.use_ssl = (uri.scheme == 'https')
    http.verify_mode = OpenSSL::SSL::VERIFY_PEER
    http.cert_store = default_cert_store
    http.min_version = OpenSSL::SSL::TLS1_2_VERSION
    http.max_version = OpenSSL::SSL::TLS1_3_VERSION
    
    request = Net::HTTP::Get.new(uri.request_uri)
    http.request(request)
  end
  
  # Permissive verification (test environments or known upstreams only)
  def fetch_permissive_tls(url, proxy_user:, proxy_pass:)
    uri = URI.parse(url)
    proxy = Net::HTTP::Proxy(@proxy_host, @proxy_port, proxy_user, proxy_pass)
    http = proxy.new(uri.host, uri.port)
    
    http.use_ssl = (uri.scheme == 'https')
    http.verify_mode = OpenSSL::SSL::VERIFY_NONE  # WARNING: insecure
    http.min_version = OpenSSL::SSL::TLS1_2_VERSION
    
    request = Net::HTTP::Get.new(uri.request_uri)
    http.request(request)
  end
  
  # Custom certificate store (enterprise environments)
  def fetch_with_custom_ca(url, proxy_user:, proxy_pass:, ca_path: nil, ca_file: nil)
    uri = URI.parse(url)
    proxy = Net::HTTP::Proxy(@proxy_host, @proxy_port, proxy_user, proxy_pass)
    http = proxy.new(uri.host, uri.port)
    
    http.use_ssl = (uri.scheme == 'https')
    http.verify_mode = OpenSSL::SSL::VERIFY_PEER
    http.cert_store = custom_cert_store(ca_path: ca_path, ca_file: ca_file)
    
    # Verify that the certificate matches the requested hostname
    http.verify_hostname = true
    
    request = Net::HTTP::Get.new(uri.request_uri)
    http.request(request)
  end
  
  private
  
  def default_cert_store
    store = OpenSSL::X509::Store.new
    store.set_default_paths
    store
  end
  
  def custom_cert_store(ca_path: nil, ca_file: nil)
    store = OpenSSL::X509::Store.new
    store.add_path(ca_path) if ca_path && Dir.exist?(ca_path)
    store.add_file(ca_file) if ca_file && File.exist?(ca_file)
    store.set_default_paths
    store
  end
end

# Typhoeus TLS configuration examples
require 'typhoeus'

def typhoeus_strict_tls_request(url, proxy_url)
  Typhoeus::Request.new(
    url,
    method: :get,
    proxy: proxy_url,
    ssl_verifypeer: true,      # verify the peer certificate
    ssl_verifyhost: 2,         # verify the hostname
    sslversion: :tlsv1_2,
    cainfo: '/etc/ssl/certs/ca-certificates.crt',  # CA bundle path on Linux
    timeout: 30
  ).run
end

def typhoeus_permissive_request(url, proxy_url)
  Typhoeus::Request.new(
    url,
    method: :get,
    proxy: proxy_url,
    ssl_verifypeer: false,     # skip certificate verification
    ssl_verifyhost: 0,         # do not verify the hostname
    timeout: 30
  ).run
end

Rails Integration: Faraday Middleware and ActiveJob

In a Rails application, Faraday makes a good HTTP client abstraction layer: it keeps requests easy to test and lets middleware be reused:

Faraday Proxy Middleware

# config/initializers/proxy_client.rb
require 'faraday'
require 'faraday/retry'

class ProxyFaradayClient
  def initialize(proxy_user:, proxy_pass:, country: nil)
    @proxy_user = proxy_user
    @proxy_pass = proxy_pass
    @country = country
  end
  
  def connection
    @connection ||= Faraday.new do |builder|
      builder.request :retry, {
        max: 3,
        interval: 1,
        backoff_factor: 2,
        retry_statuses: [429, 500, 502, 503, 504],
        methods: [:get, :head, :options]
      }
      
      builder.response :json, content_type: /\bjson\b/
      builder.response :raise_error
      
      builder.adapter :typhoeus
      
      builder.options.timeout = 30
      builder.options.open_timeout = 15
    end
  end
  
  def get(url, headers: {}, country: nil)
    proxy_url = build_proxy_url(country: country || @country)
    
    connection.get(url) do |req|
      # Adapters expect a ProxyOptions object here rather than a plain string
      req.options.proxy = Faraday::ProxyOptions.from(proxy_url)
      headers.each { |k, v| req.headers[k] = v }
    end
  rescue Faraday::Error => e
    { error: e.class.name, message: e.message }
  end
  
  private
  
  def build_proxy_url(country: nil)
    user = @proxy_user
    user += "-country-#{country}" if country
    user += "-rotate-#{SecureRandom.hex(4)}"
    "http://#{user}:#{@proxy_pass}@gate.proxyhat.com:8080"
  end
end

# Global configuration
PROXY_CLIENT = ProxyFaradayClient.new(
  proxy_user: ENV['PROXYHAT_USER'],
  proxy_pass: ENV['PROXYHAT_PASS'],
  country: 'US'
)
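
The global client can then be called from anywhere in the app. A brief usage sketch (httpbin.org is only a placeholder target):

# e.g. inside a service object or a rake task
response = PROXY_CLIENT.get('https://httpbin.org/ip', country: 'DE')

if response.is_a?(Faraday::Response)
  Rails.logger.info "Proxy exit IP payload: #{response.body}"
else
  Rails.logger.warn "Proxy request failed: #{response[:error]} - #{response[:message]}"
end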

Background Scraping Jobs with ActiveJob

# app/jobs/scraping_job.rb
class ScrapingJob < ApplicationJob
  queue_as :scraping
  
  # Retry policy (NetworkError and ScrapingTimeoutError are custom exception
  # classes the application must define)
  retry_on NetworkError, wait: :polynomially_longer, attempts: 3
  discard_on ScrapingTimeoutError
  
  def perform(urls, options = {})
    @country = options[:country] || 'US'
    @results = []
    
    urls.each_slice(50) do |batch|
      process_batch(batch)
    end
    
    # Persist the results
    store_results(@results)
    
    # Send a completion notification
    ScrapingCompletionMailer.notify(@results.size).deliver_later
  end
  
  private
  
  def process_batch(urls)
    hydra = Typhoeus::Hydra.new(max_concurrency: 25)
    mutex = Mutex.new
    
    urls.each do |url|
      request = build_request(url)
      
      request.on_complete do |response|
        mutex.synchronize do
          @results << {
            url: url,
            status: response.code,
            body: response.body,
            scraped_at: Time.current
          }
        end
      end
      
      hydra.queue(request)
    end
    
    hydra.run
  end
  
  def build_request(url)
    session_id = SecureRandom.hex(6)
    proxy_user = "#{ENV['PROXYHAT_USER']}-session-#{session_id}-country-#{@country}"
    proxy_url = "http://#{proxy_user}:#{ENV['PROXYHAT_PASS']}@gate.proxyhat.com:8080"
    
    Typhoeus::Request.new(
      url,
      method: :get,
      proxy: proxy_url,
      timeout: 25,
      connecttimeout: 10,
      followlocation: true,
      ssl_verifypeer: true,
      headers: {
        'User-Agent' => random_user_agent,
        'Accept' => 'text/html,application/xhtml+xml'
      }
    )
  end
  
  def random_user_agent
    [
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15',
      'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
    ].sample
  end
  
  def store_results(results)
    # Bulk insert into the database
    ScrapedPage.insert_all(
      results.map { |r| r.merge(created_at: Time.current, updated_at: Time.current) }
    )
  end
end

# Enqueuing the job
ScrapingJob.perform_later(
  ['https://example.com/page1', 'https://example.com/page2'],
  country: 'DE'
)

Proxy Type Comparison

Feature        | Residential proxies            | Datacenter proxies          | Mobile proxies
IP source      | Real residential ISPs          | Cloud-provider IP ranges    | Mobile carrier 4G/5G
Anonymity      | Very high                      | Medium                      | Very high
Speed          | Medium                         | Very fast                   | Slower
Success rate   | 95%+                           | 60-80%                      | 98%+
Price          | Medium-high                    |                             |
Best for       | SERP, e-commerce, social media | Large-scale data collection | Mobile-app scraping

Key Takeaways

  • Net::HTTP suits simple cases and needs no extra dependencies, but its concurrency is limited.
  • Typhoeus + Hydra is the best choice for large-scale concurrent scraping, with truly parallel requests.
  • Proxy rotation is controlled through the username parameters; a session-{random id} suffix on each request yields a new IP (see the sketch after this list).
  • Geotargeting uses the country-{code}-city-{city} format to pin the exit location precisely (see the sketch after this list).
  • TLS configuration: always use VERIFY_PEER in production; relax it only temporarily in test environments.
  • Rails integration: use the Faraday abstraction layer together with ActiveJob for background work.
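
The username parameters used throughout this guide can be summarized in one place. A small sketch, assuming the gate.proxyhat.com gateway accepts the session/country/city suffixes shown in the earlier examples:

require 'securerandom'

base = 'your_username'

rotating = "#{base}-session-#{SecureRandom.hex(8)}-country-US"  # fresh IP per request
sticky   = "#{base}-session-checkout_42-country-GB"             # same IP for the whole session
geo      = "#{base}-country-DE-city-berlin"                     # pinned to Berlin, Germany

proxy_url = "http://#{rotating}:your_password@gate.proxyhat.com:8080"
puts proxy_url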

Which proxy approach is right depends on your specific needs. For SERP scraping or price monitoring that demands high success rates and strong anonymity, residential proxies are the first choice. If raw speed matters most and the target site has weak anti-bot defenses, datacenter proxies offer better value for money.

Visit the ProxyHat pricing page to learn more about residential proxy plans, or see the web scraping use cases for further technical details.
