The Complete Guide to HTTP Proxies in Ruby: Net::HTTP, Typhoeus, and the ProxyHat SDK

Learn how to configure HTTP proxies in Ruby with Net::HTTP, Typhoeus, and the ProxyHat SDK. Practical examples for concurrent web scraping, IP rotation, and geo-targeting.

If you're building data pipelines or web scraping systems in Ruby, sooner or later you'll run into IP blocks, rate limiting, and geographic restrictions. HTTP proxies are the solution, but integrating them properly takes more than a few lines of code. This guide shows you how to use proxies in Ruby with Net::HTTP (stdlib), Typhoeus (libcurl), and the ProxyHat SDK, with production-ready patterns for IP rotation, error handling, and concurrency.

Why Proxies Are Essential in Ruby

Ruby excels at data manipulation and automation, but its standard HTTP libraries don't natively handle enterprise scenarios such as IP rotation, sticky sessions, or geo-targeting. Without proxies, your scraper will get blocked after a few hundred requests.

Typical use cases include:

  • SERP tracking — localized search results
  • Price monitoring — competitor e-commerce analysis
  • Data pipelines — bulk extraction from rate-limited APIs
  • QA testing — verifying behavior from different geolocations

Net::HTTP: Proxies with the Standard Library

Net::HTTP ships with Ruby's standard library. It needs no external dependencies, but proxy configuration demands attention to detail, especially around authentication and error handling.

Basic Configuration with an HTTP Proxy

Here is how to configure Net::HTTP with an authenticated proxy:

require 'net/http'
require 'uri'

# ProxyHat proxy configuration
PROXY_HOST = 'gate.proxyhat.com'
PROXY_PORT = 8080
PROXY_USER = 'user-country-US'
PROXY_PASS = 'your_password'

def fetch_with_proxy(url_str)
  uri = URI.parse(url_str)
  
  # Create the connection through the proxy
  http = Net::HTTP.new(
    uri.host,
    uri.port,
    PROXY_HOST,
    PROXY_PORT,
    PROXY_USER,
    PROXY_PASS
  )
  
  # TLS/SSL configuration for HTTPS
  if uri.scheme == 'https'
    http.use_ssl = true
    http.verify_mode = OpenSSL::SSL::VERIFY_PEER
    http.min_version = OpenSSL::SSL::TLS1_2_VERSION
  end
  
  # Production-ready timeouts
  http.open_timeout = 15
  http.read_timeout = 30
  http.write_timeout = 10
  
  request = Net::HTTP::Get.new(uri.request_uri)
  request['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
  request['Accept'] = 'text/html,application/xhtml+xml'
  
  response = http.request(request)
  
  case response
  when Net::HTTPSuccess
    { status: response.code.to_i, body: response.body, headers: response.each_header.to_h }
  when Net::HTTPRedirection
    { status: response.code.to_i, location: response['Location'], redirect: true }
  when Net::HTTPTooManyRequests
    { status: 429, retry_after: response['Retry-After']&.to_i || 60, rate_limited: true }
  else
    { status: response.code.to_i, error: response.message }
  end
end

# Usage example
result = fetch_with_proxy('https://httpbin.org/ip')
puts "Status: #{result[:status]}"
puts "Body: #{result[:body][0..200]}" if result[:body]
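Before sending real traffic, the proxy wiring above can be sanity-checked offline: instances built with proxy arguments expose `proxy?`, `proxy_address`, and `proxy_user`, and when the third argument is left at its default (`:ENV`), Net::HTTP picks the proxy up from the `http_proxy` environment variable. A minimal sketch (host and credentials are placeholders):

```ruby
require 'net/http'

# Explicit proxy arguments
http = Net::HTTP.new(
  'example.com', 443,
  'gate.proxyhat.com', 8080,
  'user-country-US', 'secret'
)
puts http.proxy?        # true: requests will go through the proxy
puts http.proxy_address # gate.proxyhat.com
puts http.proxy_user    # user-country-US

# Proxy from the environment (the p_addr argument defaults to :ENV)
ENV['http_proxy'] = 'http://user-country-US:secret@gate.proxyhat.com:8080'
from_env = Net::HTTP.new('example.com', 80)
puts from_env.proxy?    # true

# Passing nil as the third argument disables any proxy
direct = Net::HTTP.new('example.com', 80, nil)
puts direct.proxy?      # false
```

This makes it easy to unit-test proxy configuration without hitting the network.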

Error Handling and Retries with a Circuit Breaker

In production, requests fail. Here is a robust pattern with exponential backoff:

require 'net/http'
require 'logger'

class ProxyHTTPClient
  class CircuitOpenError < StandardError; end
  
  MAX_RETRIES = 3
  BASE_DELAY = 1.0
  CIRCUIT_FAILURE_THRESHOLD = 5
  CIRCUIT_RESET_TIMEOUT = 60
  
  attr_reader :logger, :circuit_state
  
  def initialize(proxy_host:, proxy_port:, proxy_user:, proxy_pass:, logger: nil)
    @proxy_host = proxy_host
    @proxy_port = proxy_port
    @proxy_user = proxy_user
    @proxy_pass = proxy_pass
    @logger = logger || Logger.new($stdout, level: Logger::INFO)
    @failure_count = 0
    @circuit_opened_at = nil
    @mutex = Mutex.new
  end
  
  def get(url_str)
    raise CircuitOpenError, 'Circuit breaker is open' if circuit_open?
    
    retries = 0
    begin
      execute_request(url_str)
    rescue Net::OpenTimeout, Net::ReadTimeout, Errno::ECONNREFUSED, Errno::ECONNRESET => e
      record_failure
      retries += 1
      
      if retries <= MAX_RETRIES
        delay = BASE_DELAY * (2 ** (retries - 1)) + rand(0.0..0.5)
        logger.warn "Retry #{retries}/#{MAX_RETRIES} after #{delay.round(2)}s: #{e.class} - #{e.message}"
        sleep(delay)
        retry
      else
        logger.error "Max retries exceeded for #{url_str}"
        raise
      end
    rescue OpenSSL::SSL::SSLError => e
      logger.error "SSL error for #{url_str}: #{e.message}"
      record_failure
      raise
    end
  end
  
  private
  
  def execute_request(url_str)
    uri = URI.parse(url_str)
    
    http = Net::HTTP.new(
      uri.host, uri.port,
      @proxy_host, @proxy_port,
      @proxy_user, @proxy_pass
    )
    
    configure_ssl(http, uri)
    http.open_timeout = 15
    http.read_timeout = 30
    
    response = http.request(Net::HTTP::Get.new(uri.request_uri))
    record_success
    parse_response(response)
  end
  
  def configure_ssl(http, uri)
    return unless uri.scheme == 'https'
    http.use_ssl = true
    http.verify_mode = OpenSSL::SSL::VERIFY_PEER
    http.min_version = OpenSSL::SSL::TLS1_2_VERSION
    http.ciphers = 'TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256'
  end
  
  def parse_response(response)
    { status: response.code.to_i, body: response.body }
  end
  
  def record_success
    @mutex.synchronize do
      @failure_count = 0
      @circuit_opened_at = nil
    end
  end
  
  def record_failure
    @mutex.synchronize do
      @failure_count += 1
      if @failure_count >= CIRCUIT_FAILURE_THRESHOLD
        @circuit_opened_at = Time.now
        logger.error "Circuit breaker OPENED after #{@failure_count} failures"
      end
    end
  end
  
  def circuit_open?
    @mutex.synchronize do
      return false unless @circuit_opened_at
      
      if Time.now - @circuit_opened_at > CIRCUIT_RESET_TIMEOUT
        logger.info 'Circuit breaker HALF-OPEN - attempting reset'
        @circuit_opened_at = nil
        @failure_count = 0
        false
      else
        true
      end
    end
  end
end

# Usage
client = ProxyHTTPClient.new(
  proxy_host: 'gate.proxyhat.com',
  proxy_port: 8080,
  proxy_user: 'user-country-US',
  proxy_pass: ENV['PROXYHAT_PASSWORD']
)

result = client.get('https://httpbin.org/ip')
puts "Response: #{result[:status]}"
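The delay schedule used by the client above is worth seeing in isolation: each retry doubles the base delay, and the random jitter (up to 0.5 s here) spreads retries out so that many workers don't hammer the proxy at the same instant. A quick check of the deterministic part:

```ruby
BASE_DELAY = 1.0
MAX_RETRIES = 3

# Deterministic part of the backoff: BASE_DELAY * 2**(attempt - 1)
delays = (1..MAX_RETRIES).map { |attempt| BASE_DELAY * (2**(attempt - 1)) }
puts delays.inspect # [1.0, 2.0, 4.0]

# With jitter, each actual delay falls in [d, d + 0.5)
jittered = delays.map { |d| d + rand(0.0..0.5) }
puts jittered.all? { |d| d >= BASE_DELAY } # true
```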

Typhoeus: Parallel Requests with libcurl

Typhoeus is a Ruby wrapper for libcurl that supports parallel requests via Hydra. It is ideal when you need to fetch hundreds of URLs simultaneously.

Installation and Basic Configuration

# Gemfile
gem 'typhoeus', '~> 1.4'

require 'typhoeus'

# ProxyHat proxy configuration
PROXY_URL = 'http://user-country-US:your_password@gate.proxyhat.com:8080'

def fetch_typhoeus(url_str)
  request = Typhoeus::Request.new(
    url_str,
    method: :get,
    proxy: PROXY_URL,
    followlocation: true,
    timeout: 30,
    connecttimeout: 15,
    ssl_verifypeer: true,
    ssl_verifyhost: 2,
    headers: {
      'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
      'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9'
    }
  )
  
  response = request.run
  
  {
    status: response.code,
    body: response.body,
    headers: response.headers,
    total_time: response.total_time,
    effective_url: response.effective_url
  }
end

result = fetch_typhoeus('https://httpbin.org/ip')
puts "Status: #{result[:status]}, Time: #{result[:total_time].round(3)}s"

Parallel Requests with Hydra

Hydra lets you run dozens of concurrent requests in a single event loop:

require 'typhoeus'

class ParallelScraper
  CONCURRENCY = 50
  PROXY_URL = 'http://user-country-US:your_password@gate.proxyhat.com:8080'
  
  def initialize(urls)
    @urls = urls
    @results = Queue.new
    @hydra = Typhoeus::Hydra.new(max_concurrency: CONCURRENCY)
  end
  
  def scrape
    start_time = Time.now
    
    @urls.each_with_index do |url, index|
      request = build_request(url, index)
      @hydra.queue(request)
    end
    
    @hydra.run
    
    elapsed = Time.now - start_time
    results = drain_results
    
    {
      total_urls: @urls.size,
      successful: results.count { |r| r[:status] == 200 },
      failed: results.count { |r| r[:status] != 200 },
      elapsed_seconds: elapsed.round(2),
      requests_per_second: (results.size / elapsed).round(2),
      results: results
    }
  end
  
  private
  
  def build_request(url, index)
    request = Typhoeus::Request.new(
      url,
      method: :get,
      proxy: PROXY_URL,
      timeout: 30,
      connecttimeout: 15,
      followlocation: true,
      headers: {
        'User-Agent' => random_user_agent,
        'Accept' => 'text/html,application/xhtml+xml'
      }
    )
    
    # on_complete fires for every completed request (success and failure
    # alike), so one callback handles both and nothing is recorded twice
    request.on_complete do |response|
      if response.success?
        @results << {
          index: index,
          url: url,
          status: response.code,
          body: response.body,
          time: response.total_time,
          success: true
        }
      else
        @results << {
          index: index,
          url: url,
          status: response.code || 0,
          error: response.status_message,
          success: false
        }
      end
    end
    
    request
  end
  
  def drain_results
    results = []
    results << @results.pop until @results.empty?
    results.sort_by { |r| r[:index] }
  end
  
  def random_user_agent
    agents = [
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
      'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
    ]
    agents.sample
  end
end

# Generate a list of test URLs
urls = 100.times.map { |i| "https://httpbin.org/delay/#{rand(1..3)}?id=#{i}" }

scraper = ParallelScraper.new(urls)
stats = scraper.scrape

puts "\n=== Stats ==="
puts "URLs processed: #{stats[:total_urls]}"
puts "Successes: #{stats[:successful]}"
puts "Failures: #{stats[:failed]}"
puts "Total time: #{stats[:elapsed_seconds]}s"
puts "Requests/sec: #{stats[:requests_per_second]}"
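The drain-and-sort step deserves a note: Hydra completes requests in whatever order the network allows, so results land in the Queue out of order and are restored to submission order by their index. The pattern in isolation:

```ruby
results_queue = Queue.new

# Simulate completions arriving out of order
[{ index: 2, status: 200 }, { index: 0, status: 200 }, { index: 1, status: 404 }]
  .each { |r| results_queue << r }

# Drain the queue, then sort back to submission order
results = []
results << results_queue.pop until results_queue.empty?
ordered = results.sort_by { |r| r[:index] }

puts ordered.map { |r| r[:index] }.inspect # [0, 1, 2]
```

Queue is thread-safe, so the same drain pattern works whether callbacks run on the main thread (as with Hydra) or from worker threads.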

ProxyHat Ruby SDK: IP Rotation and Geo-Targeting

The ProxyHat SDK abstracts away the complexity of IP rotation, enabling fine-grained geo-targeting and sticky sessions without manually editing credentials.

Automatic Per-Request Rotation

require 'typhoeus'
require 'securerandom' # for default sticky session IDs
require 'json'         # for parsing responses in the examples below

class ProxyHatClient
  BASE_URL = 'http://%s:%s@gate.proxyhat.com:8080'
  
  def initialize(username:, password:, country: nil, city: nil, session: nil)
    @username = build_username(username, country: country, city: city, session: session)
    @password = password
    @proxy_url = format(BASE_URL, @username, @password)
  end
  
  # Create a new client with a sticky session (persistent IP)
  def with_sticky_session(session_id = SecureRandom.hex(8))
    self.class.new(
      username: @username.split('-').first,
      password: @password,
      country: extract_country,
      city: extract_city,
      session: session_id
    )
  end
  
  # Create a new client for a different country
  def with_country(country_code)
    self.class.new(
      username: @username.split('-').first,
      password: @password,
      country: country_code,
      city: nil,
      session: nil
    )
  end
  
  def get(url_str)
    Typhoeus::Request.new(
      url_str,
      method: :get,
      proxy: @proxy_url,
      timeout: 30,
      followlocation: true,
      headers: default_headers
    ).run
  end
  
  def post(url_str, body:, content_type: 'application/json')
    Typhoeus::Request.new(
      url_str,
      method: :post,
      body: body.to_json,
      proxy: @proxy_url,
      timeout: 30,
      headers: default_headers.merge('Content-Type' => content_type)
    ).run
  end
  
  private
  
  def build_username(base, country:, city:, session:)
    parts = [base]
    parts << "country-#{country}" if country
    parts << "city-#{city}" if city
    parts << "session-#{session}" if session
    parts.join('-')
  end
  
  def extract_country
    @username[/country-([A-Z]{2})/, 1]
  end
  
  def extract_city
    @username[/city-([a-z]+)/, 1]
  end
  
  def default_headers
    {
      'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
      'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9'
    }
  end
end

# Usage: automatic rotation (new IP per request)
client = ProxyHatClient.new(
  username: 'user',
  password: ENV['PROXYHAT_PASSWORD']
)

# Geo-targeting: requests from German IPs
de_client = client.with_country('DE')
response = de_client.get('https://httpbin.org/ip')
puts "German IP: #{JSON.parse(response.body)['origin']}"

# Sticky session: same IP across multiple requests
sticky_client = client.with_country('US').with_sticky_session('order-123')
5.times do |i|
  resp = sticky_client.get('https://httpbin.org/ip')
  ip = JSON.parse(resp.body)['origin']
  puts "Request #{i + 1}: #{ip}"
end
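The gateway username is just a dash-separated string of targeting segments, so the builder logic can be exercised without any network access. (Segment names here follow this article's examples; check your provider's documentation for the exact format.)

```ruby
# Build a ProxyHat-style gateway username from targeting options.
# Segments follow the "country-XX", "city-name", "session-id" convention
# used in the examples above.
def build_proxy_username(base, country: nil, city: nil, session: nil)
  parts = [base]
  parts << "country-#{country}" if country
  parts << "city-#{city}"       if city
  parts << "session-#{session}" if session
  parts.join('-')
end

puts build_proxy_username('user', country: 'DE')
# user-country-DE
puts build_proxy_username('user', country: 'US', session: 'order-123')
# user-country-US-session-order-123
```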

Real-World Example: Scraping 1,000 URLs with Residential Proxies

This example combines Typhoeus Hydra with ProxyHat for large-scale scraping with automatic IP rotation:

require 'typhoeus'
require 'json'
require 'logger'
require 'concurrent' # gem 'concurrent-ruby'

class MassScraper
  BATCH_SIZE = 100
  MAX_CONCURRENCY = 50
  RETRYABLE_CODES = [429, 500, 502, 503, 504].freeze
  
  def initialize(username:, password:, logger: nil)
    @username = username
    @password = password
    @logger = logger || Logger.new($stdout, level: Logger::INFO)
    @stats = Concurrent::Hash.new(0)
  end
  
  def scrape(urls, country: nil)
    @stats.clear
    start_time = Time.now
    
    # Process in batches to keep memory bounded
    results = []
    urls.each_slice(BATCH_SIZE).with_index do |batch, batch_num|
      @logger.info "Processing batch #{batch_num + 1}/#{(urls.size.to_f / BATCH_SIZE).ceil}"
      batch_results = process_batch(batch, country: country)
      results.concat(batch_results)
    end
    
    elapsed = Time.now - start_time
    print_summary(results.size, elapsed)
    
    results
  end
  
  private
  
  def process_batch(urls, country:)
    hydra = Typhoeus::Hydra.new(max_concurrency: MAX_CONCURRENCY)
    results_queue = Queue.new
    
    urls.each_with_index do |url, idx|
      # Each request uses a different proxy (automatic rotation)
      proxy_url = build_proxy_url(country: country)
      
      request = Typhoeus::Request.new(
        url,
        method: :get,
        proxy: proxy_url,
        timeout: 30,
        connecttimeout: 15,
        followlocation: true,
        ssl_verifypeer: true,
        headers: {
          'User-Agent' => random_user_agent,
          'Accept' => 'text/html,application/xhtml+xml'
        }
      )
      
      request.on_complete do |response|
        @stats[:total] += 1
        
        if response.success?
          @stats[:success] += 1
          results_queue << { url: url, status: response.code, body: response.body, success: true }
        elsif RETRYABLE_CODES.include?(response.code)
          @stats[:retries] += 1
          # Re-queue once with a longer timeout; Hydra accepts new requests
          # while running, and the retry reports into the same results queue
          hydra.queue(build_retry_request(url, proxy_url, results_queue))
        else
          @stats[:failed] += 1
          results_queue << { url: url, status: response.code, error: response.status_message, success: false }
        end
      end
      
      hydra.queue(request)
    end
    
    hydra.run
    
    # Drain the results queue into an array
    results = []
    results << results_queue.pop until results_queue.empty?
    results
  end
  
  def build_retry_request(url, proxy_url, results_queue)
    request = Typhoeus::Request.new(
      url,
      method: :get,
      proxy: proxy_url,
      timeout: 40,
      connecttimeout: 15,
      headers: { 'User-Agent' => random_user_agent }
    )
    
    request.on_complete do |response|
      if response.success?
        @stats[:success] += 1
        results_queue << { url: url, status: response.code, body: response.body, success: true }
      else
        @stats[:failed] += 1
        results_queue << { url: url, status: response.code, error: response.status_message, success: false }
      end
    end
    
    request
  end
  
  def build_proxy_url(country: nil)
    username = @username
    username += "-country-#{country}" if country
    "http://#{username}:#{@password}@gate.proxyhat.com:8080"
  end
  
  def random_user_agent
    [
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15',
      'Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0'
    ].sample
  end
  
  def print_summary(total, elapsed)
    @logger.info "\n" + '=' * 50
    @logger.info 'Scraping complete'
    @logger.info "Total URLs: #{total}"
    @logger.info "Successes: #{@stats[:success]} (#{percent(@stats[:success], total)}%)"
    @logger.info "Failures: #{@stats[:failed]} (#{percent(@stats[:failed], total)}%)"
    @logger.info "Retries: #{@stats[:retries]}"
    @logger.info "Time: #{elapsed.round(2)}s"
    @logger.info "Rate: #{(total / elapsed).round(2)} req/s"
    @logger.info '=' * 50
  end
  
  def percent(value, total)
    return 0 if total.zero?
    ((value.to_f / total) * 100).round(1)
  end
end

# Example: scraping 1,000 URLs
urls = 1000.times.map { |i| "https://httpbin.org/get?id=#{i}" }

scraper = MassScraper.new(
  username: 'user',
  password: ENV['PROXYHAT_PASSWORD']
)

results = scraper.scrape(urls, country: 'US')

# Filter valid results
valid_results = results.select { |r| r[:success] }
puts "Data extracted from #{valid_results.size} pages"

TLS/SSL Configuration: Self-Signed Certificates and SNI

When you use intermediate proxies or test against internal environments, you may run into self-signed certificates or SNI issues. Here is how to handle them:

require 'net/http'
require 'openssl'

class TLSAwareProxyClient
  def initialize(proxy_host:, proxy_port:, proxy_user:, proxy_pass:)
    @proxy_host = proxy_host
    @proxy_port = proxy_port
    @proxy_user = proxy_user
    @proxy_pass = proxy_pass
  end
  
  def get(url_str, verify_ssl: true, custom_ca: nil)
    uri = URI.parse(url_str)
    
    http = Net::HTTP.new(
      uri.host, uri.port,
      @proxy_host, @proxy_port,
      @proxy_user, @proxy_pass
    )
    
    if uri.scheme == 'https'
      http.use_ssl = true
      configure_ssl(http, verify_ssl: verify_ssl, custom_ca: custom_ca)
    end
    
    http.open_timeout = 15
    http.read_timeout = 30
    
    # SNI (Server Name Indication) - critical for virtual hosting.
    # Net::HTTP sets SNI automatically from the target hostname.
    # To connect to a fixed IP while keeping the hostname for SNI
    # and certificate verification (rare):
    # http.ipaddr = '203.0.113.10'
    
    request = Net::HTTP::Get.new(uri.request_uri)
    http.request(request)
  end
  
  private
  
  def configure_ssl(http, verify_ssl:, custom_ca:)
    if verify_ssl
      http.verify_mode = OpenSSL::SSL::VERIFY_PEER
      
      # Use the system CA bundle
      http.ca_file = find_system_ca_bundle
      
      # Or specify a custom CA for self-signed certificates
      if custom_ca
        http.ca_file = custom_ca
        http.verify_mode = OpenSSL::SSL::VERIFY_PEER
      end
      
      # Enforce TLS 1.2+
      http.min_version = OpenSSL::SSL::TLS1_2_VERSION
      http.max_version = OpenSSL::SSL::TLS1_3_VERSION
      
      # Secure cipher suites
      http.ciphers = [
        'TLS_AES_256_GCM_SHA384',
        'TLS_CHACHA20_POLY1305_SHA256',
        'TLS_AES_128_GCM_SHA256',
        'ECDHE-RSA-AES256-GCM-SHA384',
        'ECDHE-RSA-AES128-GCM-SHA256'
      ].join(':')
    else
      # ONLY for test/development - ignores SSL errors
      http.verify_mode = OpenSSL::SSL::VERIFY_NONE
      $stderr.puts 'WARNING: SSL verification disabled!'
    end
  end
  
  def find_system_ca_bundle
    # Linux (Debian/Ubuntu)
    return '/etc/ssl/certs/ca-certificates.crt' if File.exist?('/etc/ssl/certs/ca-certificates.crt')
    # Linux (RHEL/CentOS)
    return '/etc/pki/tls/certs/ca-bundle.crt' if File.exist?('/etc/pki/tls/certs/ca-bundle.crt')
    # macOS
    return '/etc/ssl/cert.pem' if File.exist?('/etc/ssl/cert.pem')
    # Fallback
    nil
  end
end

# Usage with standard SSL verification
client = TLSAwareProxyClient.new(
  proxy_host: 'gate.proxyhat.com',
  proxy_port: 8080,
  proxy_user: 'user-country-US',
  proxy_pass: ENV['PROXYHAT_PASSWORD']
)

response = client.get('https://httpbin.org/get')
puts "Status: #{response.code}"

# With a custom CA for self-signed certificates
# response = client.get('https://internal.local/api', custom_ca: '/path/to/ca.pem')

# Testing without verification (do NOT use in production!)
# response = client.get('https://self-signed.local/api', verify_ssl: false)
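As an alternative to `ca_file`, Net::HTTP also accepts a full `OpenSSL::X509::Store` via `cert_store`, which lets you combine the system defaults with an extra internal root instead of replacing them. A minimal sketch (the target host, proxy host, and CA path are placeholders):

```ruby
require 'net/http'
require 'openssl'

# Build a trust store: system defaults plus (optionally) an internal CA
store = OpenSSL::X509::Store.new
store.set_default_paths                       # load system CA locations
# store.add_file('/path/to/internal-ca.pem')  # append a self-signed root

http = Net::HTTP.new('internal.local', 443, 'gate.proxyhat.com', 8080)
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
http.cert_store = store                       # used instead of ca_file

puts http.cert_store.equal?(store) # true
```

This keeps public sites verifiable while also trusting your internal endpoints, which `ca_file` alone (a single bundle) cannot do.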

Ruby on Rails Integration: Faraday and ActiveJob

In Rails applications, you want to integrate proxies with HTTP middleware and asynchronous jobs so you don't block the request cycle.

Faraday Middleware with a Proxy

# Gemfile
gem 'faraday', '~> 2.0'
gem 'faraday-retry', '~> 2.0'

# config/initializers/proxy_http.rb
require 'faraday'
require 'faraday/retry'

class ProxyFaradayMiddleware
  def initialize(app, proxy_host:, proxy_port:, proxy_user:, proxy_pass:, country: nil)
    @app = app
    @proxy_host = proxy_host
    @proxy_port = proxy_port
    @proxy_user = proxy_user
    @proxy_pass = proxy_pass
    @country = country
  end
  
  def call(env)
    # Inject the proxy URL into the request
    proxy_url = build_proxy_url
    env.request.proxy = Faraday::ProxyOptions.from(proxy_url)
    
    @app.call(env)
  end
  
  private
  
  def build_proxy_url
    username = @proxy_user
    username += "-country-#{@country}" if @country
    "http://#{username}:#{@proxy_pass}@#{@proxy_host}:#{@proxy_port}"
  end
end

# Factory for creating proxied connections
module ProxyHTTP
  class << self
    def connection(country: nil, retry_options: {})
      Faraday.new do |builder|
        builder.use ProxyFaradayMiddleware,
          proxy_host: 'gate.proxyhat.com',
          proxy_port: 8080,
          proxy_user: ENV['PROXYHAT_USERNAME'],
          proxy_pass: ENV['PROXYHAT_PASSWORD'],
          country: country
        
        builder.request :retry, {
          max: 3,
          interval: 1.0,
          backoff_factor: 2,
          retry_statuses: [429, 500, 502, 503, 504],
          methods: [:get, :head, :options],
          retry_if: ->(env, _ex) { env.status == 429 }
        }.merge(retry_options)
        
        builder.response :json, content_type: /json/
        builder.response :raise_error
        
        builder.adapter :net_http do |http|
          http.open_timeout = 15
          http.read_timeout = 30
        end
      end
    end
  end
end

# Usage in controllers or service objects
# app/services/serp_tracker_service.rb
class SerpTrackerService
  def initialize(keyword, country: 'US')
    @keyword = keyword
    @country = country
    @conn = ProxyHTTP.connection(country: country)
  end
  
  def fetch_results
    response = @conn.get('https://serpproxy.example.com/search') do |req|
      req.params['q'] = @keyword
      req.params['gl'] = @country.downcase
    end
    
    parse_serp(response.body)
  rescue Faraday::Error => e
    Rails.logger.error "SERP fetch failed: #{e.message}"
    { error: e.message }
  end
  
  private
  
  def parse_serp(data)
    data.dig('organic_results') || []
  end
end

ActiveJob for Asynchronous Scraping

# app/jobs/scraping_job.rb
class ScrapingJob < ApplicationJob
  queue_as :scraping
  
  # Avoid infinite retries
  retry_on Faraday::TimeoutError, wait: :polynomially_longer, attempts: 3
  retry_on Faraday::ConnectionFailed, wait: 30.seconds, attempts: 2
  discard_on Faraday::ResourceNotFound
  
  def perform(url_list_id, country: 'US')
    url_list = UrlList.find(url_list_id)
    urls = url_list.urls
    
    results = scrape_urls(urls, country: country)
    store_results(url_list, results)
  ensure
    url_list.update!(completed_at: Time.current)
  end
  
  private
  
  def scrape_urls(urls, country:)
    conn = ProxyHTTP.connection(country: country)
    
    # Process in parallel with a thread pool
    pool_size = [urls.size, 20].min
    mutex = Mutex.new
    results = []
    
    threads = urls.each_slice((urls.size.to_f / pool_size).ceil).map do |batch|
      Thread.new do
        batch.each do |url|
          begin
            response = conn.get(url)
            mutex.synchronize do
              results << { url: url, status: response.status, body: response.body }
            end
          rescue Faraday::Error => e
            mutex.synchronize do
              results << { url: url, error: e.message }
            end
          end
        end
      end
    end
    
    threads.each(&:join)
    results
  end
  
  def store_results(url_list, results)
    ScrapedResult.transaction do
      results.each do |result|
        next if result[:error]
        
        ScrapedResult.create!(
          url_list: url_list,
          url: result[:url],
          status_code: result[:status],
          content: result[:body][0..65_535], # cap the stored size
          scraped_at: Time.current
        )
      end
    end
  end
end

# Called from the controller
# app/controllers/scraping_controller.rb
class ScrapingController < ApplicationController
  def create
    # "\n" must be double-quoted to split on actual newlines
    url_list = UrlList.create!(urls: params[:urls].split("\n").map(&:strip))
    ScrapingJob.perform_later(url_list.id, country: params[:country] || 'US')
    redirect_to url_list, notice: 'Scraping started in the background'
  end
end

# config/sidekiq.yml (if you use Sidekiq)
# :concurrency: 10
# :queues:
#   - scraping
#   - default
#   - mailers
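The thread-pool slicing arithmetic in scrape_urls above is easy to get wrong, so it helps to check it standalone: `each_slice(ceil(n / pool_size))` yields at most pool_size batches, one per thread. For example, 7 URLs with a pool of 3 threads produces slices of ceil(7 / 3) = 3:

```ruby
urls = (1..7).map { |i| "https://example.com/page/#{i}" }
pool_size = [urls.size, 3].min

# ceil(7 / 3.0) = 3 URLs per slice, so at most 3 batches / threads
batches = urls.each_slice((urls.size.to_f / pool_size).ceil).to_a
puts batches.map(&:size).inspect # [3, 3, 1]
puts batches.size <= pool_size   # true
```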

Comparison: Net::HTTP vs Typhoeus vs ProxyHat SDK

Feature            | Net::HTTP        | Typhoeus             | ProxyHat SDK
Dependencies       | None (stdlib)    | libcurl              | Wrapper over Typhoeus
Parallel requests  | No (use Threads) | Yes (Hydra)          | Yes (built-in)
IP rotation        | Manual           | Manual               | Automatic
Geo-targeting      | Manual           | Manual               | Built-in
Sticky sessions    | Manual           | Manual               | Built-in
Performance        | Medium           | High                 | High
Use case           | Simple scripts   | High-volume scraping | Production-ready

Key Takeaways

  • Net::HTTP is enough for simple scripts, but it needs boilerplate for proxy auth and error handling.
  • Typhoeus excels at parallel requests via Hydra, making it ideal for high-volume scraping.
  • The ProxyHat SDK abstracts IP rotation, geo-targeting, and sticky sessions, making it the best choice for production.
  • Always configure timeouts, retries with exponential backoff, and a circuit breaker for resilience.
  • In Rails, use Faraday middleware and ActiveJob for non-blocking asynchronous work.
  • Always verify TLS/SSL in production; disable verification only for internal testing.

Next Steps

To dig deeper, check ProxyHat's pricing page for residential plans, or browse the available proxy locations for geo-targeting. If you're building a SERP tracking system, see our dedicated use case.

Ready to get started?

Access 50M+ residential IPs across 148+ countries with AI-powered filtering.
