If you're building data pipelines or web scraping systems in Ruby, sooner or later you'll run into IP blocks, rate limiting, and geographic restrictions. HTTP proxies are the solution, but integrating them properly takes more than a few lines of code. This guide shows you how to use proxies in Ruby with Net::HTTP (stdlib), Typhoeus (libcurl), and the ProxyHat SDK, with production-ready patterns for IP rotation, error handling, and concurrency.
Why Proxies Are Essential in Ruby
Ruby excels at data manipulation and automation, but its standard HTTP libraries don't natively handle enterprise scenarios such as IP rotation, sticky sessions, or geo-targeting. Without proxies, your scraper will typically get blocked after a few hundred requests.
Typical use cases include:
- SERP tracking: localized search results
- Price monitoring: competitor e-commerce analysis
- Data pipelines: bulk extraction from rate-limited APIs
- QA testing: verifying behavior from different geolocations
Net::HTTP: Proxies with the Standard Library
Net::HTTP ships with Ruby's standard library. It needs no external dependencies, but proxy configuration demands attention to detail, especially around authentication and error handling.
Basic Setup with an HTTP Proxy
Here's how to configure Net::HTTP with an authenticated proxy:
require 'net/http'
require 'uri'
# ProxyHat proxy configuration
PROXY_HOST = 'gate.proxyhat.com'
PROXY_PORT = 8080
PROXY_USER = 'user-country-US'
PROXY_PASS = 'your_password'
def fetch_with_proxy(url_str)
uri = URI.parse(url_str)
# Create the connection through the proxy
http = Net::HTTP.new(
uri.host,
uri.port,
PROXY_HOST,
PROXY_PORT,
PROXY_USER,
PROXY_PASS
)
# TLS/SSL configuration for HTTPS
if uri.scheme == 'https'
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
http.min_version = OpenSSL::SSL::TLS1_2_VERSION
end
# Production-ready timeouts
http.open_timeout = 15
http.read_timeout = 30
http.write_timeout = 10
request = Net::HTTP::Get.new(uri.request_uri)
request['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
request['Accept'] = 'text/html,application/xhtml+xml'
response = http.request(request)
case response
when Net::HTTPSuccess
{ status: response.code.to_i, body: response.body, headers: response.each_header.to_h }
when Net::HTTPRedirection
{ status: response.code.to_i, location: response['Location'], redirect: true }
when Net::HTTPTooManyRequests
{ status: 429, retry_after: response['Retry-After']&.to_i || 60, rate_limited: true }
else
{ status: response.code.to_i, error: response.message }
end
end
# Usage example
result = fetch_with_proxy('https://httpbin.org/ip')
puts "Status: #{result[:status]}"
puts "Body: #{result[:body][0..200]}" if result[:body]
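Before debugging failed requests, it helps to confirm the proxy settings actually took effect. Net::HTTP exposes them on the instance, and when no proxy arguments are given it falls back to the `http_proxy` environment variable. A quick offline sanity check (the host names below are placeholders, and no request is ever sent):

```ruby
require 'net/http'

# Explicit proxy arguments are exposed on the instance for inspection.
http = Net::HTTP.new('example.com', 443, 'gate.proxyhat.com', 8080, 'user', 'secret')
puts http.proxy?         # => true
puts http.proxy_address  # => "gate.proxyhat.com"
puts http.proxy_port     # => 8080

# With no proxy arguments, Net::HTTP falls back to the http_proxy env var
# (clearing no_proxy here so the fallback is not masked).
ENV.delete('no_proxy')
ENV.delete('NO_PROXY')
ENV['http_proxy'] = 'http://fallback.proxy.local:3128'
env_http = Net::HTTP.new('example.com', 80)
puts env_http.proxy_address  # => "fallback.proxy.local"
```

Since the object is only constructed, never started, this check runs without network access.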
Error Handling and Retries with a Circuit Breaker
In production, requests fail. Here is a robust pattern with exponential backoff and a circuit breaker:
require 'net/http'
require 'logger'
class ProxyHTTPClient
class CircuitOpenError < StandardError; end
MAX_RETRIES = 3
BASE_DELAY = 1.0
CIRCUIT_FAILURE_THRESHOLD = 5
CIRCUIT_RESET_TIMEOUT = 60
attr_reader :logger, :circuit_state
def initialize(proxy_host:, proxy_port:, proxy_user:, proxy_pass:, logger: nil)
@proxy_host = proxy_host
@proxy_port = proxy_port
@proxy_user = proxy_user
@proxy_pass = proxy_pass
@logger = logger || Logger.new($stdout, level: Logger::INFO)
@failure_count = 0
@circuit_opened_at = nil
@mutex = Mutex.new
end
def get(url_str)
raise CircuitOpenError, 'Circuit breaker is open' if circuit_open?
retries = 0
begin
execute_request(url_str)
rescue Net::OpenTimeout, Net::ReadTimeout, Errno::ECONNREFUSED, Errno::ECONNRESET => e
record_failure
retries += 1
if retries <= MAX_RETRIES
delay = BASE_DELAY * (2 ** (retries - 1)) + rand(0.0..0.5)
logger.warn "Retry #{retries}/#{MAX_RETRIES} after #{delay.round(2)}s: #{e.class} - #{e.message}"
sleep(delay)
retry
else
logger.error "Max retries exceeded for #{url_str}"
raise
end
rescue OpenSSL::SSL::SSLError => e
logger.error "SSL error for #{url_str}: #{e.message}"
record_failure
raise
end
end
private
def execute_request(url_str)
uri = URI.parse(url_str)
http = Net::HTTP.new(
uri.host, uri.port,
@proxy_host, @proxy_port,
@proxy_user, @proxy_pass
)
configure_ssl(http, uri)
http.open_timeout = 15
http.read_timeout = 30
response = http.request(Net::HTTP::Get.new(uri.request_uri))
record_success
parse_response(response)
end
def configure_ssl(http, uri)
return unless uri.scheme == 'https'
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
http.min_version = OpenSSL::SSL::TLS1_2_VERSION
# Leave cipher selection to OpenSSL's defaults here: passing only TLS 1.3
# suite names to ciphers= would fail, since OpenSSL configures those separately
end
def parse_response(response)
{ status: response.code.to_i, body: response.body }
end
def record_success
@mutex.synchronize do
@failure_count = 0
@circuit_opened_at = nil
end
end
def record_failure
@mutex.synchronize do
@failure_count += 1
if @failure_count >= CIRCUIT_FAILURE_THRESHOLD
@circuit_opened_at = Time.now
logger.error "Circuit breaker OPENED after #{@failure_count} failures"
end
end
end
def circuit_open?
@mutex.synchronize do
return false unless @circuit_opened_at
if Time.now - @circuit_opened_at > CIRCUIT_RESET_TIMEOUT
logger.info 'Circuit breaker HALF-OPEN - attempting reset'
@circuit_opened_at = nil
@failure_count = 0
false
else
true
end
end
end
end
# Usage
client = ProxyHTTPClient.new(
proxy_host: 'gate.proxyhat.com',
proxy_port: 8080,
proxy_user: 'user-country-US',
proxy_pass: ENV['PROXYHAT_PASSWORD']
)
result = client.get('https://httpbin.org/ip')
puts "Response: #{result[:status]}"
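The delay formula in the retry loop (base delay doubled per attempt, plus random jitter) is easy to get subtly wrong, so it's worth extracting into a pure method you can test without any network. A sketch with injectable jitter so tests can be deterministic:

```ruby
# Pure exponential-backoff schedule: the delay doubles on each attempt;
# jitter defaults to a random value but is injectable for testing.
def backoff_delay(attempt, base: 1.0, jitter: rand(0.0..0.5))
  base * (2**(attempt - 1)) + jitter
end

delays = (1..4).map { |n| backoff_delay(n, jitter: 0.0) }
puts delays.inspect  # => [1.0, 2.0, 4.0, 8.0]
```

Keeping the schedule pure means the retry loop itself only decides *when* to call it, which is much easier to reason about under concurrency.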
Typhoeus: Parallel Requests with libcurl
Typhoeus is a Ruby wrapper around libcurl that supports parallel requests via Hydra. It's ideal when you need to fetch hundreds of URLs simultaneously.
Installation and Basic Configuration
# Gemfile
gem 'typhoeus', '~> 1.4'
require 'typhoeus'
# ProxyHat proxy configuration
PROXY_URL = 'http://user-country-US:your_password@gate.proxyhat.com:8080'
def fetch_typhoeus(url_str)
request = Typhoeus::Request.new(
url_str,
method: :get,
proxy: PROXY_URL,
followlocation: true,
timeout: 30,
connecttimeout: 15,
ssl_verifypeer: true,
ssl_verifyhost: 2,
headers: {
'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9'
}
)
response = request.run
{
status: response.code,
body: response.body,
headers: response.headers,
total_time: response.total_time,
effective_url: response.effective_url
}
end
result = fetch_typhoeus('https://httpbin.org/ip')
puts "Status: #{result[:status]}, Time: #{result[:total_time].round(3)}s"
Parallel Requests with Hydra
Hydra lets you run dozens of concurrent requests on a single event loop:
require 'typhoeus'
class ParallelScraper
CONCURRENCY = 50
PROXY_URL = 'http://user-country-US:your_password@gate.proxyhat.com:8080'
def initialize(urls)
@urls = urls
@results = Queue.new
@hydra = Typhoeus::Hydra.new(max_concurrency: CONCURRENCY)
end
def scrape
start_time = Time.now
@urls.each_with_index do |url, index|
request = build_request(url, index)
@hydra.queue(request)
end
@hydra.run
elapsed = Time.now - start_time
results = drain_results
{
total_urls: @urls.size,
successful: results.count { |r| r[:status] == 200 },
failed: results.count { |r| r[:status] != 200 },
elapsed_seconds: elapsed.round(2),
requests_per_second: (results.size / elapsed).round(2),
results: results
}
end
private
def build_request(url, index)
request = Typhoeus::Request.new(
url,
method: :get,
proxy: PROXY_URL,
timeout: 30,
connecttimeout: 15,
followlocation: true,
headers: {
'User-Agent' => random_user_agent,
'Accept' => 'text/html,application/xhtml+xml'
}
)
request.on_complete do |response|
# on_complete fires for both successes and failures, so branch here;
# also registering on_failure would record failed requests twice
if response.success?
@results << {
index: index,
url: url,
status: response.code,
body: response.body,
time: response.total_time,
success: true
}
else
@results << {
index: index,
url: url,
status: response.code || 0,
error: response.status_message,
success: false
}
end
end
request
end
def drain_results
results = []
results << @results.pop until @results.empty?
results.sort_by { |r| r[:index] }
end
def random_user_agent
agents = [
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
]
agents.sample
end
end
# Generate a list of test URLs
urls = 100.times.map { |i| "https://httpbin.org/delay/#{rand(1..3)}?id=#{i}" }
scraper = ParallelScraper.new(urls)
stats = scraper.scrape
puts "\n=== Stats ==="
puts "URLs processed: #{stats[:total_urls]}"
puts "Successes: #{stats[:successful]}"
puts "Failures: #{stats[:failed]}"
puts "Total time: #{stats[:elapsed_seconds]}s"
puts "Requests/sec: #{stats[:requests_per_second]}"
ProxyHat Ruby SDK: IP Rotation and Geo-Targeting
The ProxyHat SDK abstracts away the complexity of IP rotation, enabling fine-grained geo-targeting and sticky sessions without manually editing credentials.
Automatic Rotation per Request
require 'typhoeus'
require 'securerandom' # for SecureRandom.hex in sticky sessions
require 'json'         # for parsing the httpbin responses below
class ProxyHatClient
BASE_URL = 'http://%s:%s@gate.proxyhat.com:8080'
def initialize(username:, password:, country: nil, city: nil, session: nil)
@username = build_username(username, country: country, city: city, session: session)
@password = password
@proxy_url = format(BASE_URL, @username, @password)
end
# Creates a new client with a sticky session (persistent IP)
def with_sticky_session(session_id = SecureRandom.hex(8))
self.class.new(
username: @username.split('-').first,
password: @password,
country: extract_country,
city: extract_city,
session: session_id
)
end
# Creates a new client for a different country
def with_country(country_code)
self.class.new(
username: @username.split('-').first,
password: @password,
country: country_code,
city: nil,
session: nil
)
end
def get(url_str)
Typhoeus::Request.new(
url_str,
method: :get,
proxy: @proxy_url,
timeout: 30,
followlocation: true,
headers: default_headers
).run
end
def post(url_str, body:, content_type: 'application/json')
Typhoeus::Request.new(
url_str,
method: :post,
body: body.to_json,
proxy: @proxy_url,
timeout: 30,
headers: default_headers.merge('Content-Type' => content_type)
).run
end
private
def build_username(base, country:, city:, session:)
parts = [base]
parts << "country-#{country}" if country
parts << "city-#{city}" if city
parts << "session-#{session}" if session
parts.join('-')
end
def extract_country
@username[/country-([A-Z]{2})/, 1]
end
def extract_city
@username[/city-([a-z]+)/, 1]
end
def default_headers
{
'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9'
}
end
end
# Usage: automatic rotation (new IP per request)
client = ProxyHatClient.new(
username: 'user',
password: ENV['PROXYHAT_PASSWORD']
)
# Geo-targeting: requests from German IPs
de_client = client.with_country('DE')
response = de_client.get('https://httpbin.org/ip')
puts "German IP: #{JSON.parse(response.body)['origin']}"
# Sticky session: same IP across multiple requests
sticky_client = client.with_country('US').with_sticky_session('order-123')
5.times do |i|
resp = sticky_client.get('https://httpbin.org/ip')
ip = JSON.parse(resp.body)['origin']
puts "Request #{i + 1}: #{ip}"
end
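The rotation behavior above is driven entirely by the username string: the gateway reads targeting flags embedded in the credential. Since that's plain string composition, the format can be verified offline. A minimal sketch of the same `-country-`/`-city-`/`-session-` flag convention used in `build_username`:

```ruby
# Builds a ProxyHat-style proxy username from targeting options;
# omitted options simply leave their flag out of the credential string.
def proxyhat_username(base, country: nil, city: nil, session: nil)
  parts = [base]
  parts << "country-#{country}" if country
  parts << "city-#{city}" if city
  parts << "session-#{session}" if session
  parts.join('-')
end

puts proxyhat_username('user')                # => "user"
puts proxyhat_username('user', country: 'DE') # => "user-country-DE"
puts proxyhat_username('user', country: 'US', session: 'order-123')
# => "user-country-US-session-order-123"
```

Because the gateway, not your code, interprets these flags, unit tests like this only guard the string format, not the routing itself.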
Real-World Example: Scraping 1,000 URLs with Residential Proxies
This example combines Typhoeus Hydra with ProxyHat for large-scale scraping with automatic IP rotation:
require 'typhoeus'
require 'json'
require 'logger'
require 'concurrent' # gem 'concurrent-ruby'
class MassScraper
BATCH_SIZE = 100
MAX_CONCURRENCY = 50
RETRYABLE_CODES = [429, 500, 502, 503, 504].freeze
def initialize(username:, password:, logger: nil)
@username = username
@password = password
@logger = logger || Logger.new($stdout, level: Logger::INFO)
@stats = Concurrent::Hash.new(0)
end
def scrape(urls, country: nil)
@stats.clear
start_time = Time.now
# Process in batches to avoid excessive memory use
results = []
urls.each_slice(BATCH_SIZE).with_index do |batch, batch_num|
@logger.info "Processing batch #{batch_num + 1}/#{(urls.size.to_f / BATCH_SIZE).ceil}"
batch_results = process_batch(batch, country: country)
results.concat(batch_results)
end
elapsed = Time.now - start_time
print_summary(results.size, elapsed)
results
end
private
def process_batch(urls, country:)
hydra = Typhoeus::Hydra.new(max_concurrency: MAX_CONCURRENCY)
results_queue = Queue.new
urls.each_with_index do |url, idx|
# Each request uses a different proxy (automatic rotation)
proxy_url = build_proxy_url(country: country)
request = Typhoeus::Request.new(
url,
method: :get,
proxy: proxy_url,
timeout: 30,
connecttimeout: 15,
followlocation: true,
ssl_verifypeer: true,
headers: {
'User-Agent' => random_user_agent,
'Accept' => 'text/html,application/xhtml+xml'
}
)
request.on_complete do |response|
@stats[:total] += 1
if response.success?
@stats[:success] += 1
results_queue << { url: url, status: response.code, body: response.body, success: true }
elsif RETRYABLE_CODES.include?(response.code)
@stats[:retries] += 1
# Re-queue once through a fresh request; Typhoeus allows queueing onto
# a running Hydra, and the retry records its own result via its callback
hydra.queue(build_retry_request(url, proxy_url, 1, results_queue))
else
@stats[:failed] += 1
results_queue << { url: url, status: response.code, error: response.status_message, success: false }
end
end
hydra.queue(request)
end
hydra.run
# Drain results
results = []
results << results_queue.pop until results_queue.empty?
results
end
def build_retry_request(url, proxy_url, attempt, results_queue)
request = Typhoeus::Request.new(
url,
method: :get,
proxy: proxy_url,
timeout: 30 + (attempt * 10),
connecttimeout: 15,
headers: { 'User-Agent' => random_user_agent }
)
request.on_complete do |response|
if response.success?
@stats[:success] += 1
results_queue << { url: url, status: response.code, body: response.body, success: true }
else
@stats[:failed] += 1
results_queue << { url: url, status: response.code || 0, error: response.status_message, success: false }
end
end
request
end
def build_proxy_url(country: nil)
username = @username
username += "-country-#{country}" if country
"http://#{username}:#{@password}@gate.proxyhat.com:8080"
end
def random_user_agent
[
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15',
'Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0'
].sample
end
def print_summary(total, elapsed)
@logger.info "\n" + '=' * 50
@logger.info "Scraping complete"
@logger.info "Total URLs: #{total}"
@logger.info "Successes: #{@stats[:success]} (#{percent(@stats[:success], total)}%)"
@logger.info "Failures: #{@stats[:failed]} (#{percent(@stats[:failed], total)}%)"
@logger.info "Retries: #{@stats[:retries]}"
@logger.info "Time: #{elapsed.round(2)}s"
@logger.info "Rate: #{(total / elapsed).round(2)} req/s"
@logger.info '=' * 50
end
def percent(value, total)
return 0 if total.zero?
((value.to_f / total) * 100).round(1)
end
end
# Example: scraping 1,000 URLs
urls = 1000.times.map { |i| "https://httpbin.org/get?id=#{i}" }
scraper = MassScraper.new(
username: 'user',
password: ENV['PROXYHAT_PASSWORD']
)
results = scraper.scrape(urls, country: 'US')
# Filter valid results
valid_results = results.select { |r| r[:success] }
puts "Data extracted from #{valid_results.size} pages"
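At this scale you usually don't want to hold the whole result set in memory until the end of the run; streaming successes to newline-delimited JSON keeps partial runs usable. A minimal sketch (the helper name and result shape are illustrative, not part of the ProxyHat SDK):

```ruby
require 'json'
require 'tempfile'

# Writes each successful result as one JSON object per line (NDJSON),
# skipping failures so the output file contains only usable pages.
def write_ndjson(io, results)
  results.each { |r| io.puts(JSON.generate(r)) if r[:success] }
end

results = [
  { url: 'https://example.com/1', status: 200, success: true },
  { url: 'https://example.com/2', status: 403, success: false }
]

file = Tempfile.new(['scrape', '.ndjson'])
write_ndjson(file, results)
file.rewind
lines = file.read.lines
puts lines.size  # => 1
```

NDJSON also plays well with downstream tooling: each line can be parsed independently, so a crashed run still yields every record written so far.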
TLS/SSL Configuration: Self-Signed Certificates and SNI
When you route through intermediate proxies or test against internal environments, you may run into self-signed certificates or SNI issues. Here's how to handle them:
require 'net/http'
require 'openssl'
class TLSAwareProxyClient
def initialize(proxy_host:, proxy_port:, proxy_user:, proxy_pass:)
@proxy_host = proxy_host
@proxy_port = proxy_port
@proxy_user = proxy_user
@proxy_pass = proxy_pass
end
def get(url_str, verify_ssl: true, custom_ca: nil)
uri = URI.parse(url_str)
http = Net::HTTP.new(
uri.host, uri.port,
@proxy_host, @proxy_port,
@proxy_user, @proxy_pass
)
if uri.scheme == 'https'
http.use_ssl = true
configure_ssl(http, verify_ssl: verify_ssl, custom_ca: custom_ca)
end
http.open_timeout = 15
http.read_timeout = 30
# SNI (Server Name Indication) - critical for virtual hosting.
# Net::HTTP derives the SNI hostname automatically from the target host.
# To connect to a specific IP while keeping the hostname for SNI/cert checks:
# http.ipaddr = '203.0.113.10'
request = Net::HTTP::Get.new(uri.request_uri)
http.request(request)
end
private
def configure_ssl(http, verify_ssl:, custom_ca:)
if verify_ssl
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
# Use the system CA bundle
http.ca_file = find_system_ca_bundle
# Or supply a custom CA for self-signed certificates
if custom_ca
http.ca_file = custom_ca
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
end
# Enforce TLS 1.2+
http.min_version = OpenSSL::SSL::TLS1_2_VERSION
http.max_version = OpenSSL::SSL::TLS1_3_VERSION
# Preferred cipher suites (note: OpenSSL configures the TLS 1.3 names
# separately, so this list mainly constrains TLS 1.2 negotiation)
http.ciphers = [
'TLS_AES_256_GCM_SHA384',
'TLS_CHACHA20_POLY1305_SHA256',
'TLS_AES_128_GCM_SHA256',
'ECDHE-RSA-AES256-GCM-SHA384',
'ECDHE-RSA-AES128-GCM-SHA256'
].join(':')
else
# ONLY for test/development - ignores SSL errors
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
$stderr.puts 'WARNING: SSL verification disabled!'
end
end
def find_system_ca_bundle
# Linux (Debian/Ubuntu)
return '/etc/ssl/certs/ca-certificates.crt' if File.exist?('/etc/ssl/certs/ca-certificates.crt')
# Linux (RHEL/CentOS)
return '/etc/pki/tls/certs/ca-bundle.crt' if File.exist?('/etc/pki/tls/certs/ca-bundle.crt')
# macOS
return '/etc/ssl/cert.pem' if File.exist?('/etc/ssl/cert.pem')
# Fallback
nil
end
end
# Usage with standard SSL verification
client = TLSAwareProxyClient.new(
proxy_host: 'gate.proxyhat.com',
proxy_port: 8080,
proxy_user: 'user-country-US',
proxy_pass: ENV['PROXYHAT_PASSWORD']
)
response = client.get('https://httpbin.org/get')
puts "Status: #{response.code}"
# With a custom CA for self-signed certificates
# response = client.get('https://internal.local/api', custom_ca: '/path/to/ca.pem')
# Testing without verification (do NOT use in production!)
# response = client.get('https://self-signed.local/api', verify_ssl: false)
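Instead of probing a hardcoded path list as `find_system_ca_bundle` does, Ruby's OpenSSL bindings expose the CA locations the library was compiled with, which is usually a more portable first candidate:

```ruby
require 'openssl'

# The OpenSSL bindings expose the compiled-in default CA locations;
# try these before falling back to hardcoded per-distro paths.
puts OpenSSL::X509::DEFAULT_CERT_FILE  # e.g. "/etc/ssl/certs/ca-certificates.crt"
puts OpenSSL::X509::DEFAULT_CERT_DIR   # e.g. "/etc/ssl/certs"

# SSLContext#set_params wires up the default cert store and peer verification.
ctx = OpenSSL::SSL::SSLContext.new
ctx.set_params(verify_mode: OpenSSL::SSL::VERIFY_PEER)
puts ctx.verify_mode == OpenSSL::SSL::VERIFY_PEER  # => true
```

The constants reflect how OpenSSL was built, so on unusual installs the file may not exist; checking `File.exist?` on the constant before using it is still prudent.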
Ruby on Rails Integration: Faraday and ActiveJob
In Rails applications, you'll want to integrate proxies with HTTP middleware and asynchronous jobs to avoid blocking the request cycle.
Faraday Middleware with a Proxy
# Gemfile
gem 'faraday', '~> 2.0'
gem 'faraday-retry', '~> 2.0'
# config/initializers/proxy_http.rb
require 'faraday'
require 'faraday/retry'
class ProxyFaradayMiddleware
def initialize(app, proxy_host:, proxy_port:, proxy_user:, proxy_pass:, country: nil)
@app = app
@proxy_host = proxy_host
@proxy_port = proxy_port
@proxy_user = proxy_user
@proxy_pass = proxy_pass
@country = country
end
def call(env)
# Inject the proxy URL into the request
proxy_url = build_proxy_url
env.request.proxy = Faraday::ProxyOptions.from(proxy_url)
@app.call(env)
end
private
def build_proxy_url
username = @proxy_user
username += "-country-#{@country}" if @country
"http://#{username}:#{@proxy_pass}@#{@proxy_host}:#{@proxy_port}"
end
end
# Factory for building proxied connections
module ProxyHTTP
class << self
def connection(country: nil, retry_options: {})
Faraday.new do |builder|
builder.use ProxyFaradayMiddleware,
proxy_host: 'gate.proxyhat.com',
proxy_port: 8080,
proxy_user: ENV['PROXYHAT_USERNAME'],
proxy_pass: ENV['PROXYHAT_PASSWORD'],
country: country
builder.request :retry, {
max: 3,
interval: 1.0,
backoff_factor: 2,
retry_statuses: [429, 500, 502, 503, 504],
methods: [:get, :head, :options],
retry_if: ->(env, _ex) { env.status == 429 }
}.merge(retry_options)
builder.response :json, content_type: /json/
builder.response :raise_error
builder.adapter :net_http do |http|
http.open_timeout = 15
http.read_timeout = 30
end
end
end
end
end
# Usage in controllers or service objects
# app/services/serp_tracker_service.rb
class SerpTrackerService
def initialize(keyword, country: 'US')
@keyword = keyword
@country = country
@conn = ProxyHTTP.connection(country: country)
end
def fetch_results
response = @conn.get('https://serpproxy.example.com/search') do |req|
req.params['q'] = @keyword
req.params['gl'] = @country.downcase
end
parse_serp(response.body)
rescue Faraday::Error => e
Rails.logger.error "SERP fetch failed: #{e.message}"
{ error: e.message }
end
private
def parse_serp(data)
data.dig('organic_results') || []
end
end
ActiveJob for Asynchronous Scraping
# app/jobs/scraping_job.rb
class ScrapingJob < ApplicationJob
queue_as :scraping
# Avoid infinite retries
retry_on Faraday::TimeoutError, wait: :polynomially_longer, attempts: 3
retry_on Faraday::ConnectionFailed, wait: 30.seconds, attempts: 2
discard_on Faraday::ResourceNotFound
def perform(url_list_id, country: 'US')
url_list = UrlList.find(url_list_id)
urls = url_list.urls
results = scrape_urls(urls, country: country)
store_results(url_list, results)
ensure
url_list.update!(completed_at: Time.current)
end
private
def scrape_urls(urls, country:)
conn = ProxyHTTP.connection(country: country)
# Process in parallel with a thread pool
pool_size = [urls.size, 20].min
mutex = Mutex.new
results = []
threads = urls.each_slice((urls.size.to_f / pool_size).ceil).map do |batch|
Thread.new do
batch.each do |url|
begin
response = conn.get(url)
mutex.synchronize do
results << { url: url, status: response.status, body: response.body }
end
rescue Faraday::Error => e
mutex.synchronize do
results << { url: url, error: e.message }
end
end
end
end
end
threads.each(&:join)
results
end
def store_results(url_list, results)
ScrapedResult.transaction do
results.each do |result|
next if result[:error]
ScrapedResult.create!(
url_list: url_list,
url: result[:url],
status_code: result[:status],
content: result[:body][0..65_535], # cap stored size
scraped_at: Time.current
)
end
end
end
end
# Called from the controller
# app/controllers/scraping_controller.rb
class ScrapingController < ApplicationController
def create
url_list = UrlList.create!(urls: params[:urls].split("\n").map(&:strip)) # "\n" in double quotes: '\n' would be a literal backslash-n
ScrapingJob.perform_later(url_list.id, country: params[:country] || 'US')
redirect_to url_list, notice: 'Scraping started in the background'
end
end
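The controller's input parsing is worth isolating and unit-testing on its own: a classic Ruby pitfall is that single-quoted `'\n'` is a literal backslash followed by `n`, not a newline, so splitting textarea input requires double quotes. A standalone sketch:

```ruby
# Splits newline-separated textarea input into trimmed, non-empty URLs.
# Double quotes matter: "\n" is a newline, '\n' is a literal backslash-n.
def parse_url_list(raw)
  raw.split("\n").map(&:strip).reject(&:empty?)
end

input = "https://example.com/a\n  https://example.com/b  \n\n"
puts parse_url_list(input).inspect
# => ["https://example.com/a", "https://example.com/b"]
```

Extracting this into a method (or a model callback) also gives you one place to add validation, such as rejecting non-HTTP schemes.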
# config/sidekiq.yml (if you use Sidekiq)
# :concurrency: 10
# :queues:
# - scraping
# - default
# - mailers
Comparison: Net::HTTP vs Typhoeus vs ProxyHat SDK
| Feature | Net::HTTP | Typhoeus | ProxyHat SDK |
|---|---|---|---|
| Dependencies | None (stdlib) | libcurl | Wrapper over Typhoeus |
| Parallel requests | No (use threads) | Yes (Hydra) | Yes (built-in) |
| IP rotation | Manual | Manual | Automatic |
| Geo-targeting | Manual | Manual | Built-in |
| Sticky sessions | Manual | Manual | Built-in |
| Performance | Medium | High | High |
| Use case | Simple scripts | High-volume scraping | Production workloads |
Key Takeaways
- Net::HTTP is enough for simple scripts, but it requires boilerplate for proxy auth and error handling.
- Typhoeus excels at parallel requests with Hydra: ideal for high-volume scraping.
- The ProxyHat SDK abstracts IP rotation, geo-targeting, and sticky sessions: the best choice for production.
- Always configure timeouts, retries with exponential backoff, and a circuit breaker for resilience.
- In Rails, use Faraday middleware and ActiveJob for non-blocking asynchronous operations.
- Always verify TLS/SSL in production; disable it only for internal testing.
Next Steps
To dig deeper, check ProxyHat's pricing page for residential plans, or explore the available proxy locations for geo-targeting. If you're building a SERP tracking system, see our dedicated use case.