Proxies are essential for any serious web scraping or data pipeline in Ruby. Whether you're building a price monitoring system, SERP tracker, or large-scale data collector, you'll hit rate limits and IP blocks without proper proxy management. This guide covers everything from basic Net::HTTP proxy usage to production-ready concurrent scraping with rotating residential proxies.
Why You Need Proxies in Ruby Applications
Modern websites employ sophisticated anti-bot measures: rate limiting per IP, CAPTCHA challenges, geographic restrictions, and behavioral analysis. A well-configured proxy layer solves these problems by distributing requests across multiple IP addresses and geographic locations.
Ruby developers have several excellent options for proxy-aware HTTP clients. The standard library's Net::HTTP works for simple cases, while Typhoeus (libcurl-backed) excels at parallel requests. For production workloads, ProxyHat SDK handles IP rotation, geo-targeting, and session management automatically.
Net::HTTP: Built-in Proxy Support
Ruby's standard library includes everything you need for basic proxy usage. Net::HTTP accepts a proxy host, port, username, and password directly in its constructor and sends the Proxy-Authorization header for you.
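Before wiring a proxy explicitly, note that Net::HTTP also resolves a proxy from the environment: when you pass no proxy arguments, the constructor defaults to :ENV and reads http_proxy (and, on recent Rubies, credentials embedded in that URL). A minimal sketch, assuming the same placeholder gateway and credentials used throughout this guide:
require 'net/http'
# With no explicit proxy arguments, Net::HTTP.new defaults to :ENV and
# picks up ENV['http_proxy'], including user:pass embedded in the URL
# on recent Ruby versions.
ENV['http_proxy'] = 'http://your_username:your_password@gate.proxyhat.com:8080'
http = Net::HTTP.new('httpbin.org', 80)
puts http.proxy? # => true when a proxy was resolved from the environment
puts http.get('/ip').body
For HTTPS targets, some Ruby versions consult https_proxy instead, so set both variables when in doubt. Explicit constructor arguments, shown next, are more predictable for production code.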
Basic Proxy Configuration
require 'net/http'
require 'uri'
# ProxyHat residential proxy configuration
PROXY_HOST = 'gate.proxyhat.com'
PROXY_PORT = 8080
PROXY_USER = 'your_username'
PROXY_PASS = 'your_password'
def fetch_with_proxy(url, country: nil, session: nil)
uri = URI.parse(url)
# Build proxy URL with geo-targeting in username
proxy_user = PROXY_USER.dup
proxy_user << "-country-#{country}" if country
proxy_user << "-session-#{session}" if session
# Create a proxy-aware HTTP connection; Net::HTTP takes the proxy
# host, port, and credentials as constructor arguments and sends
# the Proxy-Authorization header for us
http = Net::HTTP.new(uri.host, uri.port, PROXY_HOST, PROXY_PORT, proxy_user, PROXY_PASS)
# TLS configuration
if uri.scheme == 'https'
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
# http.ca_file = '/etc/ssl/certs/ca-certificates.crt' # e.g. Ubuntu/Debian; the system default store usually suffices
end
# Set timeouts
http.open_timeout = 10
http.read_timeout = 30
http.write_timeout = 10
request = Net::HTTP::Get.new(uri.request_uri)
request['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
request['Accept'] = 'text/html,application/xhtml+xml'
response = http.request(request)
{ status: response.code, headers: response.each_header.to_h, body: response.body }
rescue Net::OpenTimeout => e
{ error: 'Connection timeout', message: e.message }
rescue Net::ReadTimeout => e
{ error: 'Read timeout', message: e.message }
rescue SocketError => e
{ error: 'DNS resolution failed', message: e.message }
rescue OpenSSL::SSL::SSLError => e
{ error: 'SSL error', message: e.message }
rescue StandardError => e
{ error: 'Request failed', message: e.message }
end
# Usage examples
result = fetch_with_proxy('https://httpbin.org/ip')
puts result
# With geo-targeting (US IP)
result = fetch_with_proxy('https://httpbin.org/ip', country: 'US')
puts result
# Sticky session (same IP for multiple requests)
result = fetch_with_proxy('https://httpbin.org/ip', session: 'my-session-123')
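POST requests travel through the same proxy plumbing; only the request object changes. A short sketch reusing the constants defined above (httpbin.org/post simply echoes back what it receives):
require 'json'
# POST through the same gateway; credentials again go into the constructor
def post_with_proxy(url, payload)
uri = URI.parse(url)
http = Net::HTTP.new(uri.host, uri.port, PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASS)
http.use_ssl = (uri.scheme == 'https')
http.open_timeout = 10
http.read_timeout = 30
request = Net::HTTP::Post.new(uri.request_uri)
request['Content-Type'] = 'application/json'
request.body = payload.to_json
http.request(request)
end
response = post_with_proxy('https://httpbin.org/post', { query: 'ruby proxies' })
puts "POST status: #{response.code}"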
Handling Redirects and Cookies
require 'net/http'
require 'uri'
class ProxyHTTPClient
attr_reader :cookies
def initialize(proxy_host:, proxy_port:, proxy_user:, proxy_pass:)
@proxy_host = proxy_host
@proxy_port = proxy_port
@proxy_user = proxy_user
@proxy_pass = proxy_pass
@cookies = {}
end
def get(url, redirect_limit: 5)
raise 'Too many redirects' if redirect_limit.zero?
uri = URI.parse(url)
http = build_http(uri)
request = Net::HTTP::Get.new(uri.request_uri)
add_default_headers(request, uri)
add_cookie_header(request)
response = http.request(request)
extract_cookies(response)
case response
when Net::HTTPRedirection
redirect_url = response['Location']
redirect_url = URI.join(url, redirect_url).to_s unless redirect_url =~ /^http/
get(redirect_url, redirect_limit: redirect_limit - 1)
when Net::HTTPSuccess
response.body
else
raise "HTTP #{response.code}: #{response.message}"
end
end
private
def build_http(uri)
# Proxy credentials go straight into the constructor
http = Net::HTTP.new(uri.host, uri.port, @proxy_host, @proxy_port, @proxy_user, @proxy_pass)
if uri.scheme == 'https'
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
end
http.open_timeout = 10
http.read_timeout = 30
http
end
def add_default_headers(request, uri)
request['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)'
request['Accept'] = 'text/html,application/xhtml+xml,application/xml;q=0.9'
request['Accept-Language'] = 'en-US,en;q=0.9'
# Accept-Encoding is left unset so Net::HTTP negotiates gzip and
# decompresses the response body transparently
end
def add_cookie_header(request)
return if @cookies.empty?
request['Cookie'] = @cookies.map { |k, v| "#{k}=#{v}" }.join('; ')
end
def extract_cookies(response)
response.get_fields('Set-Cookie')&.each do |cookie|
name, value = cookie.split(';').first.split('=', 2)
@cookies[name.strip] = value.strip if name && value
end
end
end
# Usage
client = ProxyHTTPClient.new(
proxy_host: 'gate.proxyhat.com',
proxy_port: 8080,
proxy_user: 'user-country-US',
proxy_pass: 'your_password'
)
html = client.get('https://example.com/login')
puts "Session cookies: #{client.cookies}"
Typhoeus: Parallel Requests with libcurl
Typhoeus wraps libcurl, providing superior performance for concurrent requests. Its Hydra interface lets you fire hundreds of requests in parallel while respecting connection limits.
Single Request with Proxy
require 'typhoeus'
# Configure the global proxy; Typhoeus::Config only exposes a handful
# of global options (proxy, user_agent), so timeouts and redirect
# behavior are set per request, as shown below
Typhoeus.configure do |config|
config.proxy = 'http://user-country-US:pass@gate.proxyhat.com:8080'
end
# Single request with proxy
response = Typhoeus.get(
'https://httpbin.org/ip',
proxy: 'http://user-country-DE:pass@gate.proxyhat.com:8080',
headers: {
'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
'Accept' => 'text/html,application/xhtml+xml'
},
timeout: 30,
connecttimeout: 10,
followlocation: true,
maxredirs: 5,
ssl_verifypeer: true,
ssl_verifyhost: 2,
sslversion: :tlsv1_2 # libcurl treats this as the minimum TLS version
)
puts "Status: #{response.code}"
puts "Body: #{response.body}"
puts "Time: #{response.total_time}s"
# Error handling: a code of 0 means the request never completed;
# return_message carries the libcurl error description
unless response.success?
if response.timed_out?
puts 'Request timed out'
elsif response.code.zero?
puts "Curl error: #{response.return_message}"
else
puts "HTTP error: #{response.code} (#{response.return_message})"
end
end
Parallel Requests with Hydra
require 'typhoeus'
class ParallelScraper
PROXY_GATEWAY = 'gate.proxyhat.com'
PROXY_PORT = 8080
def initialize(username:, password:, concurrency: 50)
@username = username
@password = password
@concurrency = concurrency
@results = Queue.new
end
def fetch_all(urls, country: nil)
hydra = Typhoeus::Hydra.new(max_concurrency: @concurrency)
urls.each_with_index do |url, index|
request = build_request(url, country: country, session: "batch-#{index % 100}")
request.on_complete do |response|
@results << { url: url, status: response.code, body: response.body }
end
hydra.queue(request)
end
# Run all requests
hydra.run
collect_results(urls.size)
end
private
def build_request(url, country: nil, session: nil)
proxy_user = @username.dup
proxy_user << "-country-#{country}" if country
proxy_user << "-session-#{session}" if session
Typhoeus::Request.new(
url,
method: :get,
proxy: "http://#{proxy_user}:#{@password}@#{PROXY_GATEWAY}:#{PROXY_PORT}",
headers: {
'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9',
'Accept-Language' => 'en-US,en;q=0.9',
'Accept-Encoding' => 'gzip, deflate'
},
timeout: 30,
connecttimeout: 10,
followlocation: true,
maxredirs: 5,
ssl_verifypeer: true,
ssl_verifyhost: 2
)
end
def collect_results(expected_count)
results = []
expected_count.times { results << @results.pop }
results
end
end
# Usage: Fetch 100 URLs concurrently
scraper = ParallelScraper.new(
username: 'your_username',
password: 'your_password',
concurrency: 25 # Limit concurrent connections
)
urls = Array.new(100) { "https://httpbin.org/delay/#{rand(1..3)}" }
results = scraper.fetch_all(urls, country: 'US')
# Process results
successful = results.select { |r| r[:status] == 200 }
puts "Success rate: #{successful.size}/#{results.size}"
# Calculate total data transferred
total_bytes = results.sum { |r| r[:body].to_s.bytesize }
puts "Data transferred: #{(total_bytes / 1024.0).round(2)} KB"
ProxyHat Ruby SDK: Production-Ready Rotation
For production scraping, you need automatic IP rotation, retry logic, and geo-targeting. The ProxyHat Ruby SDK wraps these concerns in a clean API.
require 'net/http'
require 'uri'
require 'json'
require 'digest'
# ProxyHat Ruby SDK
module ProxyHat
class Client
DEFAULT_GATEWAY = 'gate.proxyhat.com'
HTTP_PORT = 8080
SOCKS_PORT = 1080
MAX_RETRIES = 3
RETRY_DELAY = 2
attr_reader :config
def initialize(username:, password:, gateway: DEFAULT_GATEWAY, port: HTTP_PORT)
@config = {
username: username,
password: password,
gateway: gateway,
port: port
}
end
# Fetch a URL with automatic rotation and retries
def get(url, country: nil, city: nil, session: nil, sticky: false)
retries = 0
loop do
result = perform_request(url, country: country, city: city, session: session)
# Status 0 marks a transport error, so only 1..399 counts as success
return result if result[:status] && result[:status].between?(1, 399)
if should_retry?(result[:status]) && retries < MAX_RETRIES
retries += 1
sleep(RETRY_DELAY * retries)
session = nil unless sticky # Rotate IP on retry
next
end
return result
end
end
# Batch fetch with a fixed-size worker pool: `concurrency` threads
# pull URLs from a shared queue, so that many requests run at once
def batch_fetch(urls, country: nil, concurrency: 10)
results = {}
mutex = Mutex.new
queue = Queue.new
urls.each { |url| queue << url }
workers = Array.new([concurrency, urls.size].min) do
Thread.new do
loop do
url = begin
queue.pop(true) # non-blocking pop; raises ThreadError when empty
rescue ThreadError
break
end
session_id = Digest::MD5.hexdigest(url)[0..8]
result = get(url, country: country, session: session_id)
mutex.synchronize { results[url] = result }
end
end
end
workers.each(&:join)
results
end
# Get a fresh proxy URL for custom clients
def proxy_url(country: nil, city: nil, session: nil)
user = build_username(country: country, city: city, session: session)
"http://#{user}:#{@config[:password]}@#{@config[:gateway]}:#{@config[:port]}"
end
private
def perform_request(url, country: nil, city: nil, session: nil)
uri = URI.parse(url)
user = build_username(country: country, city: city, session: session)
http = Net::HTTP.new(
uri.host, uri.port,
@config[:gateway], @config[:port],
user, @config[:password]
)
configure_ssl(http, uri)
set_timeouts(http)
request = Net::HTTP::Get.new(uri.request_uri)
add_headers(request)
response = http.request(request)
{
status: response.code.to_i,
headers: response.each_header.to_h,
body: response.body,
proxy_ip: extract_proxy_ip(response)
}
rescue StandardError => e
{ error: e.class.name, message: e.message, status: 0 }
end
def build_username(country: nil, city: nil, session: nil)
parts = [@config[:username]]
parts << "country-#{country}" if country
parts << "city-#{city}" if city
parts << "session-#{session}" if session
parts.join('-')
end
def configure_ssl(http, uri)
return unless uri.scheme == 'https'
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
http.min_version = OpenSSL::SSL::TLS1_2_VERSION # allow TLS 1.2 and newer
end
def set_timeouts(http)
http.open_timeout = 15
http.read_timeout = 45
http.write_timeout = 15
end
def add_headers(request)
request['User-Agent'] = random_user_agent
request['Accept'] = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
request['Accept-Language'] = 'en-US,en;q=0.9'
# Accept-Encoding left unset: Net::HTTP negotiates gzip and decompresses transparently
request['Connection'] = 'keep-alive'
end
def random_user_agent
agents = [
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15',
'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
]
agents.sample
end
def should_retry?(status)
# 0 covers transport-level failures (timeouts, resets) from perform_request
[0, 429, 500, 502, 503, 504].include?(status)
end
def extract_proxy_ip(response)
# Some proxy providers return the exit IP in headers
response['X-Proxy-IP'] || response['Via']&.split&.first
end
end
end
# Usage
client = ProxyHat::Client.new(
username: 'your_username',
password: 'your_password'
)
# Single request with US geo-targeting
result = client.get('https://httpbin.org/ip', country: 'US')
puts "Exit IP: #{result[:body]}"
# City-level targeting
result = client.get('https://httpbin.org/ip', country: 'US', city: 'new_york')
puts "New York IP: #{result[:body]}"
# Batch processing
urls = Array.new(50) { 'https://httpbin.org/uuid' }
results = client.batch_fetch(urls, country: 'DE', concurrency: 10)
puts "Fetched #{results.size} URLs"
# Get proxy URL for use with other libraries
proxy = client.proxy_url(country: 'JP', session: 'sticky-session-1')
puts "Proxy URL: #{proxy}"
Real-World Example: Scraping 1000 URLs Concurrently
Here's a complete production example that fetches 1000 URLs using rotating residential proxies with proper error handling, logging, and result aggregation.
require 'typhoeus'
require 'json'
require 'logger'
require 'csv'
class ProductionScraper
PROXY_GATEWAY = 'gate.proxyhat.com'
PROXY_PORT = 8080
def initialize(username:, password:, concurrency: 50, output_file: 'results.csv')
@username = username
@password = password
@concurrency = concurrency
@output_file = output_file
@logger = Logger.new(STDOUT)
@logger.level = Logger::INFO
@stats = { success: 0, failed: 0, retries: 0 }
@stats_mutex = Mutex.new
end
def run(urls, country: 'US')
@logger.info "Starting scrape of #{urls.size} URLs with #{@concurrency} concurrency"
start_time = Time.now
# Write CSV header
CSV.open(@output_file, 'w') do |csv|
csv << ['url', 'status', 'response_time', 'ip_address', 'error']
end
# Process in batches
hydra = Typhoeus::Hydra.new(max_concurrency: @concurrency)
mutex = Mutex.new # Hydra runs callbacks on the calling thread, but guard the buffer anyway
pending_writes = []
urls.each_with_index do |url, idx|
# Rotate session every 50 requests for IP diversity
session_id = "batch-#{idx / 50}"
country_code = rotate_country(idx) if country == 'ROTATE'
request = build_request(url, country: country_code || country, session: session_id)
request.on_complete do |response|
result = process_response(url, response)
mutex.synchronize { pending_writes << result }
# Flush to CSV every 100 results
if pending_writes.size >= 100
flush_results(pending_writes)
pending_writes.clear
end
end
hydra.queue(request)
end
# Run all requests
hydra.run
# Flush remaining results
flush_results(pending_writes)
# Report stats
elapsed = Time.now - start_time
@logger.info "Completed in #{elapsed.round(2)}s"
@logger.info "Stats: #{@stats}"
@logger.info "Requests/sec: #{(urls.size / elapsed).round(2)}"
@logger.info "Results saved to #{@output_file}"
end
private
def build_request(url, country:, session:)
proxy_user = "#{@username}-country-#{country}-session-#{session}"
Typhoeus::Request.new(
url,
method: :get,
proxy: "http://#{proxy_user}:#{@password}@#{PROXY_GATEWAY}:#{PROXY_PORT}",
headers: {
'User-Agent' => random_user_agent,
'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language' => 'en-US,en;q=0.9',
'Cache-Control' => 'no-cache'
},
timeout: 30,
connecttimeout: 15,
followlocation: true,
maxredirs: 3,
ssl_verifypeer: true,
ssl_verifyhost: 2,
accept_encoding: 'gzip' # libcurl sets the Accept-Encoding header and decompresses the response
)
end
def process_response(url, response)
result = {
url: url,
status: response.code,
response_time: response.total_time,
ip_address: extract_ip(response.body),
error: nil
}
if response.success?
update_stats(:success)
else
result[:error] = response.return_message || "HTTP #{response.code}"
update_stats(:failed)
end
result
end
def flush_results(results)
CSV.open(@output_file, 'a') do |csv|
results.each do |r|
csv << [r[:url], r[:status], r[:response_time], r[:ip_address], r[:error]]
end
end
@logger.info "Flushed #{results.size} results to CSV"
end
def update_stats(type)
@stats_mutex.synchronize { @stats[type] += 1 }
end
def random_user_agent
[
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0'
].sample
end
def rotate_country(idx)
countries = %w[US DE GB FR CA AU NL ES IT]
countries[idx % countries.size]
end
def extract_ip(body)
# Extract IP from httpbin.org/ip response
JSON.parse(body)['origin'] rescue nil
end
end
# Run the scraper
if __FILE__ == $0
# Generate 1000 test URLs
urls = Array.new(1000) { "https://httpbin.org/delay/#{rand(1..2)}" }
scraper = ProductionScraper.new(
username: 'your_username',
password: 'your_password',
concurrency: 50,
output_file: 'scrape_results.csv'
)
scraper.run(urls, country: 'US')
end
TLS/SSL Configuration: Handling Edge Cases
Production scrapers encounter various SSL/TLS configurations. Here's how to handle common edge cases while maintaining security.
require 'net/http'
require 'openssl'
module SSLConfigurable
# Custom certificate store with additional CAs
def self.create_ca_store
store = OpenSSL::X509::Store.new
store.set_default_paths # Load system CAs
# Add custom certificates if needed
# store.add_file('/path/to/custom-ca.pem')
store
end
# HTTP client with strict SSL verification
def self.strict_ssl_client(uri, proxy_config)
http = Net::HTTP.new(uri.host, uri.port, proxy_config[:host], proxy_config[:port], proxy_config[:user], proxy_config[:pass])
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
http.cert_store = create_ca_store
http.min_version = OpenSSL::SSL::TLS1_2_VERSION # allow TLS 1.2 and newer
# SNI is sent automatically; with VERIFY_PEER, Net::HTTP also checks
# that the certificate matches the hostname (post_connection_check)
http.verify_hostname = true
http
end
# HTTP client for self-signed/upstream certificates
# WARNING: Only use for internal testing or when you control the endpoint
def self.permissive_ssl_client(uri, proxy_config)
http = Net::HTTP.new(uri.host, uri.port, proxy_config[:host], proxy_config[:port], proxy_config[:user], proxy_config[:pass])
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE # Skip verification
http.min_version = OpenSSL::SSL::TLS1_2_VERSION
http
end
# HTTP client with custom cipher suite
def self.custom_cipher_client(uri, proxy_config)
http = Net::HTTP.new(uri.host, uri.port, proxy_config[:host], proxy_config[:port], proxy_config[:user], proxy_config[:pass])
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
http.cert_store = create_ca_store
# Configure cipher suites for legacy servers
http.ciphers = 'ALL:!aNULL:!eNULL:!SSLv2:!SSLv3'
http
end
end
# Example: Handling sites with different SSL configurations
def fetch_with_ssl_fallback(url, proxy_config)
uri = URI.parse(url)
# Try strict SSL first
begin
http = SSLConfigurable.strict_ssl_client(uri, proxy_config)
request = Net::HTTP::Get.new(uri.request_uri)
response = http.request(request)
return { success: true, body: response.body, ssl_mode: 'strict' }
rescue OpenSSL::SSL::SSLError => e
puts "Strict SSL failed: #{e.message}"
end
# Fallback to permissive for testing only
begin
http = SSLConfigurable.permissive_ssl_client(uri, proxy_config)
request = Net::HTTP::Get.new(uri.request_uri)
response = http.request(request)
return { success: true, body: response.body, ssl_mode: 'permissive' }
rescue StandardError => e
return { success: false, error: e.message }
end
end
# Usage
proxy_config = {
host: 'gate.proxyhat.com',
port: 8080,
user: 'user-country-US',
pass: 'your_password'
}
result = fetch_with_ssl_fallback('https://expired.badssl.com/', proxy_config)
puts "SSL mode used: #{result[:ssl_mode]}"
SNI and Certificate Verification
Server Name Indication (SNI) is essential for virtual hosting and CDNs. Ruby's Net::HTTP sends the SNI hostname automatically on every TLS handshake, and when verify_mode is VERIFY_PEER it also runs a post-connection check that the certificate's CN/SAN matches the hostname you connected to.
require 'net/http'
def fetch_with_sni_check(url, proxy_config)
uri = URI.parse(url)
http = Net::HTTP.new(uri.host, uri.port, proxy_config[:host], proxy_config[:port], proxy_config[:user], proxy_config[:pass])
if uri.scheme == 'https'
http.use_ssl = true
# SNI is sent automatically; VERIFY_PEER triggers the hostname check
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
http.min_version = OpenSSL::SSL::TLS1_2_VERSION
end
end
http.open_timeout = 10
http.read_timeout = 30
request = Net::HTTP::Get.new(uri.request_uri)
request['Host'] = uri.host # Explicit Host header (usually automatic)
response = http.request(request)
response.body
rescue OpenSSL::SSL::SSLError => e
# Certificate mismatch, expired, or untrusted
{ error: 'ssl_verification_failed', details: e.message }
rescue StandardError => e
{ error: 'request_failed', details: e.message }
end
Ruby on Rails Integration
Integrating proxy-aware HTTP clients into Rails applications requires careful consideration of configuration, background jobs, and middleware patterns.
Faraday Middleware for Proxies
# config/initializers/proxy_client.rb
require 'faraday'
require 'faraday/retry' # provided by the faraday-retry gem
module ProxyClient
class ProxyMiddleware < Faraday::Middleware
def initialize(app, proxy_config)
super(app)
@proxy_config = proxy_config
end
def call(env)
# Rotate session for each request
session = SecureRandom.hex(8)
country = env.request.context&.dig(:country) || 'US'
proxy_user = "#{@proxy_config[:username]}-country-#{country}-session-#{session}"
# Wrap in ProxyOptions so the adapter gets host, port, and credentials
env.request.proxy = Faraday::ProxyOptions.from(
"http://#{proxy_user}:#{@proxy_config[:password]}@#{@proxy_config[:gateway]}:#{@proxy_config[:port]}"
)
@app.call(env)
end
end
class << self
def build(options = {})
Faraday.new do |conn|
conn.use ProxyMiddleware, {
username: ENV['PROXYHAT_USER'],
password: ENV['PROXYHAT_PASS'],
gateway: 'gate.proxyhat.com',
port: 8080
}
conn.request :retry, {
max: 3,
interval: 2,
backoff_factor: 2,
retry_statuses: [429, 500, 502, 503, 504]
}
conn.request :json
conn.response :json, content_type: /\bjson\Z/
conn.response :raise_error
# Block configures the underlying Net::HTTP::Persistent instance
# (requires the faraday-net_http_persistent adapter gem)
conn.adapter :net_http_persistent do |http|
http.open_timeout = 15
http.read_timeout = 45
http.write_timeout = 15
end
end
end
end
end
# Global client instance
PROXY_HTTP = ProxyClient.build
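Since the middleware reads the target country from Faraday's request context, call sites can steer geo-targeting per request without touching the proxy configuration. A sketch of a call site (httpbin.org is just a test endpoint):
# Per-request geo-targeting via the request context the middleware reads
response = PROXY_HTTP.get('https://httpbin.org/ip') do |req|
req.options.context = { country: 'DE' }
end
puts response.body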
ActiveJob Integration for Background Scraping
# app/jobs/scraping_job.rb
require 'net/http'
class ScrapingJob < ApplicationJob
queue_as :scraping
# Retry on transient failures
retry_on Net::ReadTimeout, wait: :exponentially_longer, attempts: 3
retry_on Net::OpenTimeout, wait: :exponentially_longer, attempts: 3
discard_on ActiveJob::DeserializationError
def perform(url, options = {})
@url = url
@country = options['country'] || 'US'
@session = options['session'] || SecureRandom.hex(8)
result = fetch_with_proxy
if result[:success]
process_result(result[:body])
else
handle_failure(result[:error])
end
end
private
def fetch_with_proxy
uri = URI.parse(@url)
proxy_user = "#{ENV['PROXYHAT_USER']}-country-#{@country}-session-#{@session}"
http = Net::HTTP.new(uri.host, uri.port, 'gate.proxyhat.com', 8080, proxy_user, ENV['PROXYHAT_PASS'])
configure_ssl(http, uri)
set_timeouts(http)
request = Net::HTTP::Get.new(uri.request_uri)
add_headers(request)
response = http.request(request)
{ success: response.code.to_i < 400, body: response.body, status: response.code }
rescue StandardError => e
Rails.logger.error "ScrapingJob failed for #{@url}: #{e.message}"
{ success: false, error: e.message }
end
def configure_ssl(http, uri)
return unless uri.scheme == 'https'
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
http.min_version = OpenSSL::SSL::TLS1_2_VERSION # allow TLS 1.2 and newer
end
def set_timeouts(http)
http.open_timeout = 15
http.read_timeout = 45
end
def add_headers(request)
request['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
request['Accept'] = 'text/html,application/xhtml+xml'
request['Accept-Language'] = 'en-US,en;q=0.9'
# Accept-Encoding left unset so Net::HTTP decompresses gzip transparently
end
def process_result(body)
# Parse and store the scraped data; parse_response stands in for
# your site-specific extraction logic
data = parse_response(body)
ScrapedData.create!(
url: @url,
data: data,
scraped_at: Time.current,
country: @country
)
end
def handle_failure(error)
ScrapingFailure.create!(
url: @url,
error: error,
country: @country,
failed_at: Time.current
)
# Re-raise to trigger ActiveJob's retry mechanism; `executions`
# counts how many times this job has run
raise error if executions < 3
end
end
# app/jobs/batch_scraping_job.rb
class BatchScrapingJob < ApplicationJob
queue_as :batch_scraping
def perform(urls, options = {})
country = options['country'] || 'US'
batch_size = options['batch_size'] || 50
urls.each_slice(batch_size).with_index do |batch, idx|
batch.each do |url|
ScrapingJob.perform_later(url, {
'country' => country,
'session' => "batch-#{idx}"
})
end
# Rate limit between batches
sleep(options['delay'] || 2)
end
end
end
# Usage in controller or service
BatchScrapingJob.perform_later(
['https://example.com/page/1', 'https://example.com/page/2'],
{ 'country' => 'US', 'batch_size' => 25 }
)
Configuration Best Practices for Rails
# config/initializers/proxyhat.rb
PROXYHAT_CONFIG = {
username: ENV.fetch('PROXYHAT_USER'),
password: ENV.fetch('PROXYHAT_PASS'),
gateway: ENV.fetch('PROXYHAT_GATEWAY', 'gate.proxyhat.com'),
http_port: ENV.fetch('PROXYHAT_HTTP_PORT', 8080).to_i,
socks_port: ENV.fetch('PROXYHAT_SOCKS_PORT', 1080).to_i,
timeout: ENV.fetch('PROXYHAT_TIMEOUT', 30).to_i,
max_retries: ENV.fetch('PROXYHAT_MAX_RETRIES', 3).to_i
}.freeze
# app/services/proxy_service.rb
require 'singleton'
class ProxyService
include Singleton
def fetch(url, country: nil, session: nil)
client.get(url, country: country, session: session)
end
def batch_fetch(urls, country: nil)
client.batch_fetch(urls, country: country)
end
private
def client
@client ||= ProxyHat::Client.new(
username: PROXYHAT_CONFIG[:username],
password: PROXYHAT_CONFIG[:password],
gateway: PROXYHAT_CONFIG[:gateway],
port: PROXYHAT_CONFIG[:http_port]
)
end
end
# Usage throughout the app
result = ProxyService.instance.fetch('https://api.example.com/data', country: 'US')
Comparison: Ruby HTTP Clients for Proxies
| Feature | Net::HTTP | Typhoeus | ProxyHat SDK |
|---|---|---|---|
| Stdlib (no deps) | ✓ | ✗ | ✗ |
| Parallel requests | ✗ | ✓ (Hydra) | ✓ (batch) |
| Auto IP rotation | ✗ | ✗ | ✓ |
| Geo-targeting | Manual | Manual | ✓ |
| Retry logic | Manual | Basic | ✓ |
| Connection pooling | ✗ | ✓ | ✓ |
| SSL customization | ✓ | ✓ | ✓ |
| Best for | Simple scripts | High throughput | Production scraping |
Key Takeaways
- Net::HTTP is sufficient for simple proxy needs and has zero dependencies, but requires manual handling of retries, rotation, and parallel execution.
- Typhoeus excels at concurrent requests with its Hydra interface, making it ideal for high-throughput scraping with controlled concurrency.
- ProxyHat SDK handles production concerns automatically: IP rotation, geo-targeting, retries, and session management.
- Always configure proper timeouts: open_timeout, read_timeout, and write_timeout prevent hung connections.
- Use sticky sessions when you need the same IP for multiple requests (login flows, multi-page navigation).
- In Rails, use ActiveJob for background scraping and Faraday middleware for consistent proxy configuration across your app.
- Test SSL configurations against various server setups; some sites require specific cipher suites or SNI handling.
Conclusion
Ruby provides excellent tools for proxy-aware HTTP clients, from the standard library's Net::HTTP to high-performance options like Typhoeus. For production scraping workloads, combining these with ProxyHat's rotating residential proxies gives you the reliability and scale you need.
Start with Net::HTTP for simple use cases, graduate to Typhoeus when you need concurrency, and use the ProxyHat SDK for production scraping that requires automatic rotation and geo-targeting. The patterns in this guide will help you build robust scraping pipelines that handle rate limits, IP blocks, and transient failures gracefully.
Ready to scale your Ruby scraping? Get started with ProxyHat and access millions of residential IPs worldwide.