Ruby HTTP Proxy Guide: Net::HTTP, Typhoeus, and the ProxyHat SDK Compared

A complete guide to HTTP proxies in Ruby: Net::HTTP, Typhoeus with parallel requests, and the ProxyHat SDK with rotation and geo-targeting, including production-ready code examples.

Anyone building web scrapers, API clients, or data pipelines in Ruby will inevitably run into proxies, whether to work around IP rate limits, bypass geographic restrictions, or simply avoid exposing their own IP. Yet proxy integration in Ruby is not always trivial, especially once authentication, parallel requests, or TLS configuration come into play.

This guide covers three approaches: the standard library's Net::HTTP, the libcurl-based Typhoeus library for parallel requests, and the ProxyHat Ruby SDK for automatic IP rotation, all with production-ready code.

Net::HTTP with a Proxy: The Standard Approach

Net::HTTP is part of the Ruby standard library and requires no additional gems. For simple proxy needs it is entirely sufficient. Proxy authentication is passed via explicit parameters.

require 'net/http'
require 'uri'

# ProxyHat connection details
PROXY_HOST = 'gate.proxyhat.com'
PROXY_PORT = 8080
PROXY_USER = 'your_username'
PROXY_PASS = 'your_password'

def fetch_with_proxy(url, proxy_user: nil, proxy_pass: nil)
  uri = URI.parse(url)
  
  # Create a proxy-aware connection class and instantiate it
  proxy = Net::HTTP::Proxy(PROXY_HOST, PROXY_PORT, proxy_user, proxy_pass)
  http = proxy.new(uri.host, uri.port)
  
  # TLS/SSL configuration
  http.use_ssl = (uri.scheme == 'https')
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER
  http.open_timeout = 15
  http.read_timeout = 30
  
  request = Net::HTTP::Get.new(uri.request_uri)
  request['User-Agent'] = 'Mozilla/5.0 (compatible; RubyScraper/1.0)'
  request['Accept'] = 'text/html,application/xhtml+xml'
  
  response = http.request(request)
  
  case response
  when Net::HTTPSuccess
    { success: true, status: response.code, body: response.body }
  when Net::HTTPRedirection
    { success: false, status: response.code, redirect_to: response['location'] }
  when Net::HTTPTooManyRequests
    { success: false, status: response.code, retry_after: response['retry-after'] }
  else
    { success: false, status: response.code, message: response.message }
  end
rescue Net::OpenTimeout => e
  { success: false, error: 'connection_timeout', message: e.message }
rescue Net::ReadTimeout => e
  { success: false, error: 'read_timeout', message: e.message }
rescue OpenSSL::SSL::SSLError => e
  { success: false, error: 'ssl_error', message: e.message }
rescue SocketError => e
  { success: false, error: 'dns_error', message: e.message }
ensure
  http&.finish if http&.started?
end

# Example call with geo-targeting (the username carries a country code)
result = fetch_with_proxy(
  'https://httpbin.org/ip',
  proxy_user: "#{PROXY_USER}-country-US",
  proxy_pass: PROXY_PASS
)

puts "Status: #{result[:status]}"
puts "Body: #{result[:body][0..200]}..." if result[:success]

The ProxyHat username parameter supports several flags:

  • user-country-US – US exit IP
  • user-country-DE-city-berlin – exit IP in Berlin
  • user-session-abc123 – sticky session with a fixed IP
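Since these flags are plain string segments joined with hyphens, composing them can be isolated in a small helper. A minimal sketch (the helper name is illustrative, not part of any SDK):

```ruby
# Illustrative helper composing ProxyHat-style username flags
def build_proxy_username(base, country: nil, city: nil, session: nil)
  parts = [base]
  parts << "country-#{country}" if country
  parts << "city-#{city}" if city
  parts << "session-#{session}" if session
  parts.join('-')
end

puts build_proxy_username('user', country: 'DE', city: 'berlin')
# => user-country-DE-city-berlin
```

The same helper reappears in the classes below as a private `build_username` method.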

Retry Logic with Exponential Backoff

A retry mechanism is essential for production scrapers. Here is a robust implementation:

require 'net/http'
require 'uri'

module Scraper
  class ProxyClient
    MAX_RETRIES = 3
    BASE_DELAY = 1.0
    
    attr_reader :proxy_host, :proxy_port, :proxy_user, :proxy_pass
    
    def initialize(proxy_host:, proxy_port:, proxy_user:, proxy_pass:)
      @proxy_host = proxy_host
      @proxy_port = proxy_port
      @proxy_user = proxy_user
      @proxy_pass = proxy_pass
    end
    
    def get(url, headers: {})
      retries = 0
      
      loop do
        result = perform_request(url, headers)
        return result if result[:success]
        
        if should_retry?(result) && retries < MAX_RETRIES
          retries += 1
          delay = BASE_DELAY * (2 ** retries) + rand(0.5)
          puts "Retry #{retries}/#{MAX_RETRIES} after #{delay.round(2)}s"
          sleep(delay)
        else
          return result
        end
      end
    end
    
    private
    
    def perform_request(url, headers)
      uri = URI.parse(url)
      proxy_class = Net::HTTP::Proxy(proxy_host, proxy_port, proxy_user, proxy_pass)
      
      http = proxy_class.new(uri.host, uri.port)
      http.use_ssl = (uri.scheme == 'https')
      http.verify_mode = OpenSSL::SSL::VERIFY_PEER
      http.open_timeout = 10
      http.read_timeout = 25
      
      request = Net::HTTP::Get.new(uri.request_uri)
      headers.each { |k, v| request[k] = v }
      
      response = http.request(request)
      
      if response.is_a?(Net::HTTPSuccess)
        { success: true, status: response.code.to_i, body: response.body, headers: response.each_header.to_h }
      else
        { success: false, status: response.code.to_i, error: 'http_error' }
      end
    rescue Net::OpenTimeout, Net::ReadTimeout, Errno::ECONNREFUSED => e
      { success: false, error: 'connection_error', message: e.message }
    rescue => e
      { success: false, error: 'unknown', message: e.message }
    ensure
      http&.finish if http&.started?
    end
    
    def should_retry?(result)
      # Retry only on network errors or retryable HTTP statuses
      # (retrying every HTTP error would waste requests on 404s etc.)
      result[:error] == 'connection_error' ||
        [429, 502, 503, 504].include?(result[:status])
    end
  end
end

# Usage
client = Scraper::ProxyClient.new(
  proxy_host: 'gate.proxyhat.com',
  proxy_port: 8080,
  proxy_user: 'user-country-DE',
  proxy_pass: 'your_password'
)

result = client.get('https://example.com/api/data', headers: {
  'User-Agent' => 'MyApp/1.0',
  'Accept' => 'application/json'
})

puts result.inspect
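The backoff schedule the retry loop produces is easy to verify in isolation. Ignoring the random jitter, the waits grow as BASE_DELAY * 2^retries:

```ruby
BASE_DELAY = 1.0
MAX_RETRIES = 3

# Deterministic part of the schedule (the rand(0.5) jitter is omitted here)
delays = (1..MAX_RETRIES).map { |retries| BASE_DELAY * (2**retries) }
p delays  # => [2.0, 4.0, 8.0]
```

With the jitter added, each wait is shifted by up to half a second so that many clients retrying at once do not synchronize.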

Typhoeus: Parallel Requests with Hydra

Typhoeus uses libcurl under the hood and enables parallel HTTP requests via Hydra, which makes it ideal for scraping scenarios with hundreds of concurrent requests.

require 'typhoeus'
require 'concurrent' # for Concurrent::Hash (concurrent-ruby gem)

# Proxy configuration
PROXY_CONFIG = {
  proxy: 'http://gate.proxyhat.com:8080',
  proxyuserpwd: 'your_username:your_password',
  proxyauth: :basic
}

def fetch_single(url)
  response = Typhoeus.get(url, {
    **PROXY_CONFIG,
    headers: {
      'User-Agent' => 'Mozilla/5.0 (compatible; Typhoeus/1.0)',
      'Accept' => 'text/html'
    },
    timeout: 30,
    followlocation: true,
    ssl_verifypeer: true,
    ssl_verifyhost: 2
  })
  
  if response.success?
    { success: true, status: response.code, body: response.body }
  elsif response.timed_out?
    { success: false, error: 'timeout' }
  else
    { success: false, status: response.code, error: response.return_message }
  end
end

# Parallel requests with Hydra
def fetch_parallel(urls, concurrency: 50)
  results = Concurrent::Hash.new
  hydra = Typhoeus::Hydra.new(max_concurrency: concurrency)
  
  urls.each_with_index do |url, index|
    request = Typhoeus::Request.new(url, {
      **PROXY_CONFIG,
      headers: { 'User-Agent' => 'Typhoeus/1.0' },
      timeout: 25,
      followlocation: true
    })
    
    request.on_complete do |response|
      # on_complete fires for successes and failures alike,
      # so both cases are handled here in one callback
      if response.success?
        results[url] = {
          success: true,
          status: response.code,
          body: response.body,
          time: response.total_time
        }
      else
        results[url] = {
          success: false,
          status: response.code,
          error: response.return_message
        }
      end
    end
    
    hydra.queue(request)
  end
  
  hydra.run
  results
end

# Example: fetch 100 URLs in parallel
urls = (1..100).map { |i| "https://httpbin.org/delay/#{rand(1..3)}?id=#{i}" }

start_time = Time.now
results = fetch_parallel(urls, concurrency: 25)
duration = Time.now - start_time

successful = results.count { |_, r| r[:success] }
puts "Successful: #{successful}/#{urls.size} in #{duration.round(2)}s"
puts "Throughput: #{(urls.size / duration).round(2)} req/s"

Typhoeus with Rotating Sessions

For true IP rotation, each request must use a different session identifier; ProxyHat then assigns a fresh exit IP to every session:

require 'typhoeus'
require 'securerandom'
require 'concurrent' # for Concurrent::Hash (concurrent-ruby gem)

class RotatingProxyScraper
  BASE_USER = 'your_username'
  BASE_PASS = 'your_password'
  
  def initialize(country: nil, city: nil)
    @country = country
    @city = city
  end
  
  def fetch_urls(urls, concurrency: 50)
    results = Concurrent::Hash.new
    hydra = Typhoeus::Hydra.new(max_concurrency: concurrency)
    
    urls.each do |url|
      session_id = SecureRandom.hex(8)
      proxy_user = build_username(session_id)
      
      request = Typhoeus::Request.new(url, {
        proxy: 'http://gate.proxyhat.com:8080',
        proxyuserpwd: "#{proxy_user}:#{BASE_PASS}",
        proxyauth: :basic,
        timeout: 30,
        headers: {
          'User-Agent' => random_user_agent,
          'Accept' => 'text/html,application/xhtml+xml'
        }
      })
      
      request.on_complete do |response|
        results[url] = {
          success: response.success?,
          status: response.code,
          ip: response.headers['X-Proxy-IP'],
          session: session_id
        }
      end
      
      hydra.queue(request)
    end
    
    hydra.run
    results
  end
  
  private
  
  def build_username(session_id)
    parts = [BASE_USER]
    parts << "country-#{@country}" if @country
    parts << "city-#{@city}" if @city
    parts << "session-#{session_id}"
    parts.join('-')
  end
  
  def random_user_agent
    agents = [
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15',
      'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/120.0.0.0'
    ]
    agents.sample
  end
end

# Scrape 1000 URLs with rotating IPs
scraper = RotatingProxyScraper.new(country: 'US')
urls = (1..1000).map { |i| "https://httpbin.org/ip?page=#{i}" }

start = Time.now
results = scraper.fetch_urls(urls, concurrency: 100)
puts "Finished in #{(Time.now - start).round(2)}s"
puts "Success rate: #{(results.values.count { |r| r[:success] } * 100.0 / results.size).round(1)}%"

ProxyHat Ruby SDK: Rotation and Geo-Targeting

The ProxyHat SDK simplifies configuration and provides built-in rotation, retry logic, and metrics.

require 'net/http'
require 'json'
require 'securerandom'

module ProxyHat
  class Client
    DEFAULT_OPTIONS = {
      host: 'gate.proxyhat.com',
      port: 8080,
      timeout: 30,
      max_retries: 3,
      retry_delay: 1.0
    }.freeze
    
    attr_reader :username, :password, :options
    
    def initialize(username:, password:, **options)
      @username = username
      @password = password
      @options = DEFAULT_OPTIONS.merge(options)
    end
    
    # Single request with automatic IP rotation
    def get(url, country: nil, city: nil, session: nil)
      proxy_user = build_username(country: country, city: city, session: session)
      perform_with_retry(url, proxy_user)
    end
    
    # Parallel requests via Hydra integration
    def get_parallel(urls, country: nil, city: nil, concurrency: 50)
      require 'typhoeus'
      
      results = {}
      hydra = Typhoeus::Hydra.new(max_concurrency: concurrency)
      
      urls.each do |url|
        session = SecureRandom.hex(8)
        proxy_user = build_username(country: country, city: city, session: session)
        
        request = Typhoeus::Request.new(url, {
          proxy: "http://#{@options[:host]}:#{@options[:port]}",
          proxyuserpwd: "#{proxy_user}:#{@password}",
          timeout: @options[:timeout],
          followlocation: true,
          ssl_verifypeer: true
        })
        
        request.on_complete { |resp| results[url] = parse_response(resp) }
        hydra.queue(request)
      end
      
      hydra.run
      results
    end
    
    # Sticky session for multi-step workflows
    def sticky_session(country: nil, city: nil)
      session_id = SecureRandom.uuid
      client = self.class.new(
        username: build_username(country: country, city: city, session: session_id),
        password: @password,
        **@options
      )
      yield client, session_id
    end
    
    private
    
    def build_username(country: nil, city: nil, session: nil)
      parts = [username]
      parts << "country-#{country}" if country
      parts << "city-#{city}" if city
      parts << "session-#{session}" if session
      parts.join('-')
    end
    
    def perform_with_retry(url, proxy_user, attempt: 0)
      uri = URI(url)
      proxy = Net::HTTP::Proxy(@options[:host], @options[:port], proxy_user, @password)
      
      http = proxy.new(uri.host, uri.port)
      http.use_ssl = (uri.scheme == 'https')
      http.open_timeout = @options[:timeout]
      http.read_timeout = @options[:timeout]
      
      response = http.request(Net::HTTP::Get.new(uri.request_uri))
      
      if response.is_a?(Net::HTTPSuccess)
        { success: true, status: response.code.to_i, body: response.body }
      else
        handle_failure(url, proxy_user, attempt, response)
      end
    rescue Net::OpenTimeout, Net::ReadTimeout, Errno::ECONNREFUSED => e
      handle_network_error(url, proxy_user, attempt, e)
    ensure
      http&.finish if http&.started?
    end
    
    def handle_failure(url, proxy_user, attempt, response)
      if [429, 502, 503, 504].include?(response.code.to_i) && attempt < @options[:max_retries]
        sleep(@options[:retry_delay] * (2 ** attempt))
        perform_with_retry(url, proxy_user, attempt: attempt + 1)
      else
        { success: false, status: response.code.to_i, error: 'http_error' }
      end
    end
    
    def handle_network_error(url, proxy_user, attempt, error)
      if attempt < @options[:max_retries]
        sleep(@options[:retry_delay] * (2 ** attempt))
        perform_with_retry(url, proxy_user, attempt: attempt + 1)
      else
        { success: false, error: 'network_error', message: error.message }
      end
    end
    
    def parse_response(response)
      {
        success: response.success?,
        status: response.code,
        body: response.body,
        time: response.total_time
      }
    end
  end
end

# Usage
client = ProxyHat::Client.new(
  username: 'your_username',
  password: 'your_password',
  timeout: 25,
  max_retries: 3
)

# Single request with a US IP
result = client.get('https://httpbin.org/ip', country: 'US')
puts result[:body]

# Parallel requests
urls = (1..500).map { |i| "https://api.example.com/items/#{i}" }
results = client.get_parallel(urls, country: 'DE', concurrency: 100)

# Sticky session for a login workflow (note: the Client above only implements
# GET; a post method would need to be added for the auth step)
client.sticky_session(country: 'US') do |sticky, session_id|
  sticky.get('https://example.com/login')
  # sticky.post('https://example.com/auth', body: { user: 'test', pass: 'secret' })
  sticky.get('https://example.com/dashboard')
end

Production-Grade Scraping: 1000 URLs in Parallel

Here is a complete example of a production scraping scenario with fault tolerance, metrics, and circuit-breaker logic:

require 'typhoeus'
require 'json'
require 'logger'
require 'concurrent'
require 'securerandom'

class ProductionScraper
  CIRCUIT_BREAKER_THRESHOLD = 5
  CIRCUIT_BREAKER_TIMEOUT = 60
  
  def initialize(username:, password:, country: 'US', concurrency: 100)
    @username = username
    @password = password
    @country = country
    @concurrency = concurrency
    @logger = Logger.new(STDOUT)
    @circuit_breaker = { failures: 0, open: false, opened_at: nil }
    @metrics = { total: 0, success: 0, failed: 0, retries: 0 }
    @mutex = Mutex.new
  end
  
  def scrape(urls)
    check_circuit_breaker!
    
    results = Concurrent::Hash.new
    hydra = Typhoeus::Hydra.new(max_concurrency: @concurrency)
    
    urls.each do |url|
      break if circuit_breaker_open?
      
      session = SecureRandom.hex(12)
      proxy_user = "#{@username}-country-#{@country}-session-#{session}"
      
      request = Typhoeus::Request.new(url, {
        proxy: 'http://gate.proxyhat.com:8080',
        proxyuserpwd: "#{proxy_user}:#{@password}",
        timeout: 30,
        followlocation: true,
        ssl_verifypeer: false, # only for self-signed certs; keep true in production
        headers: {
          'User-Agent' => random_user_agent,
          'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9',
          'Accept-Language' => 'en-US,en;q=0.9',
          'Accept-Encoding' => 'gzip, deflate'
        }
      })
      
      request.on_complete do |response|
        @mutex.synchronize do
          @metrics[:total] += 1
          
          if response.success?
            @metrics[:success] += 1
            results[url] = {
              success: true,
              status: response.code,
              body: response.body,
              size: response.body.bytesize
            }
            reset_circuit_breaker_on_success
          elsif response.timed_out?
            handle_failure(url, results, 'timeout')
          else
            handle_failure(url, results, response.return_message, response.code)
          end
        end
      end
      
      hydra.queue(request)
    end
    
    hydra.run
    
    {
      results: results,
      metrics: @metrics,
      success_rate: (@metrics[:success].to_f / @metrics[:total] * 100).round(2)
    }
  end
  
  private
  
  def handle_failure(url, results, error, status = nil)
    @metrics[:failed] += 1
    record_circuit_breaker_failure
    
    results[url] = {
      success: false,
      error: error,
      status: status
    }
    
    @logger.warn("Failed: #{url} - #{error}")
  end
  
  def check_circuit_breaker!
    if @circuit_breaker[:open]
      elapsed = Time.now - @circuit_breaker[:opened_at]
      if elapsed > CIRCUIT_BREAKER_TIMEOUT
        @circuit_breaker[:open] = false
        @circuit_breaker[:failures] = 0
        @logger.info("Circuit breaker reset after #{elapsed.round(1)}s")
      else
        raise "Circuit breaker open - waiting #{(CIRCUIT_BREAKER_TIMEOUT - elapsed).round(1)}s"
      end
    end
  end
  
  def circuit_breaker_open?
    @circuit_breaker[:open]
  end
  
  def record_circuit_breaker_failure
    @circuit_breaker[:failures] += 1
    
    if @circuit_breaker[:failures] >= CIRCUIT_BREAKER_THRESHOLD
      @circuit_breaker[:open] = true
      @circuit_breaker[:opened_at] = Time.now
      @logger.error("Circuit breaker opened after #{@circuit_breaker[:failures]} failures")
    end
  end
  
  def reset_circuit_breaker_on_success
    @circuit_breaker[:failures] = 0
  end
  
  def random_user_agent
    [
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15',
      'Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0'
    ].sample
  end
end

# Run it
scraper = ProductionScraper.new(
  username: 'your_username',
  password: 'your_password',
  country: 'US',
  concurrency: 100
)

urls = (1..1000).map { |i| "https://httpbin.org/delay/#{rand(1..2)}?id=#{i}" }

begin
  start = Time.now
  report = scraper.scrape(urls)
  duration = Time.now - start
  
  puts "\n=== Scraping Report ==="
  puts "Duration: #{duration.round(2)}s"
  puts "Throughput: #{(report[:metrics][:total] / duration).round(2)} req/s"
  puts "Success rate: #{report[:success_rate]}%"
  puts "Successful: #{report[:metrics][:success]}/#{report[:metrics][:total]}"
rescue => e
  puts "Scraping aborted: #{e.message}"
end

TLS/SSL Configuration: Self-Signed Certs and SNI

There are a few pitfalls with HTTPS through proxies, especially around self-signed certificates and making sure Server Name Indication (SNI) is set correctly.

require 'net/http'
require 'openssl'

class TLSProxyClient
  def initialize(proxy_host:, proxy_port:, proxy_user:, proxy_pass:)
    @proxy_host = proxy_host
    @proxy_port = proxy_port
    @proxy_user = proxy_user
    @proxy_pass = proxy_pass
  end
  
  # Option 1: Strict TLS verification (production)
  def fetch_strict(url)
    uri = URI(url)
    proxy = Net::HTTP::Proxy(@proxy_host, @proxy_port, @proxy_user, @proxy_pass)
    
    http = proxy.new(uri.host, uri.port)
    http.use_ssl = true
    http.verify_mode = OpenSSL::SSL::VERIFY_PEER
    http.cert_store = trusted_cert_store
    
    # SNI is derived automatically from uri.host by Net::HTTP; no extra setting needed
    
    request = Net::HTTP::Get.new(uri.request_uri)
    http.request(request).body
  ensure
    http&.finish if http&.started?
  end
  
  # Option 2: Accept self-signed certs (dev/staging only!)
  def fetch_insecure(url)
    uri = URI(url)
    proxy = Net::HTTP::Proxy(@proxy_host, @proxy_port, @proxy_user, @proxy_pass)
    
    http = proxy.new(uri.host, uri.port)
    http.use_ssl = true
    http.verify_mode = OpenSSL::SSL::VERIFY_NONE
    http.min_version = OpenSSL::SSL::TLS1_2_VERSION # require at least TLS 1.2
    
    request = Net::HTTP::Get.new(uri.request_uri)
    http.request(request).body
  ensure
    http&.finish if http&.started?
  end
  
  # Option 3: With a custom CA bundle
  def fetch_with_ca_bundle(url, ca_bundle_path)
    uri = URI(url)
    proxy = Net::HTTP::Proxy(@proxy_host, @proxy_port, @proxy_user, @proxy_pass)
    
    http = proxy.new(uri.host, uri.port)
    http.use_ssl = true
    http.verify_mode = OpenSSL::SSL::VERIFY_PEER
    http.ca_file = ca_bundle_path
    http.verify_depth = 5
    
    request = Net::HTTP::Get.new(uri.request_uri)
    http.request(request).body
  ensure
    http&.finish if http&.started?
  end
  
  # Typhoeus with TLS options
  def fetch_typhoeus_tls(url, verify: true)
    require 'typhoeus'
    
    Typhoeus.get(url, {
      proxy: "http://#{@proxy_host}:#{@proxy_port}",
      proxyuserpwd: "#{@proxy_user}:#{@proxy_pass}",
      ssl_verifypeer: verify,
      ssl_verifyhost: verify ? 2 : 0,
      sslversion: :tlsv1_2,
      capath: '/etc/ssl/certs',
      timeout: 30
    })
  end
  
  private
  
  def trusted_cert_store
    store = OpenSSL::X509::Store.new
    store.set_default_paths
    # Additional CA certificates
    store.add_file('/etc/ssl/certs/ca-certificates.crt') if File.exist?('/etc/ssl/certs/ca-certificates.crt')
    store.add_file('/etc/ssl/cert.pem') if File.exist?('/etc/ssl/cert.pem')
    store
  end
end

# Usage
client = TLSProxyClient.new(
  proxy_host: 'gate.proxyhat.com',
  proxy_port: 8080,
  proxy_user: 'user-country-US',
  proxy_pass: 'your_password'
)

# Strict verification
body = client.fetch_strict('https://example.com')

# For self-signed development servers
body = client.fetch_insecure('https://internal-dev.local/api')

Rails Integration: Faraday Middleware and ActiveJob

In Rails applications, Faraday is the de facto standard HTTP client. Here is a full integration with proxy middleware and background jobs.

# config/initializers/proxy.rb
require 'faraday'

module ProxyHat
  class FaradayMiddleware < Faraday::Middleware
    def initialize(app, username:, password:, country: nil)
      super(app)
      @username = username
      @password = password
      @country = country
    end
    
    def call(env)
      session = SecureRandom.hex(8)
      proxy_user = build_username(session)
      
      # Set the proxy per request; Faraday adapters read it from env.request
      env.request.proxy = Faraday::ProxyOptions.new(
        URI('http://gate.proxyhat.com:8080'),
        proxy_user,
        @password
      )
      
      @app.call(env)
    end
    
    private
    
    def build_username(session)
      parts = [@username]
      parts << "country-#{@country}" if @country
      parts << "session-#{session}"
      parts.join('-')
    end
  end
end

# Faraday connection with the proxy middleware
class ApiClient
  def initialize(country: nil)
    @country = country
  end
  
  def connection
    @connection ||= Faraday.new do |builder|
      builder.use ProxyHat::FaradayMiddleware,
        username: Rails.application.config.proxy_username,
        password: Rails.application.config.proxy_password,
        country: @country
      
      builder.request :retry, { # requires the faraday-retry gem on Faraday >= 2.0
        max: 3,
        interval: 1.0,
        backoff_factor: 2,
        retry_statuses: [429, 502, 503, 504]
      }
      
      builder.response :json, content_type: /json$/
      builder.response :raise_error
      
      builder.adapter :typhoeus do |adapter|
        adapter.options = {
          timeout: 30,
          followlocation: true,
          ssl_verifypeer: true
        }
      end
    end
  end
  
  def get(path)
    connection.get(path)
  end
  
  def post(path, body)
    connection.post(path, body.to_json, 'Content-Type' => 'application/json')
  end
end

# ActiveJob for background scraping
class ScrapeJob < ApplicationJob
  queue_as :scraping
  
  retry_on Net::OpenTimeout, wait: :exponentially_longer, attempts: 3
  retry_on Net::ReadTimeout, wait: :exponentially_longer, attempts: 3
  
  def perform(urls, country: 'US')
    client = ApiClient.new(country: country)
    
    results = urls.map do |url|
      begin
        response = client.get(url)
        { url: url, success: true, data: response.body }
      rescue Faraday::Error => e
        { url: url, success: false, error: e.message }
      end
    end
    
    # Persist results (bulk import, e.g. via the activerecord-import gem)
    ScrapeResult.import(results.select { |r| r[:success] })
    
    # Notify on failures
    if results.any? { |r| !r[:success] }
      ScrapeFailureMailer.with(failures: results.reject { |r| r[:success] }).alert.deliver_later
    end
  end
end

# Batch processing with jobs
class BatchScrapeJob < ApplicationJob
  queue_as :scraping
  
  def perform(url_batch, country: 'US')
    # Split URLs into smaller batches
    url_batch.each_slice(50) do |slice|
      ScrapeJob.perform_later(slice, country: country)
    end
  end
end

# Controller example
class ScraperController < ApplicationController
  def start_scrape
    urls = params[:urls]
    country = params[:country] || 'US'
    
    # Split into batches
    urls.each_slice(100) do |batch|
      BatchScrapeJob.perform_later(batch, country: country)
    end
    
    render json: { status: 'queued', batches: (urls.size / 100.0).ceil }
  end
end

# config/application.rb
module MyApp
  class Application < Rails::Application
    config.proxy_username = ENV['PROXYHAT_USERNAME']
    config.proxy_password = ENV['PROXYHAT_PASSWORD']
  end
end

Comparison of Proxy Approaches in Ruby

  • Net::HTTP – Pros: stdlib, no dependencies, simple. Cons: no parallel requests, limited feature set. Use case: simple API calls, prototyping.
  • Typhoeus – Pros: parallel requests via Hydra, libcurl features. Cons: extra gem, libcurl dependency. Use case: high-volume scraping, parallel data fetching.
  • ProxyHat SDK – Pros: built-in rotation, geo-targeting, retry logic. Cons: vendor-specific. Use case: production scraping pipelines with IP rotation.
  • Faraday – Pros: middleware system, Rails integration. Cons: overhead, configuration required. Use case: Rails applications, API wrappers.

Key Takeaways

1. Net::HTTP covers simple needs – The standard library handles basic proxy requirements, including authentication. For production use, however, retry logic is essential.

2. Typhoeus for parallel requests – With hundreds of concurrent requests, Typhoeus with Hydra is indispensable; its libcurl foundation also offers better TLS options.

3. IP rotation via username flags – ProxyHat enables rotation and geo-targeting through special username formats such as user-country-US-session-abc123.

4. Circuit breakers guard against cascading failures – A circuit breaker is a must for production scrapers, sparing the proxies when errors pile up.

5. Rails integration via Faraday – Faraday middleware provides clean proxy integration in Rails, and ActiveJob is well suited for background scraping.

Further Resources

For production scraping pipelines in Ruby, rotating residential proxies from ProxyHat are the most reliable choice. Combining Typhoeus for parallel requests with the ProxyHat SDK for rotation offers the best balance of performance and stability.
