As Ruby developers, we regularly run into IP blocks, rate limiting, and geographic restrictions when building web scrapers or data pipelines. Without a properly configured proxy, even the best-written scraping code can become unusable within hours. This guide walks through HTTP proxy usage in the Ruby ecosystem with practical examples, from the Net::HTTP standard library to high-performance libraries such as Typhoeus.
Why Proxy Usage Matters in Ruby
Modern websites use sophisticated bot-detection methods: IP-based rate limiting, behavioral analysis, CAPTCHA triggers, and geographic blocking. A proxy layer is your first line of defense against these. With the right proxy strategy you can:
- Work around rate limits through IP rotation
- Bypass regional restrictions through geo-targeting
- Increase throughput with parallel requests
- Reduce the risk of bot detection with residential proxies
Basic Proxy Usage with Net::HTTP
Net::HTTP, part of Ruby's standard library, supports proxies without any extra dependencies. For production use, however, you need to add error handling, timeout configuration, and retry logic.
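As a minimal sketch of that built-in support: the proxy host, port, and credentials are positional arguments to `Net::HTTP.new`, and the resulting object can be inspected without opening a connection. The host `proxy.example.test` and the credentials are placeholders chosen for illustration, not real endpoints.

```ruby
require 'net/http'

# Placeholder proxy settings (hypothetical values for illustration only).
PROXY_HOST = 'proxy.example.test'
PROXY_PORT = 8080

# Arguments 3 to 6 of Net::HTTP.new are proxy address, port, user, and password.
proxied = Net::HTTP.new('example.com', 443, PROXY_HOST, PROXY_PORT, 'user', 'secret')

# Passing nil as the proxy address disables proxying entirely,
# including any proxy configured through environment variables.
direct = Net::HTTP.new('example.com', 443, nil)

# No connection has been opened yet; these calls only read configuration.
puts proxied.proxy?          # true
puts proxied.proxy_address   # proxy.example.test
puts proxied.proxy_port      # 8080
puts direct.proxy?           # false
```

When the third argument is omitted, Net::HTTP looks for a proxy in the environment (for example `http_proxy`), which is why passing `nil` explicitly matters when a direct connection is intended.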
Basic Proxy Connection
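require 'net/http'
require 'uri'
require 'logger'
# Custom error classes raised by the client below
class ConnectionError < StandardError; end
class HTTPError < StandardError; end
class NetworkError < StandardError; end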
class ProxyHTTPClient
def initialize(proxy_host:, proxy_port:, username: nil, password: nil, timeout: 30)
@proxy_host = proxy_host
@proxy_port = proxy_port
@username = username
@password = password
@timeout = timeout
@logger = Logger.new($stdout)
end
def get(url, headers = {})
uri = URI.parse(url)
# Build the connection through the proxy
http = Net::HTTP.new(
uri.host,
uri.port,
@proxy_host,
@proxy_port,
@username,
@password
)
# TLS/SSL configuration
if uri.scheme == 'https'
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
http.ca_file = OpenSSL::X509::DEFAULT_CERT_FILE
end
# Timeout settings
http.open_timeout = @timeout
http.read_timeout = @timeout
http.write_timeout = @timeout
# Build and send the request
request = Net::HTTP::Get.new(uri.request_uri)
headers.each { |key, value| request[key] = value }
request['User-Agent'] ||= 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
response = http.request(request)
@logger.info("GET #{url} -> #{response.code}")
{
status: response.code.to_i,
headers: response.each_header.to_h,
body: response.body
}
rescue Net::OpenTimeout => e
@logger.error("Connection timeout: #{e.message}")
raise ConnectionError, "Proxy connection timeout: #{e.message}"
rescue Net::ReadTimeout => e
@logger.error("Read timeout: #{e.message}")
raise ConnectionError, "Proxy read timeout: #{e.message}"
rescue Net::HTTPError, Net::HTTPFatalError => e
@logger.error("HTTP error: #{e.message}")
raise HTTPError, "HTTP request failed: #{e.message}"
rescue SocketError => e
@logger.error("DNS/Network error: #{e.message}")
raise NetworkError, "Network error: #{e.message}"
ensure
http&.finish if http&.started?
end
end
# Usage example with a ProxyHat residential proxy
client = ProxyHTTPClient.new(
proxy_host: 'gate.proxyhat.com',
proxy_port: 8080,
username: 'user-country-US',
password: 'your_password'
)
result = client.get('https://httpbin.org/ip')
puts "Status: #{result[:status]}"
puts "Body: #{result[:body]}"
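One detail worth making explicit before adding your own retry logic: Net::HTTP already retries idempotent requests such as GET once by default (its `max_retries` setting). The sketch below uses a hypothetical helper, `configure_timeouts`, to set all three timeouts and turn the built-in retry off, so that a custom retry loop is the only one in play. The timeout values are arbitrary examples.

```ruby
require 'net/http'

# Hypothetical helper: applies timeouts and disables Net::HTTP's own retry.
def configure_timeouts(http, open: 10, read: 30, write: 30)
  http.open_timeout  = open   # seconds to wait while opening the connection
  http.read_timeout  = read   # seconds to wait for each read
  http.write_timeout = write  # seconds to wait for each write
  http.max_retries   = 0      # leave retry decisions to the caller
  http
end

http = configure_timeouts(Net::HTTP.new('example.com', 443, nil), open: 5)
puts http.open_timeout  # 5
puts http.max_retries   # 0
```

Leaving `max_retries` at its default is also a reasonable choice; the point is to know that the two retry layers multiply.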
Retry Logic and the Circuit Breaker Pattern
In production, a retry mechanism with exponential backoff is essential for handling transient network errors:
require 'net/http'
class ResilientProxyClient
MAX_RETRIES = 3
BASE_DELAY = 1 # seconds
def initialize(proxy_config)
@proxy_config = proxy_config
@failure_count = 0
@circuit_open = false
@circuit_opened_at = nil
end
def get_with_retry(url, headers = {})
raise CircuitOpenError, "Circuit breaker is open" if circuit_open?
retries = 0
last_error = nil
loop do
begin
response = execute_request(url, headers)
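# Successful request: reset the failure counter
@failure_count = 0
return response
# Retry transient failures, including the 429 and 5xx errors raised by execute_request
rescue NetworkError, ConnectionError, RateLimitError, ServerError, Timeout::Error, SocketError, SystemCallError => e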
last_error = e
retries += 1
@failure_count += 1
# Circuit breaker check
if @failure_count >= 5
open_circuit!
raise CircuitOpenError, "Too many failures, circuit opened"
end
if retries >= MAX_RETRIES
@logger&.error("Max retries (#{MAX_RETRIES}) exceeded for #{url}")
raise MaxRetriesExceededError, "Failed after #{MAX_RETRIES} retries: #{e.message}"
end
# Exponential backoff
delay = BASE_DELAY * (2 ** (retries - 1)) + rand(0.0..0.5)
@logger&.warn("Retry #{retries}/#{MAX_RETRIES} after #{delay.round(2)}s: #{e.message}")
sleep(delay)
end
end
end
private
def execute_request(url, headers)
uri = URI(url)
http = Net::HTTP.new(
uri.host, uri.port,
@proxy_config[:host], @proxy_config[:port],
@proxy_config[:username], @proxy_config[:password]
)
http.use_ssl = (uri.scheme == 'https')
http.open_timeout = 15
http.read_timeout = 30
request = Net::HTTP::Get.new(uri.request_uri, headers)
response = http.request(request)
case response
when Net::HTTPSuccess
{ status: response.code.to_i, body: response.body, headers: response.to_hash }
when Net::HTTPRedirection
{ status: response.code.to_i, location: response['Location'], body: response.body }
when Net::HTTPTooManyRequests
raise RateLimitError, "Rate limited (429)"
when Net::HTTPServerError
raise ServerError, "Server error: #{response.code}"
else
raise HTTPError, "HTTP #{response.code}: #{response.message}"
end
ensure
http&.finish if http&.started?
end
def circuit_open?
return false unless @circuit_open
# After 30 seconds, let requests through again (half-open); @failure_count is not reset here, so one more failure reopens the circuit
if Time.now - @circuit_opened_at > 30
@circuit_open = false
false
else
true
end
end
def open_circuit!
@circuit_open = true
@circuit_opened_at = Time.now
end
end
# Custom error classes
class CircuitOpenError < StandardError; end
class MaxRetriesExceededError < StandardError; end
class NetworkError < StandardError; end
class ConnectionError < StandardError; end
class RateLimitError < StandardError; end
class ServerError < StandardError; end
class HTTPError < StandardError; end
# Usage
client = ResilientProxyClient.new(
host: 'gate.proxyhat.com',
port: 8080,
username: 'user-country-US-session-abc123',
password: 'your_password'
)
begin
result = client.get_with_retry('https://example.com/api/data')
puts result[:body]
rescue CircuitOpenError => e
puts "Service unavailable: #{e.message}"
rescue MaxRetriesExceededError => e
puts "Failed permanently: #{e.message}"
end
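The backoff schedule used in the retry loop above can be pulled out into a pure function, which makes it easy to inspect and test. This is a sketch: the helper name `backoff_delay`, the 30-second cap, and the injectable random generator are assumptions added for illustration and are not part of the client above.

```ruby
# Hypothetical helper: delay in seconds before retry number `attempt` (1-based).
# The delay doubles on each attempt, is capped at `max`, and gets up to
# `jitter` seconds of random noise so that clients do not retry in lockstep.
def backoff_delay(attempt, base: 1.0, max: 30.0, jitter: 0.5, rng: Random.new)
  exponential = base * (2**(attempt - 1))
  noise = jitter.positive? ? rng.rand(0.0..jitter) : 0.0
  [exponential, max].min + noise
end

# With jitter disabled the schedule is deterministic.
schedule = (1..6).map { |n| backoff_delay(n, jitter: 0.0) }
puts schedule.inspect  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

The cap matters once the retry budget grows: without it, the tenth attempt would already wait more than eight minutes.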
Parallel Proxy Requests with Typhoeus
Typhoeus is a high-performance HTTP client built on libcurl. Its Hydra interface manages parallel requests, which makes it a good fit for scraping. For large batches of URLs it is typically much more efficient than issuing Net::HTTP requests one at a time.
require 'typhoeus'
require 'logger'
require 'concurrent' # concurrent-ruby gem, provides Concurrent::Array
class ParallelProxyScraper
def initialize(proxy_host:, proxy_port:, username:, password:, max_concurrent: 50)
@proxy_host = proxy_host
@proxy_port = proxy_port
@proxy_auth = "#{username}:#{password}"
@max_concurrent = max_concurrent
@logger = Logger.new($stdout, level: Logger::INFO)
end
def fetch_urls(urls, headers: {})
hydra = Typhoeus::Hydra.new(max_concurrent: @max_concurrent)
results = Concurrent::Array.new
mutex = Mutex.new
urls.each_with_index do |url, index|
request = Typhoeus::Request.new(
url,
method: :get,
headers: default_headers.merge(headers),
proxy: "http://#{@proxy_auth}@#{@proxy_host}:#{@proxy_port}",
proxyauth: :basic,
timeout: 30,
connecttimeout: 10,
followlocation: true,
ssl_verifypeer: true,
ssl_verifyhost: 2
)
request.on_complete do |response|
mutex.synchronize do
results << {
url: url,
index: index,
status: response.response_code,
body: response.body,
total_time: response.total_time,
success: response.success?,
error: (response.success? ? nil : response.status_message)
}
end
@logger.info("[#{index + 1}/#{urls.length}] #{url} -> #{response.response_code} (#{response.total_time.round(2)}s)")
end
hydra.queue(request)
end
# Run all queued requests
hydra.run
# Sort the results back into input order and return them
results.sort_by { |r| r[:index] }
end
def fetch_with_retry(url, max_retries: 3, headers: {})
retries = 0
loop do
response = Typhoeus::Request.get(
url,
headers: default_headers.merge(headers),
proxy: "http://#{@proxy_auth}@#{@proxy_host}:#{@proxy_port}",
proxyauth: :basic,
timeout: 30,
connecttimeout: 10,
followlocation: true
)
return response if response.success?
retries += 1
# Give up and hand back the last response once the retry budget is spent
return response if retries >= max_retries
# Wait longer on 429 (rate limit)
delay = response.response_code == 429 ? 5 : 1
sleep(delay + rand(0.0..0.5))
end
end
private
def default_headers
{
'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Language' => 'en-US,en;q=0.5',
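# Accept-Encoding is not set by hand: libcurl only decompresses responses when compression is requested through its own accept_encoding option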
'Connection' => 'keep-alive'
}
end
end
# Usage: parallel scraping through a ProxyHat residential proxy
scraper = ParallelProxyScraper.new(
proxy_host: 'gate.proxyhat.com',
proxy_port: 8080,
username: 'user-country-US', # US IPs
password: 'your_password',
max_concurrent: 25 # up to 25 requests in flight
)
urls = (1..100).map { |i| "https://httpbin.org/delay/#{rand(1..3)}?id=#{i}" }
puts "Fetching #{urls.length} URLs in parallel..."
start_time = Time.now
results = scraper.fetch_urls(urls)
successful = results.count { |r| r[:success] }
failed = results.count { |r| !r[:success] }
total_time = Time.now - start_time
puts "\n=== Results ==="
puts "Successful: #{successful}"
puts "Failed: #{failed}"
puts "Total time: #{total_time.round(2)}s"
puts "Average requests/second: #{(urls.length / total_time).round(2)}"
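Both Typhoeus examples above interpolate the username and password directly into the proxy URL. That breaks as soon as a credential contains a character such as `@`, `:` or `/`. A minimal sketch of a safer builder follows; the helper name `build_proxy_url` and the sample password are hypothetical, and it relies on libcurl URL-decoding credentials embedded in a proxy URL, which is how libcurl documents that option.

```ruby
require 'erb'
require 'uri'

# Hypothetical helper: builds a proxy URL with percent-encoded credentials.
def build_proxy_url(username, password, host, port)
  user = ERB::Util.url_encode(username)
  pass = ERB::Util.url_encode(password)
  "http://#{user}:#{pass}@#{host}:#{port}"
end

url = build_proxy_url('user-country-US', 'p@ss:word/1', 'gate.proxyhat.com', 8080)
puts url
# http://user-country-US:p%40ss%3Aword%2F1@gate.proxyhat.com:8080

# The encoded URL parses cleanly, so host and port are unambiguous.
parsed = URI(url)
puts parsed.host  # gate.proxyhat.com
puts parsed.port  # 8080
```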
IP Rotation and Geo-Targeting with the ProxyHat Ruby SDK
When working with residential proxies in Ruby, the ProxyHat SDK simplifies IP rotation and geo-targeting. You can use a different IP on every request or keep the same IP with a sticky session.
require 'net/http'
require 'uri'
require 'json'
require 'securerandom'
require 'concurrent' # concurrent-ruby gem, provides Concurrent::Map
module ProxyHat
class Client
GATEWAY_HOST = 'gate.proxyhat.com'
HTTP_PORT = 8080
SOCKS5_PORT = 1080
attr_reader :username, :password, :options
def initialize(username:, password:, **options)
@username = username
@password = password
@options = {
country: nil,
city: nil,
session: nil,
session_duration: nil,
proxy_type: :http
}.merge(options)
end
def proxy_url
auth_username = build_username
case options[:proxy_type]
when :socks5
"socks5://#{auth_username}:#{password}@#{GATEWAY_HOST}:#{SOCKS5_PORT}"
else
"http://#{auth_username}:#{password}@#{GATEWAY_HOST}:#{HTTP_PORT}"
end
end
def proxy_config
{
host: GATEWAY_HOST,
port: options[:proxy_type] == :socks5 ? SOCKS5_PORT : HTTP_PORT,
username: build_username,
password: password
}
end
def with_session(session_id = SecureRandom.hex(8), duration: 10)
self.class.new(
username: username,
password: password,
**options.merge(session: session_id, session_duration: duration)
)
end
def with_country(country_code)
self.class.new(
username: username,
password: password,
**options.merge(country: country_code.upcase)
)
end
def with_city(country_code, city)
self.class.new(
username: username,
password: password,
**options.merge(country: country_code.upcase, city: city.downcase)
)
end
def get(url, headers: {})
config = proxy_config
uri = URI(url)
http = Net::HTTP.new(
uri.host, uri.port,
config[:host], config[:port],
config[:username], config[:password]
)
http.use_ssl = (uri.scheme == 'https')
http.open_timeout = 30
http.read_timeout = 60
request = Net::HTTP::Get.new(uri.request_uri)
headers.each { |k, v| request[k] = v }
request['User-Agent'] ||= default_user_agent
response = http.request(request)
{
status: response.code.to_i,
body: response.body,
headers: response.each_header.to_h,
proxy_ip: response['X-ProxyHat-IP'] # for debugging
}
ensure
http&.finish if http&.started?
end
private
def build_username
parts = [username]
parts << "country-#{options[:country]}" if options[:country]
parts << "city-#{options[:city]}" if options[:city]
parts << "session-#{options[:session]}" if options[:session]
parts << "sessionduration-#{options[:session_duration]}" if options[:session_duration]
parts.join('-')
end
def default_user_agent
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
end
end
# Rotating proxy pool manager
class RotatingProxyPool
def initialize(username:, password:, pool_size: 10)
@username = username
@password = password
@pool_size = pool_size
@sessions = Concurrent::Map.new
end
def acquire(country: nil)
session_id = SecureRandom.hex(8)
client = Client.new(
username: @username,
password: @password,
country: country,
session: session_id,
session_duration: 15
)
@sessions[session_id] = {
client: client,
created_at: Time.now,
request_count: 0
}
yield client if block_given?
ensure
release(session_id) if block_given?
end
def release(session_id)
@sessions.delete(session_id)
end
def stats
{
active_sessions: @sessions.size,
sessions: @sessions.keys
}
end
end
end
# Usage examples
# 1. Simple rotating proxy (a different IP on each request)
client = ProxyHat::Client.new(
username: 'user',
password: 'your_password'
)
10.times do |i|
result = client.get('https://httpbin.org/ip')
ip = JSON.parse(result[:body])['origin']
puts "Request #{i + 1}: IP = #{ip}"
end
# 2. Sticky session (keep the same IP)
sticky_client = client.with_session('my-session-123', duration: 30)
5.times do |i|
result = sticky_client.get('https://httpbin.org/ip')
ip = JSON.parse(result[:body])['origin']
puts "Sticky request #{i + 1}: IP = #{ip} (should stay the same)"
end
# 3. Geo-targeting
us_client = client.with_country('US')
de_client = client.with_country('DE')
tokyo_client = client.with_city('JP', 'tokyo')
puts "US IP: #{JSON.parse(us_client.get('https://httpbin.org/ip')[:body])['origin']}"
puts "DE IP: #{JSON.parse(de_client.get('https://httpbin.org/ip')[:body])['origin']}"
puts "Tokyo IP: #{JSON.parse(tokyo_client.get('https://httpbin.org/ip')[:body])['origin']}"
# 4. Parallel scraping with the proxy pool
pool = ProxyHat::RotatingProxyPool.new(
username: 'user',
password: 'your_password',
pool_size: 20
)
threads = 10.times.map do |i|
Thread.new do
pool.acquire(country: 'US') do |proxy_client|
result = proxy_client.get('https://httpbin.org/ip')
puts "Thread #{i}: #{JSON.parse(result[:body])['origin']}"
end
end
end
threads.each(&:join)
puts "Pool stats: #{pool.stats}"
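Between "a new IP on every request" and "one sticky session forever" there is a middle ground: reuse a session for a fixed number of requests, then rotate. The sketch below illustrates that idea with a hypothetical `SessionRotator` class. It only produces session IDs; passing the ID to `with_session` is left to the caller.

```ruby
require 'securerandom'

# Hypothetical helper: hands out the same session ID for `requests_per_session`
# calls, then switches to a fresh one. A Mutex makes it safe to share
# between threads.
class SessionRotator
  def initialize(requests_per_session:)
    @limit = requests_per_session
    @mutex = Mutex.new
    @used = 0
    @current = nil
  end

  def next_session
    @mutex.synchronize do
      if @current.nil? || @used >= @limit
        @current = SecureRandom.hex(8)
        @used = 0
      end
      @used += 1
      @current
    end
  end
end

rotator = SessionRotator.new(requests_per_session: 3)
ids = Array.new(7) { rotator.next_session }
puts ids.uniq.length  # 3 distinct sessions for 7 requests
```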
Production Example: A Parallel Scraper for 1000 URLs
As a real-world scenario, let's fetch 1000 URLs in parallel through residential proxies. This example combines IP rotation, error handling, and progress tracking:
require 'typhoeus'
require 'concurrent'
require 'json'
require 'csv'
require 'logger'
require 'securerandom'
class ProductionScraper
BATCH_SIZE = 50
MAX_RETRIES = 3
def initialize(proxy_username:, proxy_password:, output_file: 'results.csv')
@proxy_username = proxy_username
@proxy_password = proxy_password
@output_file = output_file
@logger = Logger.new($stdout, level: Logger::INFO)
@stats = Concurrent::AtomicReference.new({ success: 0, failed: 0, retries: 0 })
@results = Concurrent::Array.new
@mutex = Mutex.new
end
def scrape(urls, country: 'US')
@logger.info("Starting scrape of #{urls.length} URLs with country: #{country}")
start_time = Time.now
# Split the URLs into batches
batches = urls.each_slice(BATCH_SIZE).to_a
@logger.info("Split into #{batches.length} batches of max #{BATCH_SIZE} URLs")
# Process each batch
batches.each_with_index do |batch, batch_idx|
@logger.info("Processing batch #{batch_idx + 1}/#{batches.length}")
process_batch(batch, country, batch_idx)
# Short pause between batches to ease rate limiting
sleep(0.5) unless batch_idx == batches.length - 1
end
# Save the results
save_results
total_time = Time.now - start_time
stats = @stats.get
@logger.info("\n" + "="*50)
@logger.info("Scraping completed!")
@logger.info("Total URLs: #{urls.length}")
@logger.info("Successful: #{stats[:success]}")
@logger.info("Failed: #{stats[:failed]}")
@logger.info("Total retries: #{stats[:retries]}")
@logger.info("Total time: #{total_time.round(2)}s")
@logger.info("Rate: #{(urls.length / total_time).round(2)} URLs/sec")
@logger.info("Results saved to: #{@output_file}")
{ success: stats[:success], failed: stats[:failed], time: total_time }
end
private
def process_batch(urls, country, batch_idx)
hydra = Typhoeus::Hydra.new(max_concurrent: 25)
urls.each_with_index do |url, url_idx|
# Unique session ID per request (IP rotation)
session_id = "batch#{batch_idx}-url#{url_idx}-#{SecureRandom.hex(4)}"
proxy_auth = "#{@proxy_username}-country-#{country}-session-#{session_id}:#{@proxy_password}"
request = Typhoeus::Request.new(
url,
method: :get,
proxy: "http://#{proxy_auth}@gate.proxyhat.com:8080",
proxyauth: :basic,
timeout: 45,
connecttimeout: 15,
followlocation: true,
ssl_verifypeer: true,
headers: random_headers
)
request.on_complete do |response|
handle_response(url, response)
end
hydra.queue(request)
end
hydra.run
end
def handle_response(url, response, retry_count = 0)
if response.success? && response.response_code == 200
@mutex.synchronize do
@results << {
url: url,
status: response.response_code,
body: response.body[0, 500], # first 500 characters
time: response.total_time,
success: true
}
stats = @stats.get
stats[:success] += 1
@stats.set(stats)
end
@logger.debug("✓ #{url} -> 200 (#{response.total_time.round(2)}s)")
elsif response.response_code == 429 && retry_count < MAX_RETRIES
# Rate limited: wait, then retry
@mutex.synchronize do
stats = @stats.get
stats[:retries] += 1
@stats.set(stats)
end
sleep(3 + rand(1..5))
retry_request(url, retry_count + 1)
elsif response.response_code >= 500 && retry_count < MAX_RETRIES
# Server error - retry
sleep(1 + rand(0..2))
retry_request(url, retry_count + 1)
else
# Failed
@mutex.synchronize do
@results << {
url: url,
status: response.response_code,
error: response.status_message || 'Unknown error',
success: false
}
stats = @stats.get
stats[:failed] += 1
@stats.set(stats)
end
@logger.warn("✗ #{url} -> #{response.response_code}")
end
end
def retry_request(url, retry_count)
session_id = "retry-#{SecureRandom.hex(4)}"
proxy_auth = "#{@proxy_username}-session-#{session_id}:#{@proxy_password}"
response = Typhoeus::Request.get(
url,
proxy: "http://#{proxy_auth}@gate.proxyhat.com:8080",
proxyauth: :basic,
timeout: 45,
connecttimeout: 15,
headers: random_headers
)
handle_response(url, response, retry_count)
end
def random_headers
{
# Quoted strings are required here: %w[] would split each user agent on whitespace
'User-Agent' => [
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
].sample,
'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language' => 'en-US,en;q=0.9',
# Accept-Encoding is not set by hand: libcurl only decompresses responses when compression is requested through its own accept_encoding option
'DNT' => '1',
'Connection' => 'keep-alive'
}
end
def save_results
CSV.open(@output_file, 'w', write_headers: true, headers: ['url', 'status', 'success', 'body', 'error']) do |csv|
@results.each do |result|
csv << [
result[:url],
result[:status],
result[:success],
result[:body],
result[:error]
]
end
end
end
end
# Usage
scraper = ProductionScraper.new(
proxy_username: 'user',
proxy_password: 'your_password',
output_file: 'scraping_results.csv'
)
# Build 1000 sample URLs
urls = 1000.times.map do |i|
"https://httpbin.org/delay/#{rand(1..2)}?page=#{i}&category=#{%w[tech sports news].sample}"
end
# Start scraping
result = scraper.scrape(urls, country: 'US')
puts "\nFinal results: #{result[:success]} successful, #{result[:failed]} failed"
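The scraper above keeps its counters in a Hash held by a `Concurrent::AtomicReference` and then guards every update with a separate Mutex. A simpler alternative, sketched here with a hypothetical `ScrapeStats` class and standard-library tools only, is a single counter object that owns its own lock.

```ruby
# Hypothetical helper: counters shared by many threads, guarded by one Mutex.
class ScrapeStats
  def initialize
    @mutex = Mutex.new
    @counts = Hash.new(0)
  end

  # Atomically add one to the named counter.
  def increment(key)
    @mutex.synchronize { @counts[key] += 1 }
  end

  # Return a snapshot copy so callers never see a half-updated Hash.
  def to_h
    @mutex.synchronize { @counts.dup }
  end
end

stats = ScrapeStats.new
threads = Array.new(10) do
  Thread.new { 1_000.times { stats.increment(:success) } }
end
threads.each(&:join)
puts stats.to_h[:success]  # 10000
```

Because every read and write goes through the same lock, the total stays exact regardless of how the threads interleave.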
TLS/SSL Configuration and SNI Support
TLS/SSL configuration is critical when going through a proxy. This section covers self-signed certificates, SNI (Server Name Indication), and certificate pinning:
require 'net/http'
require 'openssl'
class TLSProxyClient
def initialize(proxy_host:, proxy_port:, username:, password:)
@proxy_host = proxy_host
@proxy_port = proxy_port
@username = username
@password = password
@cert_store = build_cert_store
end
def get(url, verify_ssl: true, allow_self_signed: false)
uri = URI(url)
http = Net::HTTP.new(
uri.host, uri.port,
@proxy_host, @proxy_port,
@username, @password
)
if uri.scheme == 'https'
configure_ssl(http, verify_ssl: verify_ssl, allow_self_signed: allow_self_signed)
end
http.open_timeout = 30
http.read_timeout = 60
# SNI: Net::HTTP sends the target host name in the TLS handshake automatically,
# so no extra configuration is needed (it has no sni_hostname= setter)
request = Net::HTTP::Get.new(uri.request_uri)
request['Host'] = uri.host # set the Host header explicitly
response = http.request(request)
{
status: response.code.to_i,
body: response.body,
ssl_info: extract_ssl_info(http)
}
rescue OpenSSL::SSL::SSLError => e
handle_ssl_error(e, url)
ensure
http&.finish if http&.started?
end
def get_with_cert_pinning(url, pinned_certs: [])
uri = URI(url)
http = Net::HTTP.new(
uri.host, uri.port,
@proxy_host, @proxy_port,
@username, @password
)
if uri.scheme == 'https'
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
# Custom certificate verification
http.verify_callback = lambda do |preverify_ok, cert_store|
return false unless preverify_ok
cert = cert_store.chain.first
cert_fingerprint = OpenSSL::Digest::SHA256.hexdigest(cert.to_der)
if pinned_certs.any?
pinned_certs.include?(cert_fingerprint)
else
true
end
end
end
http.request(Net::HTTP::Get.new(uri.request_uri))
end
private
def configure_ssl(http, verify_ssl:, allow_self_signed:)
http.use_ssl = true
if verify_ssl
if allow_self_signed
# Accept self-signed certificates by disabling verification entirely (development only)
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
warn "WARNING: SSL verification disabled - not for production!"
else
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
http.cert_store = @cert_store
end
else
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
end
# Enforce the TLS version range
http.min_version = OpenSSL::SSL::TLS1_2_VERSION
http.max_version = OpenSSL::SSL::TLS1_3_VERSION
# Restrict the cipher suites (this list applies to TLS 1.2)
http.ciphers = [
'ECDHE-ECDSA-AES128-GCM-SHA256',
'ECDHE-RSA-AES128-GCM-SHA256',
'ECDHE-ECDSA-AES256-GCM-SHA384',
'ECDHE-RSA-AES256-GCM-SHA384'
].join(':')
end
def build_cert_store
store = OpenSSL::X509::Store.new
store.set_default_paths
# Additional CA bundles, if present on this system
%w[
/etc/ssl/certs/ca-certificates.crt
/etc/pki/tls/certs/ca-bundle.crt
/usr/local/etc/openssl/cert.pem
].each do |path|
store.add_file(path) if File.exist?(path)
end
store
end
def extract_ssl_info(http)
return nil unless http.use_ssl?
{
# Net::HTTP exposes the peer certificate, but has no accessor for the negotiated TLS version or cipher
peer_cert: http.peer_cert&.subject&.to_s
}
rescue => e
{ error: e.message }
end
def handle_ssl_error(error, url)
case error.message
when /certificate verify failed/
raise SSLCertificateError, "Certificate verification failed for #{url}: #{error.message}"
when /handshake failure/
raise SSLHandshakeError, "SSL handshake failed for #{url}: #{error.message}"
when /hostname does not match/
raise SSLHostnameError, "Hostname mismatch for #{url}: #{error.message}"
else
raise SSLCertificateError, "SSL error for #{url}: #{error.message}"
end
end
end
# Custom error classes
class SSLCertificateError < StandardError; end
class SSLHandshakeError < StandardError; end
class SSLHostnameError < StandardError; end
# Usage
client = TLSProxyClient.new(
proxy_host: 'gate.proxyhat.com',
proxy_port: 8080,
username: 'user-country-US',
password: 'your_password'
)
# Standard SSL verification
result = client.get('https://example.com')
puts "SSL Info: #{result[:ssl_info]}"
# Accept a self-signed certificate (development only!)
result = client.get('https://internal-api.local', allow_self_signed: true)
# With certificate pinning
PINNED_CERTS = [
'a1b2c3d4e5f6...' # SHA256 fingerprint
]
result = client.get_with_cert_pinning('https://api.example.com', pinned_certs: PINNED_CERTS)
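The pinned value in the example above is a placeholder. As a sketch of how such a fingerprint could be produced, the code below generates a throwaway self-signed certificate in memory and hashes its DER encoding with SHA-256, the same computation the verify callback performs. The helper name `cert_fingerprint` and the certificate subject are hypothetical; in practice the certificate would be loaded from the server's PEM file instead of generated.

```ruby
require 'openssl'

# Generate a throwaway self-signed certificate purely for demonstration.
key = OpenSSL::PKey::RSA.new(2048)
cert = OpenSSL::X509::Certificate.new
cert.version = 2
cert.serial = 1
cert.subject = OpenSSL::X509::Name.parse('/CN=pinning-demo.example.test')
cert.issuer = cert.subject
cert.public_key = key.public_key
cert.not_before = Time.now
cert.not_after = Time.now + 3600
cert.sign(key, OpenSSL::Digest.new('SHA256'))

# Hypothetical helper: SHA-256 over the DER encoding, as lowercase hex.
def cert_fingerprint(certificate)
  OpenSSL::Digest::SHA256.hexdigest(certificate.to_der)
end

pin = cert_fingerprint(cert)
puts pin.length  # 64

# Reloading the certificate from PEM yields the same fingerprint.
reloaded = OpenSSL::X509::Certificate.new(cert.to_pem)
puts cert_fingerprint(reloaded) == pin  # true
```

Pinning the whole certificate means the pin must be updated on every certificate renewal, which is worth planning for before enabling it.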
Ruby on Rails Integration
To use proxies in a Rails application, combine a Faraday middleware with ActiveJob:
# config/initializers/proxy.rb
require 'faraday'
require 'faraday/middleware'
module ProxyHat
class FaradayMiddleware < Faraday::Middleware
def initialize(app, username:, password:, country: nil, session: nil)
super(app)
@username = username
@password = password
@country = country
@session = session
end
def call(env)
# Faraday adapters read the proxy from the request options, not from env[:proxy]
env.request.proxy = Faraday::ProxyOptions.from(
uri: URI('http://gate.proxyhat.com:8080'),
user: build_proxy_auth,
password: @password
)
@app.call(env)
end
private
def build_proxy_auth
parts = [@username]
parts << "country-#{@country}" if @country
parts << "session-#{@session}" if @session
parts.join('-')
end
end
end
# Faraday connection factory
module ApiClient
class Connection
def self.create(country: nil, session: nil)
Faraday.new do |builder|
builder.use ProxyHat::FaradayMiddleware,
username: Rails.application.config.proxy_hat_username,
password: Rails.application.config.proxy_hat_password,
country: country,
session: session
builder.request :json
builder.response :json, content_type: /json$/
builder.response :logger, Rails.logger, bodies: true
builder.adapter Faraday.default_adapter
end
end
end
end
# app/jobs/scraping_job.rb
class ScrapingJob < ActiveJob::Base
queue_as :scraping
# Retry with exponential backoff
retry_on NetworkError, wait: :exponentially_longer, attempts: 5
retry_on RateLimitError, wait: 10.seconds, attempts: 3
# Discard after max retries
discard_on MaxRetriesExceededError
def perform(url, country: 'US', session_id: nil)
@url = url
@country = country
@session_id = session_id || SecureRandom.hex(8)
Rails.logger.info("Scraping #{url} with country=#{country}, session=#{@session_id}")
connection = ApiClient::Connection.create(country: country, session: @session_id)
response = connection.get(url) do |req|
req.headers['User-Agent'] = random_user_agent
req.headers['Accept'] = 'text/html,application/xhtml+xml'
req.options.timeout = 45
req.options.open_timeout = 15
end
if response.success?
process_response(response.body)
elsif response.status == 429
raise RateLimitError, "Rate limited on #{@url}"
elsif response.status >= 500
raise ServerError, "Server error #{response.status} on #{@url}"
else
raise HTTPError, "HTTP #{response.status} on #{@url}"
end
rescue Faraday::TimeoutError => e
raise NetworkError, "Timeout: #{e.message}"
rescue Faraday::ConnectionFailed => e
raise NetworkError, "Connection failed: #{e.message}"
end
private
def process_response(body)
# Process the response and save it to the database
data = parse_response(body)
ScrapedData.create!(
url: @url,
country: @country,
session_id: @session_id,
data: data,
scraped_at: Time.current
)
end
def parse_response(body)
# Page parsing logic
Nokogiri::HTML(body).tap do |doc|
# Data extraction...
end
end
def random_user_agent
# Quoted strings are required here: %w[] would split each user agent on whitespace
@random_user_agent ||= [
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36'
].sample
end
end
# app/jobs/batch_scraping_job.rb
class BatchScrapingJob < ActiveJob::Base
queue_as :scraping
def perform(urls, country: 'US')
Rails.logger.info("Starting batch scrape of #{urls.length} URLs")
# Split the URLs into small groups and enqueue one job per URL
urls.each_slice(50).with_index do |batch, index|
batch.each do |url|
ScrapingJob.perform_later(url, country: country, session_id: "batch-#{index}-#{SecureRandom.hex(4)}")
end
# Pause between groups to ease rate limiting
sleep(2) unless index == (urls.length / 50.0).ceil - 1
end
Rails.logger.info("All #{urls.length} scraping jobs enqueued")
end
end
# Usage from a controller
class ScraperController < ApplicationController
def create
# Double quotes are required: '\n' in single quotes is a literal backslash followed by n
urls = params[:urls].split("\n").map(&:strip).reject(&:blank?)
country = params[:country] || 'US'
# Enqueue the batch job
BatchScrapingJob.perform_later(urls, country: country)
redirect_to scraper_path, notice: "Scraping started for #{urls.length} URLs"
end
end
# config/application.rb: proxy configuration
module YourApp
class Application < Rails::Application
config.proxy_hat_username = ENV['PROXYHAT_USERNAME']
config.proxy_hat_password = ENV['PROXYHAT_PASSWORD']
end
end
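The controller above accepts a free-form list of URLs, so it helps to normalize and validate that input before enqueueing jobs. The following is a minimal sketch in plain Ruby (so it uses `empty?` rather than Rails' `blank?`); the helper name `parse_url_list` is hypothetical.

```ruby
require 'uri'

# Hypothetical helper: turns textarea input into a clean list of http(s) URLs.
# Blank lines, duplicates, unparsable lines, and other schemes are dropped.
def parse_url_list(text)
  text.to_s.each_line.map(&:strip).reject(&:empty?).uniq.select do |line|
    uri = URI.parse(line)
    uri.is_a?(URI::HTTP) && !uri.host.nil?
  rescue URI::InvalidURIError
    false
  end
end

input = "https://example.com/a\r\n\n  http://example.com/b  \nnot a url\nftp://example.com/c\n"
puts parse_url_list(input).inspect
# ["https://example.com/a", "http://example.com/b"]
```

Because `URI::HTTPS` is a subclass of `URI::HTTP`, the single `is_a?` check accepts both schemes.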
Proxy Type Comparison
| Feature | Datacenter | Residential | Mobile |
|---|---|---|---|
| Speed | Very high | Medium to high | Medium |
| Detection risk | High | Low | Very low |
| Price | Low | Medium | High |
| IP pool | ~100K | ~10M+ | ~5M+ |
| Typical use | General scraping | SERP, e-commerce | Social media, ticketing |
| Geo-targeting | Limited | Country/City | Country/City/Carrier |
Key Takeaways
Core principles for working with proxies in Ruby:
- Net::HTTP is enough for simple jobs, but prefer Typhoeus or Faraday in production
- Use a unique session ID on each request for IP rotation
- Use geo-targeting to get past regional restrictions
- Use Typhoeus::Hydra for parallel requests and tune max_concurrent
- Never skip TLS/SSL configuration in production
- For Rails, the combination of ActiveJob and a Faraday middleware works well
- Residential proxies reduce the risk of bot detection
- Protect your system with the circuit breaker pattern
If you scrape the web or collect data with Ruby, the right proxy strategy largely determines success. With ProxyHat residential and mobile proxies you can achieve high success rates and low detection risk. See our pricing page to choose the plan that fits your needs.
For more information, see our web scraping use case and proxy locations pages.






