Building Real-Time Price Monitoring Infrastructure

Design and build a real-time price monitoring system with priority queues, worker pools, change detection, and residential proxy rotation. Complete Python and Node.js implementation.

Real-Time vs Batch Price Monitoring

Most price monitoring systems operate in batch mode: check all products every hour (or every few hours), store the results, and send alerts on changes. This works for many use cases, but in fast-moving markets — flash sales, dynamic pricing, marketplace competition — batch monitoring misses critical price changes that happen between checks.

Real-time price monitoring reduces the detection lag from hours to minutes or even seconds. Instead of checking every product on a fixed schedule, a real-time system continuously monitors high-priority targets and reacts to changes as they happen. This guide covers the architecture, proxy infrastructure, and implementation details needed to build a real-time monitoring system. For foundational price monitoring concepts, see our guide on monitoring competitor prices automatically.

Aspect | Batch Monitoring | Real-Time Monitoring
Check frequency | Every 1-24 hours | Every 1-5 minutes for priority items
Detection lag | Up to one full interval | Under 5 minutes
Proxy usage | Concentrated bursts | Steady, distributed stream
Infrastructure | Simple cron jobs | Event-driven with worker pools
Cost | Lower | Higher (more requests, more proxies)
Best for | Daily reports, trend analysis | Repricing, flash sale detection, competitive bidding
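The detection-lag row follows from simple arithmetic: with a fixed check interval, a price change that lands at a uniformly random moment waits half an interval on average, and almost a full interval in the worst case, before the next check sees it. A quick sketch:

```python
def detection_lag(interval_minutes):
    """Expected and worst-case detection lag for a fixed check interval.

    A change occurring at a uniformly random moment waits, on average,
    half an interval until the next check; in the worst case, almost a
    full interval.
    """
    return {
        "average_minutes": interval_minutes / 2,
        "worst_case_minutes": interval_minutes,
    }

# Hourly batch checks: ~30 min average lag, up to 60 min worst case
print(detection_lag(60))
# 2-minute real-time checks: ~1 min average lag
print(detection_lag(2))
```

This is why tightening the interval from hourly to every 2 minutes shrinks the expected lag by a factor of 30, at the cost of 30x the requests.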

Architecture for Real-Time Monitoring

A real-time price monitoring system has five core components that work together as a continuous pipeline.

1. Priority Queue

Products are assigned priority tiers that determine check frequency. A priority queue (Redis Sorted Sets work well) ensures high-value products are always checked first.

import redis
import time
import json
r = redis.Redis(host="localhost", port=6379)
def add_product(product_id, url, priority_minutes):
    """Add a product to the monitoring queue."""
    next_check = time.time()  # Check immediately on first add
    r.zadd("price_queue", {json.dumps({
        "product_id": product_id,
        "url": url,
        "interval": priority_minutes * 60,
    }): next_check})
def get_next_batch(batch_size=10):
    """Get the next batch of products due for checking."""
    now = time.time()
    items = r.zrangebyscore("price_queue", 0, now, start=0, num=batch_size)
    products = []
    for item in items:
        data = json.loads(item)
        r.zadd("price_queue", {item: now + data["interval"]})
        products.append(data)
    return products
# Example: Add products with different priorities
add_product("SKU001", "https://www.amazon.com/dp/B0CHX3QBCH", priority_minutes=2)
add_product("SKU002", "https://www.amazon.com/dp/B0D5BKRY4R", priority_minutes=5)
add_product("SKU003", "https://www.amazon.com/dp/B0CRMZHDG7", priority_minutes=15)

2. Worker Pool

Multiple worker processes pull from the priority queue, fetch prices through proxies, and push results to the data pipeline. Workers operate independently, each with its own proxy connection.

import time
import requests
import random
from concurrent.futures import ThreadPoolExecutor
from bs4 import BeautifulSoup
PROXY_URL = "http://USERNAME:PASSWORD@gate.proxyhat.com:8080"
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 Chrome/124.0.0.0 Safari/537.36",
]
def fetch_price(product):
    """Fetch the current price for a product."""
    headers = {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    }
    proxies = {"http": PROXY_URL, "https": PROXY_URL}
    try:
        response = requests.get(
            product["url"], headers=headers,
            proxies=proxies, timeout=30
        )
        if response.status_code == 200:
            soup = BeautifulSoup(response.text, "html.parser")
            price_el = soup.select_one("span.a-price-whole")  # whole-dollar part only
            if price_el:
                price = float(price_el.get_text(strip=True).replace(",", ""))
                return {
                    "product_id": product["product_id"],
                    "price": price,
                    "timestamp": time.time(),
                    "status": "success",
                }
    except requests.RequestException:
        pass  # fall through and report a failed check
    return {
        "product_id": product["product_id"],
        "price": None,
        "timestamp": time.time(),
        "status": "failed",
    }
def run_workers(num_workers=10):
    """Run the monitoring worker pool."""
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        while True:
            batch = get_next_batch(batch_size=num_workers)
            if not batch:
                time.sleep(1)
                continue
            futures = [executor.submit(fetch_price, product) for product in batch]
            for future in futures:
                result = future.result()
                process_result(result)  # hand off to change detection (component 3)
            time.sleep(random.uniform(0.5, 2))

3. Change Detection Engine

Instead of storing every price check, the change detection engine compares current prices against the last known values and only triggers events on actual changes.

class ChangeDetector:
    def __init__(self, redis_client):
        self.redis = redis_client
    def check_change(self, product_id, new_price):
        """Compare new price against last known and detect changes."""
        key = f"last_price:{product_id}"
        last_data = self.redis.get(key)
        event = None
        if last_data:
            last = json.loads(last_data)
            old_price = last["price"]
            if old_price and new_price and old_price != new_price:
                change_pct = ((new_price - old_price) / old_price) * 100
                event = {
                    "product_id": product_id,
                    "old_price": old_price,
                    "new_price": new_price,
                    "change_pct": round(change_pct, 2),
                    "timestamp": time.time(),
                }
                # Publish change event
                self.redis.publish("price_changes", json.dumps(event))
        # Always update the last known price, even after a detected change,
        # so the next check compares against the current value rather than
        # re-triggering the same event
        self.redis.set(key, json.dumps({
            "price": new_price,
            "timestamp": time.time(),
        }))
        return event

4. Event Stream

Price changes are published to a Redis Pub/Sub channel (or Kafka topic for larger systems). Downstream consumers — alerting services, repricing engines, dashboards — subscribe to these events and react independently.

import redis
import json
def subscribe_to_changes():
    """Subscribe to price change events."""
    r = redis.Redis(host="localhost", port=6379)
    pubsub = r.pubsub()
    pubsub.subscribe("price_changes")
    for message in pubsub.listen():
        if message["type"] == "message":
            event = json.loads(message["data"])
            handle_price_change(event)
def handle_price_change(event):
    """Process a price change event."""
    change = event["change_pct"]
    # send_urgent_alert, send_alert, and store_price_change are
    # application-specific handlers (email, Slack, database writes, etc.)
    if change < -10:
        send_urgent_alert(event)  # Major price drop
    elif change < -5:
        send_alert(event)         # Moderate drop
    elif change > 10:
        send_alert(event)         # Significant increase
    # Always log to a time-series database
    store_price_change(event)

5. Dashboard and Alerts

Real-time data needs real-time visualization. Use WebSocket connections to push price updates to dashboards instantly.
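The push model can be reduced to a fan-out pattern: every connected dashboard gets its own outbound queue, and each price-change event is written to all of them. A minimal sketch using asyncio queues as stand-ins for WebSocket connections (in production you would bridge the Redis price_changes channel to an actual WebSocket server; the event payload here is hypothetical):

```python
import asyncio
import json

class DashboardBroadcaster:
    """Fan price-change events out to every connected dashboard client."""

    def __init__(self):
        self.clients = set()  # one outbound queue per connected client

    def connect(self):
        """Register a new dashboard connection and return its queue."""
        queue = asyncio.Queue()
        self.clients.add(queue)
        return queue

    async def broadcast(self, event):
        """Push one event to every connected client."""
        payload = json.dumps(event)
        for queue in self.clients:
            await queue.put(payload)

async def main():
    broadcaster = DashboardBroadcaster()
    client = broadcaster.connect()
    await broadcaster.broadcast({"product_id": "SKU001", "new_price": 19.99})
    print(await client.get())

asyncio.run(main())
```

In a real deployment, `connect` would be called from the WebSocket handshake handler and each client queue would be drained into its socket.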

Node.js Implementation

A Node.js version of the real-time monitoring engine using axios, cheerio, and https-proxy-agent with ProxyHat's rotating gateway.

const axios = require("axios");
const { HttpsProxyAgent } = require("https-proxy-agent");
const Redis = require("ioredis");
const cheerio = require("cheerio");
const PROXY_URL = "http://USERNAME:PASSWORD@gate.proxyhat.com:8080";
const redis = new Redis();
class RealTimePriceMonitor {
  constructor(concurrency = 10) {
    this.concurrency = concurrency;
    this.running = false;
    this.agent = new HttpsProxyAgent(PROXY_URL);
  }
  async fetchPrice(product) {
    try {
      const { data } = await axios.get(product.url, {
        httpsAgent: this.agent,  // reuse the agent created in the constructor
        headers: {
          "User-Agent":
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/124.0.0.0 Safari/537.36",
          "Accept-Language": "en-US,en;q=0.9",
        },
        timeout: 30000,
      });
      const $ = cheerio.load(data);
      const priceText = $("span.a-price-whole").first().text().trim();
      const price = parseFloat(priceText.replace(/,/g, "")) || null;
      return { productId: product.productId, price, timestamp: Date.now(), status: "success" };
    } catch (err) {
      return { productId: product.productId, price: null, timestamp: Date.now(), status: "failed" };
    }
  }
  async checkChange(productId, newPrice) {
    const key = `last_price:${productId}`;
    const lastData = await redis.get(key);
    let event = null;
    if (lastData) {
      const last = JSON.parse(lastData);
      if (last.price && newPrice && last.price !== newPrice) {
        const changePct = ((newPrice - last.price) / last.price) * 100;
        event = {
          productId,
          oldPrice: last.price,
          newPrice,
          changePct: Math.round(changePct * 100) / 100,
          timestamp: Date.now(),
        };
        await redis.publish("price_changes", JSON.stringify(event));
      }
    }
    // Always store the latest price, even after a detected change, so the
    // next check compares against the current value
    await redis.set(key, JSON.stringify({ price: newPrice, timestamp: Date.now() }));
    return event;
  }
  async processProduct(product) {
    const result = await this.fetchPrice(product);
    if (result.price) {
      const change = await this.checkChange(result.productId, result.price);
      if (change) {
        console.log(
          `Price change: ${change.productId} $${change.oldPrice} -> $${change.newPrice} (${change.changePct}%)`
        );
      }
    }
    // Random delay
    await new Promise((r) => setTimeout(r, 500 + Math.random() * 1500));
  }
  async start() {
    this.running = true;
    console.log(`Starting monitor with ${this.concurrency} workers`);
    while (this.running) {
      const batch = await this.getNextBatch(this.concurrency);
      if (batch.length === 0) {
        await new Promise((r) => setTimeout(r, 1000));
        continue;
      }
      await Promise.all(batch.map((p) => this.processProduct(p)));
    }
  }
  async getNextBatch(size) {
    const now = Date.now() / 1000;
    const items = await redis.zrangebyscore("price_queue", 0, now, "LIMIT", 0, size);
    const products = [];
    for (const item of items) {
      const data = JSON.parse(item);
      await redis.zadd("price_queue", now + data.interval, item);
      products.push(data);
    }
    return products;
  }
}
const monitor = new RealTimePriceMonitor(10);
monitor.start();

Proxy Management for Continuous Monitoring

Real-time monitoring places unique demands on your proxy infrastructure compared to batch scraping.

Steady-State Request Pattern

Unlike batch scraping that sends bursts of requests, real-time monitoring creates a constant stream. This is actually better for proxy health — a steady flow of 5-10 requests per second looks more natural than 1,000 requests in a 2-minute burst.
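To check whether a monitoring plan stays in that steady-state range, you can estimate the aggregate request rate from the size of each priority tier. A back-of-the-envelope sketch (the tier sizes below are hypothetical):

```python
def requests_per_second(tiers):
    """Aggregate steady-state request rate across priority tiers.

    tiers: list of (product_count, check_interval_seconds) pairs.
    Each product contributes one request per interval.
    """
    return sum(count / interval for count, interval in tiers)

# Hypothetical fleet: 100 products every 2 min, 500 every 5 min,
# 5,000 every 15 min
rate = requests_per_second([(100, 120), (500, 300), (5000, 900)])
print(f"{rate:.2f} requests/second")  # ~8 req/s, spread evenly over time
```

A fleet like this generates a smooth ~8 requests per second around the clock, which a rotating residential pool absorbs far more naturally than hourly bursts of the same total volume.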

ProxyHat Configuration for Real-Time

# Per-request rotation (default, recommended for most checks)
http://USERNAME:PASSWORD@gate.proxyhat.com:8080
# Geo-targeted for marketplace-specific monitoring
http://USERNAME-country-US:PASSWORD@gate.proxyhat.com:8080
http://USERNAME-country-DE:PASSWORD@gate.proxyhat.com:8080
# SOCKS5 for lower-level protocol control
socks5://USERNAME:PASSWORD@gate.proxyhat.com:1080

IP Health Monitoring

Track success rates per target site and adjust your approach dynamically. If success rates drop on a specific marketplace, increase delays or switch to a different geo-targeted pool. ProxyHat's large residential pool ensures you always have fresh IPs available. Check our proxy locations for full coverage.
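One way to implement this is a per-site sliding-window success tracker that widens delays as the success rate dips. A sketch; the window size and thresholds are illustrative, not ProxyHat recommendations:

```python
from collections import defaultdict, deque

class SiteHealthTracker:
    """Track recent success rates per target site and suggest request delays."""

    def __init__(self, window=100, base_delay=1.0):
        self.base_delay = base_delay
        # Keep only the most recent `window` outcomes per site
        self.results = defaultdict(lambda: deque(maxlen=window))

    def record(self, site, success):
        self.results[site].append(1 if success else 0)

    def success_rate(self, site):
        window = self.results[site]
        return sum(window) / len(window) if window else 1.0

    def suggested_delay(self, site):
        """Back off progressively as the success rate drops below 90%."""
        rate = self.success_rate(site)
        if rate >= 0.9:
            return self.base_delay
        if rate >= 0.7:
            return self.base_delay * 3   # mild degradation: slow down
        return self.base_delay * 10      # heavy blocking: back way off

tracker = SiteHealthTracker()
for ok in [True] * 60 + [False] * 40:
    tracker.record("example-marketplace", ok)
print(tracker.success_rate("example-marketplace"))    # 0.6
print(tracker.suggested_delay("example-marketplace"))  # 10.0
```

Workers would call `record` after every fetch and sleep for `suggested_delay` between requests to the same site; the same signal can also drive a switch to a different geo-targeted pool.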

Key takeaway: Real-time monitoring requires a steady, sustainable proxy strategy. The goal is consistent low-volume requests across many IPs, not high-volume bursts from few IPs.

Data Storage for Real-Time Data

Real-time price data needs a storage solution optimized for high-frequency inserts and time-range queries.

TimescaleDB Schema

-- TimescaleDB hypertable for price data
CREATE TABLE price_ticks (
    time        TIMESTAMPTZ NOT NULL,
    product_id  TEXT NOT NULL,
    price       DECIMAL(10,2),
    currency    VARCHAR(3) DEFAULT 'USD',
    source_url  TEXT,
    status      VARCHAR(20)
);
SELECT create_hypertable('price_ticks', 'time');
-- Continuous aggregate for hourly summaries
CREATE MATERIALIZED VIEW price_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS hour,
    product_id,
    AVG(price) AS avg_price,
    MIN(price) AS min_price,
    MAX(price) AS max_price,
    COUNT(*) AS check_count
FROM price_ticks
WHERE status = 'success'
GROUP BY hour, product_id;
-- Retention policy: keep raw ticks for 30 days
SELECT add_retention_policy('price_ticks', INTERVAL '30 days');
-- Keep hourly aggregates for 1 year
SELECT add_retention_policy('price_hourly', INTERVAL '365 days');

Scaling Considerations

  • Horizontal worker scaling: Add workers across multiple machines, each pulling from the same Redis queue. No coordination needed — the queue handles distribution.
  • Priority-based throttling: When proxy budget is limited, automatically reduce check frequency for low-priority products while maintaining real-time coverage for critical items.
  • Adaptive intervals: If a product's price has been stable for 24 hours, increase the check interval. If it changed twice in an hour, decrease it.
  • Site-specific concurrency: Different target sites have different tolerances. Run more concurrent workers for Shopify (more permissive) and fewer for Amazon (more aggressive detection).
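The adaptive-interval idea above can be sketched as a simple multiplicative adjustment with clamping (the thresholds and bounds are illustrative):

```python
def adapt_interval(current_interval, changes_last_hour, stable_hours,
                   min_interval=120, max_interval=3600):
    """Tighten the check interval for volatile products, relax it for stable ones.

    current_interval: seconds between checks
    changes_last_hour: price changes observed in the last hour
    stable_hours: hours since the last observed change
    """
    if changes_last_hour >= 2:
        new = current_interval / 2   # volatile: check twice as often
    elif stable_hours >= 24:
        new = current_interval * 2   # stable for a day: relax
    else:
        new = current_interval
    # Clamp so priority products never drift out of real-time range
    return max(min_interval, min(max_interval, new))

print(adapt_interval(600, changes_last_hour=3, stable_hours=0))   # halves to 300.0
print(adapt_interval(600, changes_last_hour=0, stable_hours=36))  # doubles to 1200
```

In the queue implementation above, the adjusted value would replace the product's `interval` field when it is rescheduled.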

For more on proxy strategies that support high-frequency monitoring, explore our guide on best proxies for web scraping and ProxyHat's pricing plans for high-volume usage.

Key Takeaways

  • Real-time monitoring reduces price change detection from hours to minutes, critical for repricing and competitive response.
  • Use a priority queue to focus resources on high-value products while still covering the long tail.
  • A worker pool with concurrent proxy connections provides throughput without burst patterns.
  • Change detection engines filter noise — only process and alert on actual price changes.
  • Store raw data in a time-series database (TimescaleDB) with retention policies for cost management.
  • Residential proxies with steady-state rotation are essential for continuous monitoring. Start with ProxyHat for reliable access.

Building real-time infrastructure? Read our e-commerce scraping guide for the full strategy and check our guides on using proxies in Python and Node.js for implementation details.
