Why Scrape Shopify Stores?
Shopify powers over 4 million online stores worldwide, from small independent brands to major retailers. This makes it one of the richest sources of e-commerce intelligence. By scraping Shopify stores, you can track competitor pricing, monitor product launches, analyze market trends, and build comprehensive product databases.
The good news is that Shopify has a predictable structure that makes scraping more systematic than most e-commerce platforms. Every Shopify store exposes certain data through standardized endpoints, which means a single scraper architecture can work across thousands of different stores. For a broader overview of e-commerce scraping strategies, see our e-commerce data scraping guide.
Understanding Shopify's Store Structure
Every Shopify store follows the same URL and data patterns, regardless of the theme or customization.
Public JSON Endpoints
Shopify exposes product data through JSON endpoints that do not require authentication. These are the most efficient way to scrape Shopify stores because you get structured data without HTML parsing.
| Endpoint | Data Returned | Pagination |
|---|---|---|
| `/products.json` | All products with variants, prices, images | `?page=N&limit=250` |
| `/products/{handle}.json` | Single product detail | N/A |
| `/collections.json` | All collections | `?page=N` |
| `/collections/{handle}/products.json` | Products in a collection | `?page=N&limit=250` |
| `/meta.json` | Store metadata (name, description) | N/A |
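The endpoints above need nothing more than a plain HTTP GET. A minimal sketch (the store domain is a placeholder, and `fetch_products_page` obviously requires network access):

```python
import requests

def products_url(store_domain: str, page: int = 1, limit: int = 250) -> str:
    """Build the public products.json URL for a store and page."""
    return f"https://{store_domain}/products.json?page={page}&limit={limit}"

def fetch_products_page(store_domain: str, page: int = 1) -> list[dict]:
    """Fetch one page of products; past the last page, the list comes back empty."""
    response = requests.get(products_url(store_domain, page), timeout=30)
    response.raise_for_status()
    return response.json().get("products", [])

# Example (hypothetical store):
# fetch_products_page("example-store.myshopify.com", page=1)
```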
Product Data Structure
Each product object from the JSON API includes:
- Basic info: `title`, `handle` (slug), `body_html` (description), `vendor`, `product_type`, `tags`
- Variants: each variant has its own `price`, `compare_at_price`, `sku`, inventory status, and option values (size, color, etc.)
- Images: URLs for all product images with alt text
- Dates: `created_at`, `updated_at`, `published_at`
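To make the shape concrete, here is how the variant list reduces to a price range — a small sketch with a made-up product object:

```python
def price_range(product: dict) -> tuple[float, float]:
    """Return (min, max) across all variant prices; (0.0, 0.0) if none are set."""
    prices = [float(v["price"]) for v in product.get("variants", []) if v.get("price")]
    if not prices:
        return (0.0, 0.0)
    return (min(prices), max(prices))

# Made-up product in the JSON API's shape (prices arrive as strings)
sample = {
    "title": "Basic Tee",
    "variants": [
        {"title": "S", "price": "19.00", "available": True},
        {"title": "M", "price": "21.00", "available": False},
    ],
}
print(price_range(sample))  # (19.0, 21.0)
```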
Rate Limiting
Shopify applies rate limits to protect store performance. The public JSON endpoints typically allow 2-4 requests per second per IP before throttling kicks in. This is where residential proxies become essential — spreading requests across multiple IPs lets you maintain throughput without hitting rate limits on any single IP.
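On a single IP you can stay inside that budget with a simple throttle. A minimal sketch (`Throttle` is a hypothetical helper, pinned to the conservative end of the range above):

```python
import time

class Throttle:
    """Enforce a minimum interval between requests (0.5s spacing = 2 req/sec)."""

    def __init__(self, requests_per_second: float = 2.0):
        self.min_interval = 1.0 / requests_per_second
        self.last_request = 0.0

    def wait(self) -> None:
        # Sleep only for whatever part of the interval hasn't already elapsed
        elapsed = time.monotonic() - self.last_request
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_request = time.monotonic()

throttle = Throttle(2.0)
# Call throttle.wait() before each request to stay under ~2 req/sec per IP.
```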
Proxy Configuration for Shopify
Shopify's rate limiting is IP-based, making proxy rotation the primary strategy for scraping at scale.
ProxyHat Setup
# Rotating residential proxy (new IP per request)
http://USERNAME:PASSWORD@gate.proxyhat.com:8080
# Geo-targeted for region-specific stores
http://USERNAME-country-US:PASSWORD@gate.proxyhat.com:8080
# Sticky session for paginated scraping of one store
http://USERNAME-session-shopify001:PASSWORD@gate.proxyhat.com:8080
For Shopify scraping, use per-request rotation when scraping different stores, and sticky sessions when paginating through a single store's product catalog. This pattern mimics natural browsing behavior.
Python Implementation
Here is a production-ready Shopify scraper in Python, using the requests library routed through ProxyHat's gateway.
JSON API Scraper
import requests
import json
import time
import random
from dataclasses import dataclass
PROXY_URL = "http://USERNAME:PASSWORD@gate.proxyhat.com:8080"
USER_AGENTS = [
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/124.0.0.0 Safari/537.36",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 Chrome/124.0.0.0 Safari/537.36",
]
@dataclass
class ShopifyProduct:
id: int
title: str
handle: str
vendor: str
product_type: str
tags: list[str]
variants: list[dict]
images: list[str]
min_price: float
max_price: float
created_at: str
updated_at: str
def get_session(store_domain: str) -> requests.Session:
    """Create a session with proxy and browser-like headers for a store."""
    session = requests.Session()
    session.proxies = {"http": PROXY_URL, "https": PROXY_URL}
    session.headers.update({
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "application/json",
        "Accept-Language": "en-US,en;q=0.9",
        # Referer makes the JSON requests look like in-store navigation
        "Referer": f"https://{store_domain}/",
    })
    return session
def scrape_all_products(store_domain: str) -> list[ShopifyProduct]:
"""Scrape all products from a Shopify store via JSON API."""
products = []
page = 1
session = get_session(store_domain)
while True:
url = f"https://{store_domain}/products.json?page={page}&limit=250"
try:
response = session.get(url, timeout=30)
response.raise_for_status()
except requests.RequestException as e:
print(f"Error on page {page}: {e}")
break
data = response.json()
page_products = data.get("products", [])
if not page_products:
break
for p in page_products:
prices = [float(v["price"]) for v in p.get("variants", [])
if v.get("price")]
product = ShopifyProduct(
id=p["id"],
title=p["title"],
handle=p["handle"],
vendor=p.get("vendor", ""),
product_type=p.get("product_type", ""),
                # tags may arrive as a list or as a comma-separated string
                tags=(p["tags"] if isinstance(p.get("tags"), list)
                      else p["tags"].split(", ") if p.get("tags") else []),
variants=[{
"id": v["id"],
"title": v["title"],
"price": v["price"],
"compare_at_price": v.get("compare_at_price"),
"sku": v.get("sku"),
"available": v.get("available", False),
} for v in p.get("variants", [])],
images=[img["src"] for img in p.get("images", [])],
min_price=min(prices) if prices else 0,
max_price=max(prices) if prices else 0,
created_at=p.get("created_at", ""),
updated_at=p.get("updated_at", ""),
)
products.append(product)
print(f"Page {page}: {len(page_products)} products (total: {len(products)})")
page += 1
time.sleep(random.uniform(1, 3))
return products
def scrape_collections(store_domain: str) -> list[dict]:
"""Scrape all collections from a Shopify store."""
collections = []
page = 1
session = get_session(store_domain)
while True:
url = f"https://{store_domain}/collections.json?page={page}"
try:
response = session.get(url, timeout=30)
response.raise_for_status()
except requests.RequestException:
break
data = response.json()
page_collections = data.get("collections", [])
if not page_collections:
break
collections.extend(page_collections)
page += 1
time.sleep(random.uniform(1, 2))
return collections
# Example: Scrape multiple Shopify stores
if __name__ == "__main__":
stores = [
"example-store-1.myshopify.com",
"example-store-2.com",
"example-store-3.com",
]
for store in stores:
print(f"\nScraping: {store}")
products = scrape_all_products(store)
print(f"Found {len(products)} products")
# Save to JSON
with open(f"{store.replace('.', '_')}_products.json", "w") as f:
json.dump([vars(p) for p in products], f, indent=2)
time.sleep(random.uniform(3, 7))
Monitoring Price Changes Across Stores
def compare_prices(store_domain: str, previous_data: dict) -> list[dict]:
"""Compare current prices with previously stored data."""
changes = []
products = scrape_all_products(store_domain)
for product in products:
prev = previous_data.get(product.handle)
if not prev:
changes.append({
"type": "new_product",
"handle": product.handle,
"title": product.title,
"price": product.min_price,
})
continue
if product.min_price != prev.get("min_price"):
changes.append({
"type": "price_change",
"handle": product.handle,
"title": product.title,
"old_price": prev["min_price"],
"new_price": product.min_price,
"change_pct": ((product.min_price - prev["min_price"])
/ prev["min_price"] * 100)
if prev["min_price"] else 0,
})
return changes
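The `previous_data` snapshot can come straight from the JSON files written earlier. A sketch, assuming the per-store dump format used above (`load_snapshot` and `diff_snapshots` are illustrative names; `diff_snapshots` is the same comparison done offline):

```python
import json

def load_snapshot(path: str) -> dict:
    """Load a prior scrape's JSON dump and index it by product handle."""
    with open(path) as f:
        items = json.load(f)
    return {p["handle"]: p for p in items}

def diff_snapshots(old: dict, new: dict) -> list[dict]:
    """Compare two handle-indexed snapshots without re-scraping."""
    changes = []
    for handle, product in new.items():
        prev = old.get(handle)
        if prev is None:
            changes.append({"type": "new_product", "handle": handle})
        elif product["min_price"] != prev["min_price"]:
            changes.append({
                "type": "price_change", "handle": handle,
                "old_price": prev["min_price"], "new_price": product["min_price"],
            })
    return changes
```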
Node.js Implementation
A Node.js version using axios with https-proxy-agent, routed through ProxyHat's gateway.
const axios = require("axios");
const { HttpsProxyAgent } = require("https-proxy-agent");
const fs = require("fs");
const PROXY_URL = "http://USERNAME:PASSWORD@gate.proxyhat.com:8080";
const agent = new HttpsProxyAgent(PROXY_URL);
async function scrapeShopifyProducts(storeDomain) {
const products = [];
let page = 1;
while (true) {
const url = `https://${storeDomain}/products.json?page=${page}&limit=250`;
try {
const { data } = await axios.get(url, {
httpsAgent: agent,
headers: {
"User-Agent":
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/124.0.0.0 Safari/537.36",
Accept: "application/json",
},
timeout: 30000,
});
const pageProducts = data.products || [];
if (pageProducts.length === 0) break;
for (const p of pageProducts) {
        // Number.isFinite drops NaN from unparseable prices without losing 0
        const prices = p.variants.map((v) => parseFloat(v.price)).filter(Number.isFinite);
products.push({
id: p.id,
title: p.title,
handle: p.handle,
vendor: p.vendor,
productType: p.product_type,
          // tags may arrive as an array or as a comma-separated string
          tags: Array.isArray(p.tags) ? p.tags : p.tags ? p.tags.split(", ") : [],
          // Guard the empty case: Math.min() of no arguments is Infinity
          minPrice: prices.length ? Math.min(...prices) : 0,
          maxPrice: prices.length ? Math.max(...prices) : 0,
variants: p.variants.map((v) => ({
id: v.id,
title: v.title,
price: v.price,
compareAtPrice: v.compare_at_price,
sku: v.sku,
available: v.available,
})),
images: p.images.map((img) => img.src),
updatedAt: p.updated_at,
});
}
console.log(`Page ${page}: ${pageProducts.length} products (total: ${products.length})`);
page++;
// Random delay 1-3 seconds
await new Promise((r) => setTimeout(r, 1000 + Math.random() * 2000));
} catch (err) {
console.error(`Error on page ${page}: ${err.message}`);
break;
}
}
return products;
}
async function scrapeMultipleStores(stores) {
const results = {};
for (const store of stores) {
console.log(`\nScraping: ${store}`);
const products = await scrapeShopifyProducts(store);
results[store] = products;
console.log(`Found ${products.length} products`);
// Delay between stores
await new Promise((r) => setTimeout(r, 3000 + Math.random() * 4000));
}
return results;
}
// Usage
scrapeMultipleStores([
"example-store-1.myshopify.com",
"example-store-2.com",
]).then((results) => {
fs.writeFileSync("shopify_data.json", JSON.stringify(results, null, 2));
console.log("Data saved to shopify_data.json");
});
Shopify-Specific Scraping Strategies
Discovering Shopify Stores
Before scraping, you need to identify which competitor sites run on Shopify. Common indicators include:
- The `/products.json` endpoint returns valid JSON
- The HTML source contains `Shopify.theme` or references `cdn.shopify.com`
- The `x-shopify-stage` header is present in responses
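The HTML and header checks fold neatly into a pure classifier — a sketch (`looks_like_shopify` is an illustrative helper; the `/products.json` check still requires a live request as shown earlier):

```python
def looks_like_shopify(html: str, headers: dict) -> bool:
    """Classify a fetched homepage by the Shopify indicators above."""
    # Header names are case-insensitive, so normalize before checking
    if "x-shopify-stage" in {k.lower() for k in headers}:
        return True
    return "cdn.shopify.com" in html or "Shopify.theme" in html

# Usage with requests (network call, hypothetical store):
# r = requests.get(f"https://{domain}/", timeout=15)
# if looks_like_shopify(r.text, r.headers): ...
```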
Handling Passworded Stores
Some Shopify stores require a password to access. These are typically pre-launch or wholesale stores. The JSON endpoints will return a redirect to the password page. Skip these stores in your scraping pipeline unless you have authorized access.
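A sketch of that filter, assuming locked storefronts redirect to a `/password` path (the 401/403 check is an extra defensive assumption, not Shopify-documented behavior):

```python
def should_skip_store(final_url: str, status_code: int) -> bool:
    """Skip password-protected stores and hard auth failures."""
    return "/password" in final_url or status_code in (401, 403)

# Usage with requests (network call, hypothetical store):
# r = requests.get(f"https://{domain}/products.json?limit=1", allow_redirects=True)
# if should_skip_store(r.url, r.status_code):
#     continue  # drop this store from the pipeline
```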
Dealing with Custom Domains
Shopify stores often use custom domains instead of .myshopify.com. The JSON API works the same way on custom domains. Just use the store's public-facing domain in your requests.
Inventory Tracking
Product variants include an `available` field that indicates stock status. By tracking this field over time, you can monitor competitor inventory levels and identify when products go out of stock — useful intelligence for pricing and restocking decisions.
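Tracking availability reduces to diffing the variant lists between two runs. A sketch using variant dicts shaped like the JSON API output:

```python
def availability_changes(old_variants: list[dict], new_variants: list[dict]) -> list[dict]:
    """Report variants whose 'available' flag flipped between two scrapes."""
    old_by_id = {v["id"]: v for v in old_variants}
    events = []
    for v in new_variants:
        prev = old_by_id.get(v["id"])
        if prev is not None and prev.get("available") != v.get("available"):
            events.append({
                "variant_id": v["id"],
                "title": v.get("title"),
                "event": "restocked" if v.get("available") else "out_of_stock",
            })
    return events
```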
Avoiding Blocks and Rate Limits
While Shopify is more scraper-friendly than Amazon, it still enforces protections.
| Protection | Details | Mitigation |
|---|---|---|
| IP Rate Limiting | ~2-4 req/sec per IP for JSON endpoints | Rotate residential proxies across requests |
| Cloudflare Protection | Some stores use Cloudflare | Residential IPs with browser-like headers |
| Bot Detection | Behavioral patterns monitored | Randomize delays and User-Agents |
| Password Pages | Pre-launch/wholesale stores locked | Skip or use authorized access |
For more on handling anti-bot systems, read our guide on how to scrape websites without getting blocked.
Key takeaway: Shopify's JSON API is the most efficient scraping approach — it gives you structured data without HTML parsing. Use it before falling back to HTML scraping.
Data Use Cases
Once you have collected Shopify product data, here are the most valuable applications:
- Competitive pricing: Track competitor prices across product categories and adjust your pricing strategy in real time.
- Product research: Identify trending products, new launches, and market gaps by monitoring multiple stores.
- Market analysis: Aggregate data across hundreds of Shopify stores to understand market trends, pricing distribution, and category growth.
- Catalog enrichment: Use competitor product descriptions, images, and specifications to improve your own listings.
- Brand monitoring: Track unauthorized sellers of your products and monitor MAP compliance across Shopify storefronts.
Key Takeaways
- Shopify's `/products.json` endpoint is the most efficient scraping method — use it before HTML parsing.
- A single scraper architecture works across all Shopify stores due to the standardized structure.
- Residential proxies with rotation overcome Shopify's IP-based rate limiting.
- Use sticky sessions when paginating through a single store's catalog.
- Track variant-level pricing and availability for comprehensive competitive intelligence.
- Start with ProxyHat's residential proxies to scale your Shopify scraping reliably.
Ready to start scraping Shopify stores? Explore our e-commerce data scraping guide for the full strategy, and check our Python proxy guide and Node.js proxy guide for implementation details. Visit our pricing page to get started.