Why Scrape Etsy? The API-vs-HTML Trade-Off
Etsy has no official public API for marketplace data. Their Open API v3 is designed for shop owners managing their own listings—not for researchers pulling competitor data at scale. That leaves HTML scraping as the only realistic path for niche discovery, price monitoring, and POD research.
The trade-off is real: you're parsing rendered HTML instead of clean JSON. But Etsy's page structure is surprisingly consistent, and with the right selectors and proxy strategy, you can extract search results, listing details, and shop analytics at scale. This guide shows you exactly how—pragmatically, ethically, and with production-ready code.
Etsy's Page Structure: What You're Scraping
Before writing a single line of code, understand the four page types you'll hit and the data each one holds.
Search Results Pages
URL pattern: https://www.etsy.com/search?q=QUERY&ref=search_bar
Each search results page renders up to 48 listing cards. Key data points per card:
- Listing title — inside an `<h3>` with class `v2listing-card__info__title`
- Price — `span.currency-value` for the numeric part, `span.currency-symbol` for the currency
- Shop name — `a.shop-name` or within the card's subtitle area
- Listing URL — `a.listing-link` with an `href` like `/listing/123456789/title-slug`
- Star seller badge, free shipping, ad badge — various indicator elements
Pagination is simple page-number offsets: append `&page=2`, `&page=3`, and so on. Etsy caps visible results at roughly 250 pages for most queries.
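That pagination scheme can be sketched as a small URL builder; the query and page count below are illustrative values, not anything Etsy-specific:

```python
from urllib.parse import quote

def search_page_urls(query, max_pages=3):
    """Build the paginated search URLs for a query, page 1 through max_pages."""
    base = f"https://www.etsy.com/search?q={quote(query)}"
    return [f"{base}&page={p}" for p in range(1, max_pages + 1)]

urls = search_page_urls("funny coffee mug", max_pages=3)
print(urls[0])  # https://www.etsy.com/search?q=funny%20coffee%20mug&page=1
```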
Listing Detail Pages
URL pattern: https://www.etsy.com/listing/ID/title-slug
Rich data available here:
- Full description — `div#product-description-content`
- Price with variants — `div[data-selector="price-varies"]`
- Shipping cost — `div#shipping-varies-message`
- Number of favorites — text in the sibling of `a[data-action="add-to-favorites"]`
- Reviews snippet — `div.review-list`
- Shop link — `a.shop-name` in the sidebar
Shop Pages
URL pattern: https://www.etsy.com/shop/SHOPNAME
Shop pages expose:
- Sales count — text like "5,280 sales" in `span.shop-sales`
- Listing count — items listed count near the top
- Star seller status — badge element
- Review average and count — star rating + review count in the sidebar
Category Trees
Etsy's categories live at https://www.etsy.com/c/CATEGORY. Sub-categories are nested in the left sidebar navigation. You can walk the tree by following `a.sidebar-category-link` elements recursively. For POD research, the key categories are under `/c/clothing`, `/c/accessories`, `/c/home-and-living`, and `/c/art-and-collectibles`.
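The per-page step of that recursive walk is just pulling the sidebar links out of each category page. `extract_subcategories` is a hypothetical helper built on the selector above, demonstrated here against a synthetic sidebar snippet rather than a live fetch:

```python
from lxml import html

def extract_subcategories(page_html, base="https://www.etsy.com"):
    """Pull sub-category links (a.sidebar-category-link) from a category page.
    A recursive walker would fetch each returned URL and call this again."""
    tree = html.fromstring(page_html)
    hrefs = tree.xpath('//a[contains(@class, "sidebar-category-link")]/@href')
    return [h if h.startswith("http") else base + h for h in hrefs]

# Synthetic snippet standing in for a fetched /c/clothing page:
sample = """
<nav>
  <a class="sidebar-category-link" href="/c/clothing/mens">Men's</a>
  <a class="sidebar-category-link" href="/c/clothing/womens">Women's</a>
</nav>
"""
print(extract_subcategories(sample))
# ['https://www.etsy.com/c/clothing/mens', 'https://www.etsy.com/c/clothing/womens']
```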
Etsy's Anti-Bot Defenses: Cloudflare + Rate Limits
Etsy runs Cloudflare on the edge. If you fire requests from a datacenter IP at any reasonable volume, you'll hit Cloudflare's challenge page (HTTP 403 with a JS challenge). This isn't a CAPTCHA you can solve—it's a browser fingerprint check that rejects non-browser traffic patterns.
On top of Cloudflare, Etsy applies internal rate limits:
- ~60 requests/minute from a single IP before you see soft blocks (HTTP 429 or redirect to a challenge page)
- ~200 requests/minute triggers a harder block that may require a CAPTCHA solve to lift
- Search pages are more aggressively rate-limited than listing detail pages
This is why residential proxies are strongly recommended for Etsy scraping. Datacenter IPs are flagged quickly. Mobile proxies work too but are slower and more expensive for this use case. Residential IPs blend with normal shopper traffic and distribute your requests across thousands of real user IPs.
| Proxy Type | Etsy Compatibility | Speed | Cost | Best For |
|---|---|---|---|---|
| Datacenter | Low — blocked fast | Fast | Low | Testing only |
| Residential (rotating) | High — looks like real shoppers | Medium | Medium | Search + listing scraping at scale |
| Residential (sticky session) | High — consistent IP per session | Medium | Medium | Multi-page flows (search → detail → shop) |
| Mobile | Very high — highest trust score | Slow | High | Bypassing aggressive blocks |
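Staying under that ~60 requests/minute soft limit is easiest with a small pacing helper per IP or sticky session. This is a sketch; the 30/minute default is an illustrative safety margin, not an Etsy-documented number:

```python
import time

class Throttle:
    """Pace requests from one IP/session to stay under Etsy's soft limit."""

    def __init__(self, max_per_minute=30):
        self.min_interval = 60.0 / max_per_minute
        self.last = 0.0

    def wait(self):
        # Sleep just long enough to keep the configured pace
        elapsed = time.monotonic() - self.last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last = time.monotonic()

throttle = Throttle(max_per_minute=30)
# call throttle.wait() immediately before each request on that IP
```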
Scraping Patterns for Niche Discovery
For POD and niche research, you're not scraping individual listings for the sake of it—you're trying to answer these questions:
- What's trending? — Which search terms return the most new listings?
- How competitive is a niche? — How many unique sellers appear in search results?
- What's the price ceiling? — What's the average and 90th-percentile price?
Trending Search Terms
Etsy's autocomplete endpoint is a goldmine. Hit https://www.etsy.com/api/v3/ajax/member/suggestions?query=KEYWORD (no auth required for public suggestions) and parse the JSON response. Each suggestion comes with a rough result count.
Alternatively, scrape the "Trending now" section on Etsy's homepage or the "Related searches" bar at the top of search results pages.
Seller Count Per Niche
For a given search query, paginate through results and collect unique shop names. The count of distinct shops is your competition metric. A niche with 500 results but only 20 sellers is less competitive than one with 500 results from 400 sellers.
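In code, over listing dicts shaped like the ones Step 2 below produces (a `shop` key per listing), the metric is a couple of set operations:

```python
def competition_metrics(listings):
    """Distinct-seller count and listings-per-seller ratio for a result set."""
    shops = {l["shop"] for l in listings if l.get("shop")}
    return {
        "unique_shops": len(shops),
        "listings_per_shop": round(len(listings) / len(shops), 2) if shops else 0,
    }

sample = [{"shop": "MugWorks"}, {"shop": "MugWorks"}, {"shop": "CozyPrints"}]
print(competition_metrics(sample))  # {'unique_shops': 2, 'listings_per_shop': 1.5}
```

A higher listings-per-shop ratio means fewer sellers dominate the niche.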
Average Price Points
Parse the price from each listing card on search results pages. You don't need detail pages for this—card-level prices are sufficient for distribution analysis. Compute mean, median, and 90th percentile to understand where you can price your POD products.
Python Example: Search → Listing Cards → Detail Pages
Here's a complete, production-style pipeline that fetches Etsy search results, parses listing cards, then hits detail pages with residential proxy rotation.
Step 1: Set Up the Residential Proxy Session
```python
import requests
from urllib.parse import quote
import time
import random

PROXY_USER = "your_user"
PROXY_PASS = "your_pass"
PROXY_GATE = "gate.proxyhat.com:8080"

# Rotate IP per request (country-US for US Etsy results)
def get_proxy_url(session_id=None):
    if session_id:
        # Sticky session — same IP for multi-page flows
        user = f"{PROXY_USER}-country-US-session-{session_id}"
    else:
        # Rotating — new IP per request
        user = f"{PROXY_USER}-country-US"
    return f"http://{user}:{PROXY_PASS}@{PROXY_GATE}"

proxies = {
    "http": get_proxy_url(),
    "https": get_proxy_url(),
}

headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/125.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}
```
Step 2: Fetch and Parse Search Results
```python
from lxml import html

def fetch_search_results(query, page=1):
    url = f"https://www.etsy.com/search?q={quote(query)}&page={page}"
    # Use rotating proxy for each search page request
    proxies = {
        "http": get_proxy_url(),
        "https": get_proxy_url(),
    }
    resp = requests.get(url, headers=headers, proxies=proxies, timeout=30)
    resp.raise_for_status()
    tree = html.fromstring(resp.text)

    listings = []
    cards = tree.xpath('//div[contains(@class, "v2listing-card")]')
    for card in cards:
        title_el = card.xpath('.//h3[contains(@class, "v2listing-card__info__title")]')
        price_el = card.xpath('.//span[@class="currency-value"]')
        link_el = card.xpath('.//a[contains(@class, "listing-link")]/@href')
        shop_el = card.xpath('.//a[contains(@class, "shop-name")]//text()')

        title = title_el[0].text_content().strip() if title_el else None
        price = price_el[0].text_content().strip() if price_el else None
        link = link_el[0] if link_el else None
        shop = shop_el[0].strip() if shop_el else None

        if title and link:
            listings.append({
                "title": title,
                "price": float(price.replace(",", "")) if price else None,
                "url": link,
                "shop": shop,
            })
    return listings

# Fetch first 3 pages for "funny coffee mug"
all_listings = []
for page in range(1, 4):
    listings = fetch_search_results("funny coffee mug", page=page)
    all_listings.extend(listings)
    time.sleep(random.uniform(3, 7))  # polite delay between pages
    print(f"Page {page}: {len(listings)} listings")

print(f"Total: {len(all_listings)} listings from {len(set(l['shop'] for l in all_listings if l['shop']))} shops")
```
Step 3: Scrape Listing Detail Pages with Sticky Sessions
When you click from search to a detail page, a real browser keeps the same IP. Mimic this with sticky proxy sessions.
```python
def fetch_listing_detail(listing_url, session_id):
    # Sticky session keeps same IP — mimics a real user browsing
    proxy_url = get_proxy_url(session_id=session_id)
    proxies = {"http": proxy_url, "https": proxy_url}
    resp = requests.get(listing_url, headers=headers, proxies=proxies, timeout=30)
    resp.raise_for_status()
    tree = html.fromstring(resp.text)

    description_el = tree.xpath('//div[@id="product-description-content"]')
    favorites_el = tree.xpath('//a[@data-action="add-to-favorites"]/following-sibling::span/text()')
    review_count_el = tree.xpath('//span[contains(@class, "review-count")]//text()')

    return {
        "description": description_el[0].text_content().strip()[:500] if description_el else None,
        "favorites": favorites_el[0].strip() if favorites_el else None,
        "review_count": review_count_el[0].strip() if review_count_el else None,
    }

# Process top 10 listings with sticky sessions
for i, listing in enumerate(all_listings[:10]):
    session_id = f"etsy-browse-{i}"
    detail = fetch_listing_detail(listing["url"], session_id)
    listing.update(detail)
    time.sleep(random.uniform(4, 8))
    print(f"{listing['title'][:50]}... — {detail.get('favorites', 'N/A')} favs")
```
Step 4: Compute Niche Metrics
```python
import statistics

prices = [l["price"] for l in all_listings if l["price"]]
shops = set(l["shop"] for l in all_listings if l["shop"])

niche_report = {
    "query": "funny coffee mug",
    "total_listings_scraped": len(all_listings),
    "unique_shops": len(shops),
    "avg_price": round(statistics.mean(prices), 2) if prices else 0,
    "median_price": round(statistics.median(prices), 2) if prices else 0,
    "p90_price": round(sorted(prices)[int(len(prices) * 0.9)], 2) if prices else 0,
    "min_price": min(prices) if prices else 0,
    "max_price": max(prices) if prices else 0,
    "competition_ratio": round(len(shops) / len(all_listings), 2) if all_listings else 0,
}
print(niche_report)

# Example output:
# {
#     'query': 'funny coffee mug',
#     'total_listings_scraped': 144,
#     'unique_shops': 98,
#     'avg_price': 15.73,
#     'median_price': 14.99,
#     'p90_price': 22.00,
#     'min_price': 5.99,
#     'max_price': 38.00,
#     'competition_ratio': 0.68
# }
```
Shop Analytics: Sales, Listings, and Reviews
For competitive analysis, you want to know how established a shop is. Etsy makes this surprisingly accessible.
Scraping the Sales Badge
Etsy displays a rough sales count on every shop page as text like "5,280 sales". This is in span.shop-sales or similar. It's rounded and not real-time, but it's the best publicly available metric.
```python
def fetch_shop_analytics(shop_name):
    url = f"https://www.etsy.com/shop/{shop_name}"
    # Use a fresh rotating IP for each shop lookup
    proxies = {"http": get_proxy_url(), "https": get_proxy_url()}
    resp = requests.get(url, headers=headers, proxies=proxies, timeout=30)
    resp.raise_for_status()
    tree = html.fromstring(resp.text)

    sales_el = tree.xpath('//span[contains(@class, "shop-sales")]//text()')
    listing_count_el = tree.xpath('//span[contains(@class, "listing-count")]//text()')
    rating_el = tree.xpath('//span[contains(@class, "review-stars")]//text()')
    review_count_el = tree.xpath('//span[contains(@class, "review-count")]//text()')

    def parse_sales(text):
        # "5,280 sales" → 5280
        digits = "".join(c for c in text if c.isdigit())
        return int(digits) if digits else 0

    sales_text = sales_el[0] if sales_el else "0"
    return {
        "shop": shop_name,
        "sales": parse_sales(sales_text),
        "listing_count": listing_count_el[0].strip() if listing_count_el else None,
        "rating": rating_el[0].strip() if rating_el else None,
        "review_count": review_count_el[0].strip() if review_count_el else None,
    }

# Analyze a sample of shops from the niche (sorted for deterministic order)
sample_shops = sorted(shops)[:5]
for shop in sample_shops:
    analytics = fetch_shop_analytics(shop)
    print(f"{shop}: {analytics['sales']} sales, {analytics['listing_count']} listings")
    time.sleep(random.uniform(5, 10))
```
What You Can and Can't Get from Shop Pages
| Data Point | Available? | Source | Fidelity |
|---|---|---|---|
| Rough sales count | Yes | "x sales" badge | Rounded, not real-time |
| Exact listing count | Yes | Shop header | Accurate |
| Review average | Yes | Star rating display | Accurate |
| Review count | Yes | Review count text | Accurate |
| Revenue estimate | No (derived) | Sales × avg price | Rough approximation only |
| Individual order data | No | Not public | N/A |
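The derived revenue estimate from the table is just the rounded sales badge multiplied by an average listing price; treat it as an order-of-magnitude figure at best, since the badge is lifetime and rounded:

```python
def estimate_revenue(sales_count, avg_price):
    """Back-of-envelope lifetime revenue: sales badge x average listing price.
    Both inputs are rough, so the output is a rough approximation only."""
    return round(sales_count * avg_price, 2)

print(estimate_revenue(5280, 15.73))  # values from the examples above
```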
Etsy Autocomplete for Keyword Expansion
One of the highest-value, lowest-effort scraping targets is Etsy's search autocomplete. No proxy rotation needed for occasional use—just rate-limit yourself.
```python
def etsy_autocomplete(query):
    url = f"https://www.etsy.com/api/v3/ajax/member/suggestions?query={quote(query)}"
    # Spread headers first so the JSON Accept header wins the merge
    resp = requests.get(url, headers={**headers, "Accept": "application/json"}, timeout=15)
    resp.raise_for_status()
    return resp.json().get("results", [])

suggestions = etsy_autocomplete("coffee mug")
for s in suggestions[:10]:
    print(s.get("query", s.get("name", "")))
# funny coffee mug, coffee mug funny, personalized coffee mug, ...
```
Use this to expand your keyword list before running full search-result scrapes. It's faster and lighter than paginating through search pages.
Ethical Considerations: These Are Small Businesses
This is important. Etsy sellers are overwhelmingly independent creators and small businesses—many running POD operations just like you. When you scrape Etsy for research:
- Scrape for market intelligence, not to copy. Understanding price points, keyword demand, and competition levels is legitimate research. Downloading original designs or copying listing copy verbatim is theft.
- Respect rate limits. Aggressive scraping can degrade Etsy's performance for real shoppers and sellers. Use delays, respect `robots.txt`, and keep your request volume reasonable.
- Don't resell scraped data as-is. Transform the data into insights—niche reports, price distributions, competition scores—rather than republishing raw listings.
- Understand Etsy's Terms of Service. Scraping violates Etsy's ToS. Accept the risk, be prepared for IP blocks, and don't use scraped data in ways that harm individual sellers.
Scrape Etsy to understand the market, not to rip off the people who built it. Your POD business should compete on originality and quality—not on how well you can clone someone else's work.
Practical Tips for Reliable Etsy Scraping
- Use residential proxies with US geo-targeting — Etsy serves different results by region. `user-country-US` in your ProxyHat username ensures you see the US marketplace.
- Sticky sessions for multi-page flows — When scraping search → detail → shop in sequence, use `user-session-abc123` to keep the same IP. This mimics real user behavior and avoids Cloudflare triggers.
- Randomize delays — Use `random.uniform(3, 8)` between requests, not fixed intervals. Fixed intervals are trivially detectable.
- Rotate User-Agent strings — Don't send the same UA for every request. Rotate from a pool of current Chrome/Firefox UAs.
- Handle pagination gracefully — Etsy may return empty results before the last page. Check for zero listings and stop early.
- Cache aggressively — Store raw HTML responses. Re-parsing is cheap; re-scraping is expensive and risky.
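Several of these tips (fresh IP per attempt, randomized rather than fixed waits) combine naturally into a retry wrapper. This is a sketch: it takes any zero-argument fetch callable so each retry can rebuild its proxies dict, and the backoff constants are illustrative:

```python
import random
import time

def fetch_with_retries(fetch, max_attempts=4, base_delay=2.0):
    """Retry a blocked request with exponential backoff and jitter.

    `fetch` is any zero-argument callable returning a response-like object
    with a .status_code attribute; building the proxies dict inside it means
    every retry goes out on a fresh IP.
    """
    for attempt in range(max_attempts):
        resp = fetch()
        if resp.status_code not in (403, 429):
            return resp  # not blocked; caller can still raise_for_status()
        # ~2s, 4s, 8s... scaled by up to 2x jitter so retries aren't evenly spaced
        time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
    raise RuntimeError(f"still blocked after {max_attempts} attempts")
```

Wired to the Step 1 setup, `fetch` could be e.g. `lambda: requests.get(url, headers=headers, proxies={"http": get_proxy_url(), "https": get_proxy_url()}, timeout=30)`.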
Key Takeaways
- Etsy has no public data API—HTML scraping is the only realistic option for niche research at scale.
- Cloudflare and internal rate limits (~60 req/min per IP) make residential proxies essential. Datacenter IPs get blocked quickly.
- Search result pages give you titles, prices, shop names, and listing URLs—enough for competition and price analysis without hitting detail pages.
- Use sticky proxy sessions for multi-page flows (search → detail → shop) to mimic real browsing behavior.
- Etsy's "x sales" badge on shop pages is your best publicly available metric for estimating a competitor's traction.
- Scrape ethically—use data for market research, not to copy designs or listing copy from small sellers.
Ready to start your Etsy niche research? ProxyHat's residential proxies give you access to real US residential IPs with geo-targeting and sticky sessions—exactly what you need to scrape Etsy reliably. Check out our web scraping use case for more proxy strategies, or explore available proxy locations to target specific regions.