How to get product data from any website
Four ways to extract product data from e-commerce sites — from manual copy-paste to APIs — with code examples and honest tradeoffs for each.
You need product data from a website. Maybe you're building a price comparison tool, populating a catalogue, or doing competitor research. Whatever the reason, you're staring at a product page and thinking "I just want the title, price, and image in a format I can actually use."
There are a few ways to get there. The right choice depends on how many products you need and how often you need them.
- Option 1: Manual copy-paste
- Option 2: Browser extensions
- Option 3: Write your own scraper
- Option 4: Use a product data API
- What about ongoing monitoring?
- Which approach should you pick?
Option 1: Manual copy-paste
The most obvious approach. Go to the product page, highlight the title, copy it, paste it into a spreadsheet. Do the same for the price. Right-click the image, save it. Repeat.
This works if you need data from five products. It does not work if you need data from five hundred. I've watched people try to manually collect data from a few dozen product pages and it takes hours. You make mistakes. You miss fields. You go slightly insane.
The other problem: it's a snapshot. If prices change tomorrow, you get to do it all over again.
Manual copy-paste is fine for one-off research with a small number of products. For anything else, keep reading.
Option 2: Browser extensions
There are Chrome extensions that let you click on elements on a page and extract them into a spreadsheet. Tools like Web Scraper, Data Miner, or Instant Data Scraper. You define a template — "the title is in this CSS selector, the price is in that one" — and the extension pulls the data for you.
This is a step up from copy-paste. You can scrape multiple product pages with one template, and some extensions handle pagination. For a one-time data pull from a single site, they're genuinely useful.
The limits show up fast, though. Each site needs its own template. The extensions can't handle JavaScript-heavy pages well. There's no scheduling — you have to sit there with the browser open while it runs. And if the site changes its layout, your template breaks.
Good for: grabbing a few hundred products from one site, once. Not great for anything ongoing or multi-site.
Option 3: Write your own scraper
This is where most developers start. Python with Beautiful Soup or Scrapy, maybe Puppeteer if you need JavaScript rendering. You write a script that fetches the page HTML, parses it, and extracts the fields you care about.
Here's a basic example:
```python
import requests
from bs4 import BeautifulSoup

def scrape_product(url):
    # A browser-like User-Agent gets past the most basic bot filtering
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, "html.parser")
    # These selectors are specific to one store's markup
    return {
        "title": soup.select_one("h1.product-title").text.strip(),
        "price": soup.select_one("span.price").text.strip(),
        "image": soup.select_one("img.product-image")["src"],
    }

product = scrape_product("https://example-store.com/products/widget")
print(product)
```
A dozen lines or so. Clean. Satisfying. And it works — until it doesn't.
The selectors above are specific to one site's HTML structure. A different store uses different class names, different element hierarchy, sometimes entirely different rendering approaches. Amazon loads half its product data via JavaScript after the initial page load, so requests.get won't even see it. You'll need a headless browser for that.
Then there's the maintenance. Sites change their markup. Anti-bot systems block your requests. You need proxies. You need CAPTCHA solving. You need retry logic. What started as a 12-line script turns into a small infrastructure project.
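To give a flavour of where that infrastructure project begins, here's a minimal retry-with-backoff wrapper — a sketch only, with illustrative backoff values, not a production recommendation:

```python
import time

def fetch_with_retries(fetch, retries=3, backoff=1.0):
    """Call fetch() until it succeeds, sleeping between attempts.

    fetch is any zero-argument callable that raises on failure.
    In a real scraper it would wrap requests.get with your headers,
    timeouts, and proxy settings.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts, let the caller deal with it
            # Exponential backoff: 1s, 2s, 4s, ...
            time.sleep(backoff * 2 ** attempt)
```

And that's before you've touched proxy rotation, CAPTCHA handling, or per-site rate limits — each of which is its own wrapper.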
If this sounds like something you want to dig into, we wrote a detailed comparison of scraping vs. using an API that covers the real costs and tradeoffs. Building your own scraper is a great learning project and sometimes the right call — but go in with your eyes open about the maintenance burden.
Option 4: Use a product data API
This is the approach where you send a URL and get back structured JSON. No parsing HTML, no dealing with anti-bot systems, no maintaining selectors for every store.
Here's what it looks like with Product Scrapes:
```shell
curl -X POST https://productscrapes.com/api/fetch \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.amazon.com/dp/B0EXAMPLE"}'
```
And you get back:
```json
{
  "data": {
    "product": {
      "title": "Wireless Bluetooth Headphones",
      "description": "Premium noise-cancelling headphones...",
      "price": "79.99",
      "currency": "USD",
      "image_url": "https://example.com/image.jpg",
      "brand": "AudioTech",
      "sku": "AT-WBH-100",
      "in_stock": true
    }
  }
}
```
Same response shape whether the URL points to Amazon, a Shopify store, Target, or thousands of other sites. You don't write a different parser for each one. The API handles proxy rotation, JavaScript rendering, anti-bot bypass, and normalising the output.
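In Python, the same call and a field extractor look roughly like this — endpoint and field names taken from the example above; the extra error handling and exact response details are assumptions to check against the actual API docs:

```python
import requests

API_URL = "https://productscrapes.com/api/fetch"

def fetch_product(url, api_key):
    """POST a product URL to the API and return the fields we care about."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json={"url": url},
    )
    response.raise_for_status()
    return extract_product(response.json())

def extract_product(payload):
    """Pull a flat dict out of the response shape shown above."""
    product = payload["data"]["product"]
    return {
        "title": product["title"],
        "price": float(product["price"]),  # price arrives as a string
        "currency": product["currency"],
        "in_stock": product["in_stock"],
    }
```

Because the response shape is the same across retailers, `extract_product` is the only parser you ever write.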
The tradeoff is cost — you're paying per request instead of just paying for infrastructure. But when you factor in the engineering time you'd spend maintaining scrapers, it's usually cheaper. The exception is if you're scraping one stable site and nothing else.
If you need data from multiple retailers, this is the fastest way to get from "I need this data" to actually having it.
What about ongoing monitoring?
Everything above assumes you want data right now, once. But a lot of use cases need fresh data on a schedule — price tracking, stock monitoring, keeping tabs on competitors.
We built a monitors feature for this. You create a monitor with a URL and a frequency, and we check the product on that schedule and send the data to your webhook. No cron jobs, no database, no infrastructure on your end. Frequencies range from hourly to monthly.
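On your side, the webhook handler reduces to deciding what each update means. Here's a sketch of that decision logic, assuming the webhook delivers the same product shape as the one-off API response — verify the real payload format against the monitors docs:

```python
def handle_monitor_payload(payload, alert_below):
    """Turn a monitor webhook payload into an alert message, or None.

    Assumes the payload matches the API response shape:
    {"data": {"product": {"title": ..., "price": ..., "in_stock": ...}}}
    """
    product = payload["data"]["product"]
    price = float(product["price"])
    if not product["in_stock"]:
        return f"restock watch: {product['title']} is out of stock"
    if price < alert_below:
        return f"price alert: {product['title']} is down to {price}"
    return None  # nothing interesting happened this check
```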
Worth knowing about even if you start with one-off API calls. Most people eventually want ongoing data.
Which approach should you pick?
Be honest about what you actually need:
- Under 20 products, one time? Just copy-paste. Seriously. Don't over-engineer it.
- A few hundred products from one site, one time? A browser extension will save you time.
- Custom data needs from one or two sites, and you like writing code? Build a scraper. You'll learn a lot.
- Multiple retailers, clean data, no maintenance headaches? An API is the right tool.
Most people reading this probably fall into that last bucket, which is why we built Product Scrapes. But I'm not going to pretend it's the right answer for everyone. If you need data from exactly one Shopify store and you know Python, a custom scraper might serve you just fine.
If you want to try the API approach, our getting started guide walks through the whole thing from signup to first request in about five minutes.