Page Extractor
The Page Extractor visits a list of URLs in parallel and pulls the fields you define from each page. Feed it 500 product URLs and get a spreadsheet of prices, descriptions, and SKUs without writing a single line of code.
When to Use It
- You have product detail page URLs and need the description, SKU, and price from each
- You want to pull the headline and author from 100 specific news article URLs
- You need to click "Accept Cookies" or expand a section before extracting data
Step 1 — Enter URLs
Paste your list of URLs, one per line, into the URL input area.
https://example.com/product/123
https://example.com/product/456
https://example.com/product/789
You can paste from a spreadsheet column, a text file, or your clipboard. CrawlPilot validates each URL and shows an error for any malformed entries.
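If you want to pre-check a long list before pasting it, the sketch below shows one way to separate well-formed URLs from malformed ones. CrawlPilot's own validation rules are not documented here, so the rule used (an `http` or `https` scheme plus a host) and the function name `split_valid_urls` are assumptions for illustration only.

```python
from urllib.parse import urlparse


def split_valid_urls(pasted_text: str) -> tuple[list[str], list[str]]:
    """Split pasted text into (valid, malformed) URLs, one candidate per line."""
    valid, malformed = [], []
    for line in pasted_text.splitlines():
        candidate = line.strip()
        if not candidate:
            continue  # skip blank lines, e.g. from a spreadsheet paste
        try:
            parsed = urlparse(candidate)
        except ValueError:
            malformed.append(candidate)
            continue
        # Assumption: a usable URL has an http(s) scheme and a host.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            valid.append(candidate)
        else:
            malformed.append(candidate)
    return valid, malformed


pasted = """https://example.com/product/123
https://example.com/product/456
example.com/product/789
"""
valid, malformed = split_valid_urls(pasted)
print("valid:", valid)
print("malformed:", malformed)
```

Here the third entry is reported as malformed because it has no scheme.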
[!TIP] Need to collect the URLs first? Use the List Extractor to scrape a category page, export the URL column, then paste those URLs here.
Step 2 — Define the Extraction Schema
Click Add Element for each field you want to extract.
For each element:
1. Give it a name (e.g., "Product Title", "Price", "SKU")
2. Click Pick on Page: a real page from your list opens in a tab so you can click the element
3. CrawlPilot captures the CSS selector
4. Choose the action:
   - Extract: grab the text content, href, or src value of the element
   - Click: click this element before extracting (use for cookie banners, "Read more" expanders, tab toggles)
Repeat for every field you need.
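It can help to think of the schema as a list of named selectors, each with an action. The sketch below models that idea in plain Python. The `Element` class, the `plan_steps` function, and the selectors are hypothetical and do not reflect CrawlPilot's internal format; the only behavior taken from this page is that Click actions run before extraction.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Element:
    name: str
    selector: str  # CSS selector captured by "Pick on Page"
    action: Literal["extract", "click"] = "extract"


def plan_steps(schema: list[Element]) -> list[Element]:
    """Order elements so every click runs before any extract.

    Relative order within each group is preserved.
    """
    clicks = [e for e in schema if e.action == "click"]
    extracts = [e for e in schema if e.action == "extract"]
    return clicks + extracts


# Hypothetical selectors, for illustration only.
schema = [
    Element("Product Title", "h1.product-title"),
    Element("Accept Cookies", "button#accept-cookies", action="click"),
    Element("Price", "span.price"),
]

for step in plan_steps(schema):
    print(step.action, step.name, step.selector)
```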
Step 3 — Configure Job Settings
| Setting | Default | Notes |
|---|---|---|
| Concurrent tabs | 5 | How many URLs to process simultaneously. Keep at 5 for stability; max recommended is 10. |
| Page load timeout | 5s | Seconds to wait for each page before extracting |
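To illustrate how these two settings interact, here is a minimal sketch of bounded concurrency with a per-page timeout using `asyncio`. It is not CrawlPilot's implementation: `process_all` and `fake_fetch` are hypothetical names, the page load is simulated with a sleep, and the timeout is modeled as an upper bound on the wait, which is an assumption.

```python
import asyncio


async def process_all(urls, fetch, concurrent_tabs=5, timeout_s=5.0):
    """Run fetch(url) for every URL with at most `concurrent_tabs` in flight."""
    slots = asyncio.Semaphore(concurrent_tabs)

    async def process_one(url):
        async with slots:  # wait for a free "tab"
            try:
                data = await asyncio.wait_for(fetch(url), timeout=timeout_s)
                return url, "done", data
            except asyncio.TimeoutError:
                return url, "error", "timeout"

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(process_one(u) for u in urls))


async def fake_fetch(url):
    # Stand-in for loading a page; URLs ending in "slow" exceed the timeout.
    await asyncio.sleep(0.5 if url.endswith("slow") else 0.01)
    return {"url": url}


urls = [f"https://example.com/product/{i}" for i in range(10)]
urls.append("https://example.com/slow")
results = asyncio.run(process_all(urls, fake_fetch, concurrent_tabs=5, timeout_s=0.1))
for url, status, detail in results:
    print(status, url, detail)
```

Raising the tab count shortens the total run only until memory or the target site becomes the bottleneck, which is why the table suggests staying near the default.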
Step 4 — Run the Job
Click Start. The panel shows:
- Each URL's status: Queued → Extracting → Done or Error
- Overall progress: "47 / 500 complete"
- Estimated time remaining
Background tabs open and close automatically. You can continue using Chrome normally while the job runs.
Step 5 — Review Results
Click View Results when the job completes.
- Successful rows appear in the data grid
- Failed URLs are listed separately with the reason (timeout, selector not found, network error)
- Click Retry Failed to re-run only the URLs that errored
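The review step amounts to splitting results into successful rows and failures, then re-running only the failures. A rough sketch of that logic follows; the record layout (`url`, `status`, `reason`, `data`) and the function name `split_results` are assumptions, not an export format defined by CrawlPilot.

```python
def split_results(results):
    """Separate successful rows from failures; failures keep their reason."""
    rows = [r for r in results if r["status"] == "done"]
    failed = {r["url"]: r["reason"] for r in results if r["status"] == "error"}
    return rows, failed


# Hypothetical job output, for illustration only.
results = [
    {"url": "https://example.com/product/123", "status": "done", "data": {"SKU": "A-1"}},
    {"url": "https://example.com/product/456", "status": "error", "reason": "timeout"},
    {"url": "https://example.com/product/789", "status": "error", "reason": "selector not found"},
]

rows, failed = split_results(results)
retry_urls = list(failed)  # only these URLs would be re-run
print(len(rows), "succeeded")
for url, reason in failed.items():
    print("failed:", url, "-", reason)
```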
Example: Extracting Details from 100 Job Listings
Goal: Pull job title, company name, location, and salary from 100 job detail pages.
1. Collect URLs: Use the List Extractor on a job board's search results to get all listing URLs. Export the URL column.
2. Open Page Extractor and paste the 100 URLs.
3. Add elements:
   - "Job Title" → pick `<h1 class="job-title">`
   - "Company" → pick `<span class="company-name">`
   - "Location" → pick `<div class="location">`
   - "Salary" → pick `<span class="salary-range">`
4. Set concurrency to 5, timeout to 8 seconds.
5. Start: the job completes in approximately 2 minutes for 100 URLs.
6. Export CSV with all 100 rows filled in.
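The 2-minute figure can be sanity-checked with simple arithmetic: 100 URLs at 5 concurrent tabs is 20 waves, so about 6 seconds per page gives roughly 120 seconds. The sketch below encodes that estimate. The 6-second page time is inferred from the numbers in this example rather than measured, and real runs will vary with the target site.

```python
import math


def estimate_seconds(url_count, concurrent_tabs, seconds_per_page):
    """Rough duration: URLs are processed in waves of `concurrent_tabs`."""
    waves = math.ceil(url_count / concurrent_tabs)
    return waves * seconds_per_page


print(estimate_seconds(100, 5, 6))  # 20 waves x 6 s = 120 s
print(estimate_seconds(100, 5, 8))  # worst case if every page hits the 8 s timeout: 160 s
```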
Handling Login-Gated Pages
CrawlPilot runs in your active Chrome session. If you are already logged into a site, the Page Extractor has access to the same pages you can view. Simply log in before starting the job.
Tab Limits and Browser Performance
Each concurrent tab consumes memory. Recommendations by machine:
| Machine | Recommended concurrent tabs |
|---|---|
| 8 GB RAM | 3–5 |
| 16 GB RAM | 5–8 |
| 32 GB RAM | 8–10 |
Close other heavy tabs before running large jobs.
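As a quick reference, the table above can be expressed as a lookup. This is only a sketch: the function name is hypothetical, and the handling of sizes the table does not list (rounding down to the nearest row, and treating anything under 8 GB like the 8 GB row) is an assumption.

```python
def recommended_tabs(ram_gb: int) -> tuple[int, int]:
    """Return the (low, high) recommended concurrent tabs for a RAM size in GB."""
    if ram_gb >= 32:
        return (8, 10)
    if ram_gb >= 16:
        return (5, 8)
    # Assumption: machines below 8 GB use the most conservative row.
    return (3, 5)


low, high = recommended_tabs(16)
print(f"16 GB RAM: {low}-{high} concurrent tabs")
```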