Page Extractor

The Page Extractor visits a list of URLs in parallel and pulls the fields you define from each page. Feed it 500 product URLs and get a spreadsheet of prices, descriptions, and SKUs, without writing a single line of code.

When to Use It

  • You have product detail page URLs and need the description, SKU, and price from each
  • You want to pull the headline and author from 100 specific news article URLs
  • You need to click "Accept Cookies" or expand a section before extracting data

Step 1 — Enter URLs

Paste your list of URLs, one per line, into the URL input area.

https://example.com/product/123
https://example.com/product/456
https://example.com/product/789

You can paste from a spreadsheet column, a text file, or your clipboard. CrawlPilot validates each URL and shows an error for any malformed entries.
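
If you would rather pre-check a long list before pasting it, the snippet below is a minimal sketch of one way to do so. It is not CrawlPilot's validator: the rule used here (an http or https scheme plus a host) is an assumption, and `split_valid_urls` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def split_valid_urls(raw: str) -> tuple[list[str], list[str]]:
    """Split pasted text into (valid, malformed) URL lists, one URL per line."""
    valid, malformed = [], []
    for line in raw.splitlines():
        url = line.strip()
        if not url:
            continue  # ignore blank lines
        try:
            parts = urlsplit(url)
        except ValueError:
            malformed.append(url)
            continue
        # Assumed rule: require an http(s) scheme and a host.
        if parts.scheme in ("http", "https") and parts.netloc:
            valid.append(url)
        else:
            malformed.append(url)
    return valid, malformed


pasted = """https://example.com/product/123
example.com/product/456
https://example.com/product/789
"""
valid, malformed = split_valid_urls(pasted)
print(valid)
print(malformed)
```

The second line lacks a scheme, so it lands in the malformed list and can be fixed before the job is set up.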

[!TIP] Need to collect the URLs first? Use the List Extractor to scrape a category page, export the URL column, then paste those URLs here.

Step 2 — Define the Extraction Schema

Click Add Element for each field you want to extract.

For each element:

  1. Give it a name (e.g., "Product Title", "Price", "SKU")
  2. Click Pick on Page: a real page in your list opens in a tab so you can click the element
  3. CrawlPilot captures the CSS selector
  4. Choose the action:
    • Extract — grab the text content, href, or src value of the element
    • Click — click this element before extracting (use for cookie banners, "Read more" expanders, tab toggles)

Repeat for every field you need.
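
Conceptually, a schema is a list of named selectors, each with an action. The sketch below models that idea in Python purely for illustration. The `Element` class, the selectors, and the helper functions are all hypothetical; CrawlPilot's internal format is not described in this guide.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Element:
    name: str        # label you typed, used as the column name
    selector: str    # CSS selector captured by Pick on Page
    action: Literal["extract", "click"]


# Hypothetical schema: selectors are made up for illustration.
schema = [
    Element("Product Title", "h1.product-title", "extract"),
    Element("Accept Cookies", "button.accept-cookies", "click"),
    Element("Price", "span.price", "extract"),
    Element("SKU", "span.sku", "extract"),
]


def ordered_steps(elements: list[Element]) -> list[Element]:
    """Put Click elements first, since clicks happen before extraction."""
    # sorted() is stable, so the original order is kept within each group.
    return sorted(elements, key=lambda e: e.action != "click")


def column_names(elements: list[Element]) -> list[str]:
    """Only Extract elements produce columns in the results."""
    return [e.name for e in elements if e.action == "extract"]


print([e.name for e in ordered_steps(schema)])
print(column_names(schema))
```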

Step 3 — Configure Job Settings

| Setting | Default | Notes |
| --- | --- | --- |
| Concurrent tabs | 5 | How many URLs to process simultaneously. Keep at 5 for stability; max recommended is 10. |
| Page load timeout | 5 s | Seconds to wait for each page before extracting |

[!WARNING] Setting concurrent tabs above 10 may cause Chrome to throttle or crash tabs on slower machines.
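
To make the two settings concrete, here is a minimal sketch of bounded concurrency with a per-page timeout, written with Python's asyncio. It only illustrates the general technique and is not how CrawlPilot is implemented. `run_job` and `fake_extract` are hypothetical names, and treating the timeout as a hard per-page limit is an assumption.

```python
import asyncio


async def run_job(urls, extract, concurrent_tabs=5, timeout_s=5.0):
    """Process urls with at most `concurrent_tabs` pages in flight at once."""
    slots = asyncio.Semaphore(concurrent_tabs)

    async def run_one(url):
        async with slots:  # wait until one of the "tabs" is free
            try:
                row = await asyncio.wait_for(extract(url), timeout_s)
                return url, "Done", row
            except asyncio.TimeoutError:
                return url, "Error", "timeout"

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(run_one(u) for u in urls))


async def fake_extract(url):
    # Stand-in for loading a page and reading its fields.
    await asyncio.sleep(0.01)
    return {"url": url}


urls = [f"https://example.com/product/{i}" for i in range(12)]
results = asyncio.run(run_job(urls, fake_extract))
print(sum(status == "Done" for _, status, _ in results), "of", len(results), "done")
```

Raising `concurrent_tabs` lets more pages load at once at the cost of more memory, which is the trade-off behind the recommended maximum of 10.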

Step 4 — Run the Job

Click Start. The panel shows:

  • Each URL's status: Queued → Extracting → Done or Error
  • Overall progress: "47 / 500 complete"
  • Estimated time remaining

Background tabs open and close automatically. You can continue using Chrome normally while the job runs.

Step 5 — Review Results

Click View Results when the job completes.

  • Successful rows appear in the data grid
  • Failed URLs are listed separately with the reason (timeout, selector not found, network error)
  • Click Retry Failed to re-run only the URLs that errored
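
The split between successful rows and failed URLs can be pictured as a simple filter. The sketch below assumes a hypothetical result shape (a dict with `url`, `status`, and `reason` keys) and uses example data; it is not CrawlPilot's export format.

```python
from collections import Counter

# Hypothetical results for three URLs; the reasons match those listed above.
results = [
    {"url": "https://example.com/product/123", "status": "Done", "reason": None},
    {"url": "https://example.com/product/456", "status": "Error", "reason": "timeout"},
    {"url": "https://example.com/product/789", "status": "Error", "reason": "selector not found"},
]


def retry_queue(rows):
    """URLs that errored, i.e. the only ones a retry needs to re-run."""
    return [r["url"] for r in rows if r["status"] == "Error"]


def failure_summary(rows):
    """Count failures by reason to spot a systematic problem."""
    return Counter(r["reason"] for r in rows if r["status"] == "Error")


print(retry_queue(results))
print(failure_summary(results))
```

If most failures share one reason, such as "selector not found", it is usually worth fixing the schema or settings before retrying.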

Example: Extracting Details from 100 Job Listings

Goal: Pull job title, company name, location, and salary from 100 job detail pages.

  1. Collect URLs: Use the List Extractor on a job board's search results to get all listing URLs. Export the URL column.
  2. Open Page Extractor and paste the 100 URLs.
  3. Add elements:
    • "Job Title" → pick <h1 class="job-title">
    • "Company" → pick <span class="company-name">
    • "Location" → pick <div class="location">
    • "Salary" → pick <span class="salary-range">
  4. Set concurrency to 5 and timeout to 8 seconds.
  5. Click Start. The job completes in approximately 2 minutes for 100 URLs.
  6. Export CSV with all 100 rows filled in.
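
The "approximately 2 minutes" figure can be sanity-checked with rough arithmetic. The sketch below assumes URLs are processed in batches of 5 and that each page takes about 6 seconds on average; both are simplifying assumptions for illustration, not measured values.

```python
import math


def estimate_seconds(url_count, concurrent_tabs, seconds_per_page):
    """Rough job duration: number of batches times time per page."""
    batches = math.ceil(url_count / concurrent_tabs)
    return batches * seconds_per_page


typical = estimate_seconds(100, 5, 6)  # assumed 6 s average per page
worst = estimate_seconds(100, 5, 8)    # every page hits the 8 s timeout
print(typical, worst)  # prints: 120 160
```

Under these assumptions, 100 URLs form 20 batches, giving about 120 seconds typically and at most about 160 seconds if every page runs to the timeout.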

Handling Login-Gated Pages

CrawlPilot runs in your active Chrome session, so if you are already logged in to a site, the Page Extractor can reach the same pages you can view. Log in before starting the job.

Tab Limits and Browser Performance

Each concurrent tab consumes memory. Recommendations by machine:

| Machine | Recommended concurrent tabs |
| --- | --- |
| 8 GB RAM | 3–5 |
| 16 GB RAM | 5–8 |
| 32 GB RAM | 8–10 |

Close other heavy tabs before running large jobs.
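
The recommendations above can be expressed as a small lookup. In this sketch, `recommended_tabs` is a hypothetical helper, and rounding an in-between amount of RAM (say 12 GB) down to the nearest listed tier is an assumption; the guide gives no figure for machines with less than 8 GB.

```python
# Tiers from the recommendations above: (minimum RAM in GB, (low, high) tabs).
TIERS = [(8, (3, 5)), (16, (5, 8)), (32, (8, 10))]


def recommended_tabs(ram_gb):
    """Return the (low, high) range for the largest tier not exceeding ram_gb."""
    best = None
    for min_ram, tab_range in TIERS:
        if ram_gb >= min_ram:
            best = tab_range
    return best  # None when the machine has less than 8 GB


print(recommended_tabs(16))
print(recommended_tabs(12))
```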