List Extractor
The List Extractor pulls structured data from pages that display repeated items — product grids, article feeds, job listings, directory tables, and anything with a consistent repeating pattern.
It supports single-page extraction as well as multi-page collection via infinite scroll, "Load More" buttons, and classic next-page pagination.
When to Use It
- A page shows 50 products and you want all their names, prices, and URLs
- A blog lists 200 articles with titles and dates across 10 pages
- A job board shows listings with infinite scroll
- A directory table has rows you want as spreadsheet data
The 3-Step Wizard
Step 1 — Pick the Container
The container is the parent element that wraps all your list items.
1. Click Pick Container.
2. Hover over the page; elements highlight as you move.
3. Find the element that surrounds all items (the grid wrapper, the <ul>, the <div class="results">). When you see all items grouped with a green outline, you're at the right level.
4. Click to confirm.
CrawlPilot shows the detected CSS selector and the number of items found on the current page.
[!TIP] Use the ↑ Up and ↓ Down arrows in the picker to navigate up or down the DOM tree. Siblings highlight green to confirm how many items are captured at each level.
Step 2 — Pick One Item
1. Click Pick Item.
2. Hover over a single repeating unit (one product card, one article row, one table row).
3. Click to confirm.
CrawlPilot detects all sibling items matching the same pattern and shows the count.
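One way to picture the sibling matching is as grouping the container's direct children by a tag-and-class signature. The sketch below is illustrative only and is not CrawlPilot's actual algorithm: the `signature` and `count_matching_items` helpers and the sample markup are assumptions, and it relies on Python's standard `xml.etree.ElementTree`, which needs well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup standing in for a real page.
SAMPLE = """
<div class="results">
  <div class="card"><h3>Item A</h3></div>
  <div class="card"><h3>Item B</h3></div>
  <div class="card featured"><h3>Item C</h3></div>
  <p class="note">Sponsored</p>
</div>
"""


def signature(element):
    """Return (tag, first class name) as a rough 'same pattern' key."""
    classes = element.get("class", "").split()
    return (element.tag, classes[0] if classes else "")


def count_matching_items(container, picked):
    """Count direct children of `container` sharing the picked item's signature."""
    target = signature(picked)
    return sum(1 for child in container if signature(child) == target)


container = ET.fromstring(SAMPLE)
picked_item = container[0]  # stands in for the item the user clicked
print(count_matching_items(container, picked_item))  # the three cards match; the <p> does not
```

Keying on only the first class name is a deliberate simplification so that variants such as `card featured` still count as the same pattern.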
Step 3 — Review Schema
CrawlPilot analyzes the selected item and auto-generates columns for the fields it detects. Common auto-detected types:
| Type | Detected from | Example value |
|---|---|---|
| Title | Heading elements, bold text | "Running Shoes v2" |
| Price | Elements containing $, €, currency patterns | "$49.99" |
| Image | <img> src attributes | product-image.jpg |
| URL | <a> href attributes | /products/shoes |
| Text | Any text node | "In stock" |
You can:
- Rename any column by clicking its label
- Delete columns you don't need
- Add a custom column with your own CSS selector
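The detection rules in the table above can be approximated with a few ordered checks. This is a minimal sketch under assumed heuristics, not CrawlPilot's real detector: the `guess_column_type` function, the currency regex, and the rule order are all hypothetical.

```python
import re

# Assumed currency pattern: a symbol followed by an amount, e.g. "$49.99" or "€1,200".
PRICE_RE = re.compile(r"[$€£]\s?\d[\d.,]*")

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "b", "strong"}


def guess_column_type(tag, attrs, text):
    """Guess a column type from an element's tag, attributes, and text."""
    if tag == "img" and "src" in attrs:
        return "Image"
    if tag == "a" and "href" in attrs:
        return "URL"
    if PRICE_RE.search(text):
        return "Price"
    if tag in HEADING_TAGS:
        return "Title"
    return "Text"  # fallback: any other text node


print(guess_column_type("h3", {}, "Running Shoes v2"))
print(guess_column_type("span", {}, "$49.99"))
print(guess_column_type("span", {}, "In stock"))
```

When a heuristic like this guesses wrong, renaming the column or adding a custom selector is the fix.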
Pagination Settings
Choose how CrawlPilot collects data beyond the first page:
| Mode | Use when |
|---|---|
| None | Single page only |
| Auto-scroll — Infinite Feed | Twitter, LinkedIn, Instagram-style feeds that load dynamically as you scroll |
| Auto-scroll — Static List | Lists where items appear in index order as you scroll |
| Pagination — Next Button | Classic "Next ›" or "Page 2" navigation |
| Load More Button | A "Show More" or "Load More" button that expands the list in place |
For button-based modes, CrawlPilot asks you to click the button once on the page so it can identify it.
Speed: Controls the delay between scroll cycles or page clicks. Slower speeds are more reliable on JavaScript-heavy sites, where new items can take longer to render.
Max Pages: Maximum number of pages or scroll cycles to process. Set to 0 for unlimited (use with caution on very large sites).
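The interaction between Speed and Max Pages can be sketched as a simple loop. Everything here is an assumption made for illustration: `collect_pages` and the `fetch_page` callback are hypothetical names, and stopping when a page returns no items is one plausible way for an unlimited run to end.

```python
import time


def collect_pages(fetch_page, max_pages=0, delay_seconds=1.0, sleep=time.sleep):
    """Collect items page by page.

    fetch_page(n) is a hypothetical callback returning the items on page n
    (1-based), or an empty list when there is nothing more to load.
    max_pages=0 means unlimited, mirroring the Max Pages setting.
    """
    items = []
    page = 1
    while max_pages == 0 or page <= max_pages:
        batch = fetch_page(page)
        if not batch:
            break  # nothing new loaded: stop even in unlimited mode
        items.extend(batch)
        page += 1
        sleep(delay_seconds)  # plays the role of the Speed setting
    return items


# Usage with a fake three-page site of 10 items each.
fake_site = {n: [f"item-{n}-{i}" for i in range(10)] for n in (1, 2, 3)}


def fetch(n):
    return fake_site.get(n, [])


print(len(collect_pages(fetch, max_pages=2, delay_seconds=0)))  # capped at two pages
print(len(collect_pages(fetch, max_pages=0, delay_seconds=0)))  # runs until the site is exhausted
```

On a feed that never runs out of items, the unlimited setting would loop indefinitely, which is why a finite cap is the safer default.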
Running the Extraction
Click Start Extraction. The panel shows:
- A live item count updating as data is collected
- A progress bar for paginated extractions
- A Stop button to halt early — data collected so far is saved
After Extraction
When complete:
- Click View Data to open the full data table in a new tab
- Go to History to manage this job, re-run it, or delete it
Handling Duplicate Rows
CrawlPilot automatically deduplicates rows at two levels:
1. In-memory: During the scroll session, duplicate rows are dropped before they reach storage.
2. Database: A hash-based unique constraint ensures no true duplicates are ever stored.
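The two levels can be illustrated with an in-memory set in front of a table that has a unique hash column. This is a sketch under assumptions, not CrawlPilot's storage code: the `rows` table, the `row_hash` and `store_rows` helpers, and the choice of SHA-256 over sorted JSON are all hypothetical, and SQLite stands in for whatever database is actually used.

```python
import hashlib
import json
import sqlite3


def row_hash(row):
    """Stable SHA-256 over the row's fields (keys sorted so field order is irrelevant)."""
    payload = json.dumps(row, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def store_rows(conn, rows, seen):
    """Insert rows, skipping duplicates in memory first and in the database second."""
    inserted = 0
    for row in rows:
        digest = row_hash(row)
        if digest in seen:  # level 1: in-memory check during the session
            continue
        seen.add(digest)
        cur = conn.execute(  # level 2: the UNIQUE constraint rejects stored duplicates
            "INSERT OR IGNORE INTO rows (hash, data) VALUES (?, ?)",
            (digest, json.dumps(row)),
        )
        inserted += cur.rowcount
    conn.commit()
    return inserted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rows (hash TEXT UNIQUE, data TEXT)")

batch = [
    {"title": "A", "price": "$1"},
    {"price": "$1", "title": "A"},  # same fields in a different order: a duplicate
    {"title": "B", "price": "$2"},
]
first_run = store_rows(conn, batch, set())
second_run = store_rows(conn, batch, set())  # a fresh session: only the database level applies
print(first_run, second_run)
```

The in-memory set avoids a database round trip for repeats seen within one session, while the constraint still protects against duplicates across separate runs.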
Example: Scraping an E-commerce Product Grid
Goal: Collect title, author, price, and product URL for 200 books from an online bookstore.
1. Open the bookstore's category page.
2. Open CrawlPilot → List Extractor.
3. Pick Container: hover over <div class="book-grid"> (all books highlight together).
4. Pick Item: hover over one <div class="book-card">.
5. Schema auto-detects: Title (h3), Author (span.author), Price (span.price), URL (a href).
6. Set pagination to Pagination — Next Button, then click the "Next ›" button once so CrawlPilot can identify it.
7. Set Max Pages to 20 and Speed to Medium.
8. Click Start Extraction and watch the count climb to 200.
9. Click View Data → Export CSV.
Selector Editor
For advanced users, every auto-detected selector can be manually edited:
1. Click the pencil icon next to any column in the schema table.
2. Enter a CSS selector or XPath expression.
3. CrawlPilot validates the selector against the current page and shows a match count.
XPath example for a specific attribute:
//div[@class='price']/@data-price
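To try the same idea outside CrawlPilot, the sketch below reads `data-price` from hypothetical sample markup with Python's standard `xml.etree.ElementTree`. That module supports only a subset of XPath and cannot return attribute nodes, so the `/@data-price` step is replaced by selecting the elements and reading the attribute; the `extract_data_prices` helper is an assumed name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup for one list item.
SAMPLE = """
<div class="book-card">
  <h3>Example Book</h3>
  <div class="price" data-price="49.99">$49.99</div>
</div>
"""


def extract_data_prices(markup):
    """Return the data-price attribute of every <div class="price"> descendant."""
    root = ET.fromstring(markup)
    # Select the elements first, then read the attribute from each match.
    return [el.get("data-price") for el in root.findall(".//div[@class='price']")]


print(extract_data_prices(SAMPLE))
```

Reading the attribute rather than the visible text gives the raw numeric value, without the currency symbol.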