List Extractor

The List Extractor pulls structured data from pages that display repeated items — product grids, article feeds, job listings, directory tables, and anything with a consistent repeating pattern.

It supports single-page extraction as well as multi-page collection via infinite scroll, "Load More" buttons, and classic next-page pagination.

When to Use It

A page shows 50 products and you want all their names, prices, and URLs
A blog lists 200 articles with titles and dates across 10 pages
A job board shows listings with infinite scroll
A directory table has rows you want as spreadsheet data

The 3-Step Wizard

Step 1 — Pick the Container

The container is the parent element that wraps all your list items.

1. Click Pick Container.
2. Hover over the page; elements highlight as you move.
3. Find the element that surrounds all items (the grid wrapper, the <ul>, the <div class="results">). When you see all items grouped with a green outline, you're at the right level.
4. Click to confirm.

CrawlPilot shows the detected CSS selector and the number of items found on the current page.

[!TIP] Use the ↑ Up and ↓ Down arrows in the picker to navigate up or down the DOM tree. Siblings highlight green to confirm how many items are captured at each level.
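
The Up/Down navigation amounts to walking the ancestor chain of the hovered element and checking, at each level, how many children look alike. The sketch below is a minimal model of that idea. It assumes "alike" means the same tag and the same class list. The `Node` class and the helper functions are hypothetical and are not CrawlPilot's implementation.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal stand-in for a DOM element (hypothetical, for illustration only)."""
    tag: str
    classes: tuple[str, ...] = ()
    children: list["Node"] = field(default_factory=list)
    # Excluded from repr/eq so the parent <-> child cycle never recurses.
    parent: "Node | None" = field(default=None, repr=False, compare=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def signature(node: Node) -> tuple[str, tuple[str, ...]]:
    # Assumption: same tag + same classes means "the same kind of item".
    return (node.tag, tuple(sorted(node.classes)))


def repeated_child_count(container: Node) -> int:
    """Size of the largest group of children that share one signature."""
    counts = Counter(signature(c) for c in container.children)
    return max(counts.values(), default=0)


def candidate_levels(hovered: Node) -> list[tuple[str, int]]:
    """Walk from the hovered element to the root, like repeatedly pressing Up."""
    levels = []
    node = hovered
    while node is not None:
        levels.append((node.tag, repeated_child_count(node)))
        node = node.parent
    return levels
```

For a `<div class="results">` holding three `div.card` elements, hovering an `<h3>` inside the first card and walking up reports 0, 1, 3, and 1 repeated children at the h3, card, results, and body levels. The level with the high count is the container.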

Step 2 — Pick One Item

1. Click Pick Item.
2. Hover over a single repeating unit (one product card, one article row, one table row).
3. Click to confirm.

CrawlPilot detects all sibling items matching the same pattern and shows the count.
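
Sibling detection can be approximated by turning the picked item into a tag-plus-class pattern and keeping every sibling that fits it. This sketch rests on two assumptions: elements are plain dicts with hypothetical `tag` and `class` keys, and a sibling matches when it has the same tag and at least all of the picked item's classes, so a card with an extra `featured` class still counts.

```python
def item_selector(tag: str, class_attr: str) -> str:
    """Build a tag.class selector such as 'div.book-card' from one picked item."""
    return tag + "".join(f".{c}" for c in class_attr.split())


def matching_siblings(picked: dict, siblings: list[dict]) -> list[dict]:
    """Keep siblings with the same tag that carry every class of the picked item."""
    wanted = set(picked["class"].split())
    return [
        s for s in siblings
        if s["tag"] == picked["tag"] and wanted <= set(s["class"].split())
    ]
```

Using a subset test rather than exact equality is a design choice: it tolerates state classes such as `featured` or `sold-out`, at the risk of occasionally including an unrelated element that happens to share the class.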

Step 3 — Review Schema

CrawlPilot analyzes the selected item and auto-generates columns for the fields it detects. Common auto-detected types:

Type	Detected from	Example value
Title	Heading elements, bold text	"Running Shoes v2"
Price	Elements containing `$`, `€`, currency patterns	"$49.99"
Image	`<img>` src attributes	product-image.jpg
URL	`<a>` href attributes	/products/shoes
Text	Any text node	"In stock"
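
The detection rules in the table can be read as an ordered set of checks. The function below is an illustrative guess at such rules, not CrawlPilot's actual logic. Attribute-based types (Image, URL) are checked first, then a currency regex for Price, then heading or bold tags for Title, with Text as the fallback. The regex and the tag set are assumptions.

```python
import re

# Assumed pattern: a currency symbol before or after a number, e.g. "$49.99", "12,50 €".
PRICE_RE = re.compile(r"[$€£¥]\s?\d[\d.,]*|\d[\d.,]*\s?[$€£¥]")

# Assumed "title-like" tags: headings plus bold text.
TITLE_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "strong", "b"}


def guess_column_type(tag: str, attrs: dict[str, str], text: str) -> tuple[str, str]:
    """Return (column type, example value) for one field inside an item."""
    if tag == "img" and "src" in attrs:
        return "Image", attrs["src"]
    if tag == "a" and "href" in attrs:
        return "URL", attrs["href"]
    if PRICE_RE.search(text):
        return "Price", text.strip()
    if tag in TITLE_TAGS:
        return "Title", text.strip()
    return "Text", text.strip()
```

Order matters here: a link whose text contains a price is reported as a URL because the attribute checks run first.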

You can:

Rename any column by clicking its label
Delete columns you don't need
Add a custom column with your own CSS selector

Pagination Settings

Choose how CrawlPilot collects data beyond the first page:

Mode	Use when
None	Single page only
Auto-scroll — Infinite Feed	Twitter, LinkedIn, Instagram-style feeds that load dynamically as you scroll
Auto-scroll — Static List	Lists where items appear in index order as you scroll
Pagination — Next Button	Classic "Next ›" or "Page 2" navigation
Load More Button	A "Show More" or "Load More" button that expands the list in place

For button-based modes, CrawlPilot asks you to click the button once on the page so it can identify it.

Speed: Controls the delay between scroll cycles or page clicks. Slower speeds are more reliable on slow-loading, JavaScript-heavy sites.

Max Pages: Maximum number of pages or scroll cycles to process. Set to 0 for unlimited (use with caution on very large sites).
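
Max Pages and Speed interact as a simple bounded loop. The sketch below assumes a hypothetical `fetch_page` callback that returns a page's rows, or `None` when there is no next page. The `SPEED_DELAYS` values are invented. A `max_pages` of 0 removes the upper bound, so the loop ends only when the site runs out of pages or the stop check fires, which is why unlimited runs need care on very large sites. The same structure applies to scroll cycles.

```python
import time
from typing import Callable, Optional

# Hypothetical delays per Speed setting, in seconds (not CrawlPilot's real values).
SPEED_DELAYS = {"slow": 3.0, "medium": 1.5, "fast": 0.5}


def collect_pages(
    fetch_page: Callable[[int], Optional[list[dict]]],
    max_pages: int = 0,
    speed: str = "medium",
    should_stop: Callable[[], bool] = lambda: False,
    sleep: Callable[[float], None] = time.sleep,
) -> list[dict]:
    """Collect rows page by page; max_pages=0 means no page limit."""
    rows: list[dict] = []
    page = 1
    while max_pages == 0 or page <= max_pages:
        if page > 1:
            sleep(SPEED_DELAYS[speed])  # wait between page clicks / scroll cycles
        if should_stop():               # Stop button: keep what was collected so far
            break
        items = fetch_page(page)        # None or [] means nothing more to load
        if not items:
            break
        rows.extend(items)
        page += 1
    return rows
```

The `sleep` parameter is injectable so the loop can be exercised without real waiting.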

Running the Extraction

Click Start Extraction. The panel shows:

A live item count updating as data is collected
A progress bar for paginated extractions
A Stop button to halt early — data collected so far is saved

After Extraction

When complete:

Click View Data to open the full data table in a new tab
Go to History to manage this job, re-run it, or delete it

Handling Duplicate Rows

CrawlPilot automatically deduplicates rows at two levels:

1. In-memory: During the scroll session, duplicate rows are dropped before they reach storage.
2. Database: A hash-based unique constraint ensures no true duplicates are ever stored.
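
The two deduplication levels can be sketched with the standard library: a Python set drops repeats within a batch, and a SQLite `UNIQUE` column rejects anything already stored. Hashing canonical JSON with SHA-256 and storing in SQLite are assumptions for illustration; CrawlPilot's storage engine and hash function are not documented here.

```python
import hashlib
import json
import sqlite3


def row_hash(row: dict) -> str:
    # Canonical JSON (sorted keys) so identical field values always hash the same.
    canonical = json.dumps(row, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rows (hash TEXT NOT NULL UNIQUE, data TEXT NOT NULL)")
    return conn


def save_rows(conn: sqlite3.Connection, rows: list[dict]) -> int:
    """Level 1: drop duplicates in memory. Level 2: the UNIQUE hash rejects the rest."""
    seen: set[str] = set()
    batch = []
    for row in rows:
        h = row_hash(row)
        if h not in seen:
            seen.add(h)
            batch.append((h, json.dumps(row, sort_keys=True)))
    before = conn.total_changes
    # INSERT OR IGNORE skips rows whose hash is already in the table.
    conn.executemany("INSERT OR IGNORE INTO rows (hash, data) VALUES (?, ?)", batch)
    conn.commit()
    return conn.total_changes - before  # number of rows actually inserted
```

Because keys are sorted before hashing, two scrapes that produce the same fields in a different order still count as one row.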

Example: Scraping an E-commerce Product Grid

Goal: Collect title, author, price, and product URL for 200 books from an online bookstore.

1. Open the bookstore's category page.
2. Open CrawlPilot → List Extractor.
3. Pick Container: hover over <div class="book-grid"> until all books highlight together, then click to confirm.
4. Pick Item: hover over one <div class="book-card"> and click to confirm.
5. The schema auto-detects: Title (h3), Author (span.author), Price (span.price), URL (a href).
6. Set pagination to Pagination — Next Button, then click the "Next ›" button on the page so CrawlPilot can identify it.
7. Set Max Pages to 20 and Speed to Medium.
8. Click Start Extraction and watch the count climb to 200.
9. Click View Data → Export CSV.
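
For reference, the exported file for this example would have one column per schema field. The sketch below writes rows in that shape with Python's `csv` module; the column order and the sample values in the test are assumptions, and CrawlPilot's actual export may name or order columns differently.

```python
import csv
import io

# Assumed column order, matching the schema detected in step 5.
COLUMNS = ["Title", "Author", "Price", "URL"]


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize extracted rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

`csv.DictWriter` quotes any field that contains a comma, so a title such as "Example Book, Second Edition" stays in a single column.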

Selector Editor

For advanced users, every auto-detected selector can be manually edited:

1. Click the pencil icon next to any column in the schema table.
2. Enter a CSS selector or XPath expression.
3. CrawlPilot validates the selector against the current page and shows a match count.

XPath example that selects the data-price attribute of every <div> whose class attribute is exactly price:

//div[@class='price']/@data-price
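
Note that `@class='price'` compares the whole attribute value, so it does not match `<div class="price sale">`, whereas the CSS selector `div.price` does. In XPath 1.0 the usual token-match idiom is `//div[contains(concat(' ', normalize-space(@class), ' '), ' price ')]/@data-price`. The sketch below demonstrates the difference with Python's `xml.etree.ElementTree`, which supports `[@attr='value']` predicates but not a trailing `/@attr` step, so attributes are read with `.get()`. The markup is a made-up sample.

```python
import xml.etree.ElementTree as ET

# Made-up sample markup (must be well-formed XML for ElementTree).
SNIPPET = """<div class="product">
  <div class="price" data-price="49.99">$49.99</div>
  <div class="price sale" data-price="39.99">$39.99</div>
</div>"""


def exact_class_prices(root: ET.Element) -> list[str]:
    # Same test as //div[@class='price']: the whole class attribute must equal "price".
    return [el.get("data-price") for el in root.iterfind(".//div[@class='price']")]


def token_class_prices(root: ET.Element) -> list[str]:
    # Like CSS div.price: "price" only needs to be one of the space-separated classes.
    return [
        el.get("data-price")
        for el in root.iter("div")
        if "price" in el.get("class", "").split()
    ]


root = ET.fromstring(SNIPPET)
print(exact_class_prices(root))  # ['49.99']
print(token_class_prices(root))  # ['49.99', '39.99']
```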