v1.0 — Initial Release
Released: June 2026
CrawlPilot v1.0 is the first public release of the extension. It includes the full core toolset for no-code web data extraction.
What's New
List Extractor
- 3-step visual wizard: pick container → pick item → review schema
- Auto-detects column types: Title, Price, Image URL, Link, Text
- CSS selector and XPath support with an inline editor
- Two auto-scroll strategies:
  - Mutation-Aware (default): uses MutationObserver to detect new DOM nodes added after each scroll; best for dynamic feeds such as Twitter and LinkedIn
  - Indexed: iterates over child elements by index; best for static paginated lists
- Pagination support: Next Button, Load More Button
- Configurable max pages and scroll speed
- Live item count during extraction
- In-memory and database-level deduplication
- Resume capability for interrupted extractions
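Infinite-scroll feeds often re-render items that were already captured, which is why in-memory deduplication matters. The TypeScript sketch below shows one way to do it with a keyed set. It is a hypothetical illustration rather than CrawlPilot's implementation: the `Row` type, the `rowKey` helper, and the rule of keying on the Link column (falling back to all cell values) are assumptions.

```typescript
// Hypothetical row shape: column name -> extracted cell value.
type Row = Record<string, string>;

// Hypothetical key: prefer the Link column, else join all values in a stable column order.
function rowKey(row: Row): string {
  if (row["Link"]) return `link:${row["Link"]}`;
  return Object.keys(row)
    .sort()
    .map((k) => `${k}=${row[k]}`)
    .join("\u0001");
}

// Keeps the first occurrence of each key. Passing in a pre-filled `seen` set
// (for example, keys reloaded from storage) lets a resumed run skip rows it already saved.
function dedupeRows(rows: Row[], seen: Set<string> = new Set<string>()): Row[] {
  const unique: Row[] = [];
  for (const row of rows) {
    const key = rowKey(row);
    if (seen.has(key)) continue;
    seen.add(key);
    unique.push(row);
  }
  return unique;
}

const batch: Row[] = [
  { Title: "Lamp", Link: "https://example.com/a" },
  { Title: "Lamp (dup)", Link: "https://example.com/a" },
  { Title: "Chair", Price: "$40" },
  { Price: "$40", Title: "Chair" },
];
console.log(dedupeRows(batch).length); // 2
```

A `Set` keeps each membership check constant-time, so the cost stays linear in the number of rows read across all scroll passes. The same set, persisted, is one plausible basis for resuming an interrupted run.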
Page Extractor
- Bulk multi-URL extraction with configurable concurrency (up to 10 parallel tabs)
- Per-URL status tracking: Queued → Extracting → Done / Error
- Automatic tab creation and cleanup
- Schema definition with Pick on Page for each field
- Click actions (e.g., to dismiss cookie banners and popups or to open expandable sections)
- Per-URL error logging with retry capability
- Unified schema across all processed pages
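Bulk extraction with a concurrency cap and per-URL status tracking is commonly built as a small worker pool. The TypeScript sketch below is a hypothetical illustration: `runPool`, `Status`, `UrlResult`, and the `extract` callback are invented names. In the extension, the callback would presumably open a tab, apply the schema, and close the tab; that part is not shown.

```typescript
type Status = "Queued" | "Extracting" | "Done" | "Error";

interface UrlResult {
  status: Status;
  error?: string;
}

// Processes `urls` with at most `limit` extractions in flight at once.
// Every URL starts as Queued, moves to Extracting, then ends as Done or Error.
async function runPool(
  urls: string[],
  extract: (url: string) => Promise<void>,
  limit: number,
): Promise<Map<string, UrlResult>> {
  const results = new Map<string, UrlResult>();
  for (const url of urls) results.set(url, { status: "Queued" });

  let next = 0;
  // Each worker pulls the next queued URL until none remain.
  // JavaScript is single-threaded, so `next++` cannot race between workers.
  async function worker(): Promise<void> {
    while (next < urls.length) {
      const url = urls[next++];
      results.set(url, { status: "Extracting" });
      try {
        await extract(url);
        results.set(url, { status: "Done" });
      } catch (err) {
        // Record the failure and keep going so one bad URL does not stop the batch.
        results.set(url, { status: "Error", error: String(err) });
      }
    }
  }

  const workerCount = Math.max(1, Math.min(limit, urls.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Because failed entries keep their error message, a retry pass only needs to collect the URLs whose status is `"Error"` and feed them back into `runPool`.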
Metadata Extractor
- JSON-LD structured data parsing
- Open Graph tag extraction (og:title, og:image, og:description, og:url, og:type)
- Twitter Card extraction
- Standard meta tag extraction (title, description, author, canonical, robots)
- HTML content extraction with conversion to Markdown via Turndown
- Table extraction as structured data
- Link enumeration (href + anchor text)
- Configurable request delay, timeout, and concurrency
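Open Graph and Twitter Card extraction essentially comes down to reading `<meta>` tags and keying them by their `property` or `name` attribute. The sketch below assumes the tags have already been collected (for example, in a content script via `document.querySelectorAll("meta")`) into plain objects. The `MetaTag` type and the `groupMetaTags` function are hypothetical, not CrawlPilot's code.

```typescript
// Hypothetical plain-object view of a <meta> element's attributes.
interface MetaTag {
  property?: string;
  name?: string;
  content?: string;
}

interface GroupedMeta {
  openGraph: Record<string, string>;
  twitter: Record<string, string>;
  standard: Record<string, string>;
}

// Sorts meta tags into Open Graph (og:*), Twitter Card (twitter:*), and a bucket for
// everything else (description, author, robots, viewport, ...). The first value for a key wins.
function groupMetaTags(tags: MetaTag[]): GroupedMeta {
  const out: GroupedMeta = { openGraph: {}, twitter: {}, standard: {} };
  for (const tag of tags) {
    // Open Graph uses `property`; Twitter Cards usually use `name`, so accept either.
    const key = (tag.property ?? tag.name ?? "").trim().toLowerCase();
    const content = tag.content?.trim();
    if (!key || !content) continue;
    let bucket: Record<string, string>;
    if (key.startsWith("og:")) bucket = out.openGraph;
    else if (key.startsWith("twitter:")) bucket = out.twitter;
    else bucket = out.standard;
    if (!(key in bucket)) bucket[key] = content;
  }
  return out;
}

const meta = groupMetaTags([
  { property: "og:title", content: "Example Product" },
  { property: "og:image", content: "https://example.com/p.jpg" },
  { name: "twitter:card", content: "summary_large_image" },
  { name: "description", content: "A short description." },
  { name: "viewport", content: "width=device-width" },
]);
console.log(meta.openGraph["og:title"]); // Example Product
```

The page title and canonical URL are not `<meta>` tags (they come from the `<title>` element and `<link rel="canonical">`), so they need separate lookups outside this function.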
Text Summarizer
- Extracts main content from <article> or <main>, falling back to the page body
- AI-powered summarization via the Anthropic Claude API
- Copy raw text or summary independently
- Built-in trial usage quota
Image Downloader
- Detects <img> tags (including lazy-loaded images), <picture>/<source> elements, CSS background images, video poster frames, canvas elements (converted to PNG), and SVG images
- Dimension filtering: Small / Medium / Large
- Individual or bulk selection
- ZIP export via JSZip
- CORS-aware fetching via the extension's background proxy
- Supported formats: JPG, PNG, GIF, SVG, WebP, AVIF
Browser Utilities
- Right-Click Unlocker: Removes JavaScript-based right-click blocking on any page
History
- Full job history with status indicators
- Re-run jobs with original configuration
- Delete jobs (removes the job record and all of its extracted rows)
- Live progress indicator for running jobs
Data Table
- Full-screen grid view with sortable and filterable columns
- Inline cell editing
- Row deletion
- Dataset merge (combine rows from multiple extraction jobs)
- CSV export
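CSV export needs quoting rules so that cells containing commas, quotes, or line breaks survive the trip into a spreadsheet. The sketch below follows the common RFC 4180 convention (wrap such fields in double quotes and double any embedded quotes). `toCsv` and `escapeCsvField` are hypothetical helpers, and whether CrawlPilot quotes exactly this way is an assumption.

```typescript
// Quotes a field only when it contains a comma, double quote, CR, or LF (RFC 4180 style).
function escapeCsvField(value: string): string {
  if (/[",\r\n]/.test(value)) {
    return `"${value.replace(/"/g, '""')}"`;
  }
  return value;
}

// Builds CSV text from rows in the given column order; missing cells become empty fields.
function toCsv(columns: string[], rows: Record<string, string>[]): string {
  const lines = [columns.map(escapeCsvField).join(",")];
  for (const row of rows) {
    lines.push(columns.map((c) => escapeCsvField(row[c] ?? "")).join(","));
  }
  // RFC 4180 separates records with CRLF.
  return lines.join("\r\n");
}

const csv = toCsv(["Title", "Price"], [
  { Title: 'Desk "Pro", oak', Price: "$199" },
  { Title: "Stool" },
]);
console.log(csv);
// Title,Price
// "Desk ""Pro"", oak",$199
// Stool,
```

Passing an explicit column list keeps the header stable even when rows from merged datasets have different or missing fields.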
Settings
- Language selector: English, Spanish, French, German, Italian, Chinese (Simplified), Japanese, Russian, Portuguese
- Storage usage display
- Data retention policy (auto-delete data older than N days)
- Manual clearing of old data and full database wipe
- Integrations UI: Webhook, Airtable, Google Sheets (configuration saved; outbound sending in v1.1)
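The retention policy (auto-delete data older than N days) amounts to computing a cutoff timestamp and dropping jobs created before it. In the extension this would run against IndexedDB; the sketch below uses a pure TypeScript function over an array so that the date arithmetic is easy to check. The `Job` shape, its `createdAt` field (milliseconds since the epoch), and `applyRetention` are hypothetical.

```typescript
// Hypothetical job record; createdAt is a Unix timestamp in milliseconds.
interface Job {
  id: string;
  createdAt: number;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Splits jobs into those to keep and those older than `retentionDays` relative to `now`.
// A job created exactly at the cutoff is kept.
function applyRetention(jobs: Job[], retentionDays: number, now: number = Date.now()) {
  const cutoff = now - retentionDays * MS_PER_DAY;
  const keep = jobs.filter((j) => j.createdAt >= cutoff);
  const expired = jobs.filter((j) => j.createdAt < cutoff);
  return { keep, expired };
}

const now = Date.UTC(2026, 5, 30); // 30 June 2026 (months are zero-based)
const jobs: Job[] = [
  { id: "recent", createdAt: now - 2 * MS_PER_DAY },
  { id: "old", createdAt: now - 45 * MS_PER_DAY },
];
const { keep, expired } = applyRetention(jobs, 30, now);
console.log(keep.map((j) => j.id), expired.map((j) => j.id)); // [ 'recent' ] [ 'old' ]
```

Taking `now` as a parameter rather than reading the clock inside the function keeps the policy deterministic and testable.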
Infrastructure
- Local-first: all data is stored in IndexedDB; nothing is uploaded to servers
- Manifest V3 Chrome extension
- Background service worker with keep-alive alarms for long-running jobs
- State persistence across service worker restarts
- Multi-language UI via i18next (9 languages)
Known Limitations
- Scheduled runs: Automated recurring extractions are not yet supported. Use History → Re-run manually.
- Integration outbound sending: Webhook, Airtable, and Google Sheets data transmission is not yet active. Credentials entered now are saved for use in v1.1.
- Email Extractor: Marked "Coming Soon" — not available in v1.0.
- Safari / Firefox: Not supported. Chrome 114+ only.
- Cross-origin iframes: Content inside sandboxed iframes on different domains cannot be extracted.
- JavaScript SPAs: Page Extractor works best on server-rendered or static pages. Complex SPAs that require multi-step interaction beyond simple clicks may produce incomplete results.
Upgrade Notes
v1.0 is the first release — no migration from a previous version is required.