v1.0 — Initial Release

Released: June 2026

CrawlPilot v1.0 is the first public release of the extension. It includes the full core toolset for no-code web data extraction.

What's New

List Extractor

  • 3-step visual wizard: pick container → pick item → review schema
  • Auto-detects column types: Title, Price, Image URL, Link, Text
  • CSS Selector and XPath support with inline editor
  • Two auto-scroll strategies:
    • Mutation-Aware (default): uses MutationObserver to detect new DOM nodes after each scroll; best for dynamic feeds such as Twitter and LinkedIn
    • Indexed: iterates over the container's children by index; best for static, paginated lists
  • Pagination support: Next Button, Load More Button
  • Configurable max pages and scroll speed
  • Live item count during extraction
  • In-memory and database-level deduplication
  • Resume capability for interrupted extractions
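
The notes do not describe how deduplication is implemented. As a rough illustration only, an in-memory pass could key each row on a few identifying columns and keep the first occurrence. The names `ExtractedRow`, `rowKey`, and `dedupe` are hypothetical and are not part of CrawlPilot.

```typescript
// Hypothetical row shape: column name -> extracted cell value.
type ExtractedRow = Record<string, string>;

// Build a stable key from the columns assumed to identify an item.
function rowKey(row: ExtractedRow, keyColumns: string[]): string {
  return JSON.stringify(keyColumns.map((col) => (row[col] ?? "").trim()));
}

// Keep the first occurrence of each key. Passing the same `seen` set across
// scroll batches means items re-rendered by the page are not counted twice.
function dedupe(
  rows: ExtractedRow[],
  keyColumns: string[],
  seen: Set<string> = new Set<string>(),
): ExtractedRow[] {
  const unique: ExtractedRow[] = [];
  for (const row of rows) {
    const key = rowKey(row, keyColumns);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(row);
    }
  }
  return unique;
}
```

A database-level check could follow the same idea by storing the key alongside each row, for example under a unique index. That is also an assumption, not a description of the shipped schema.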

Page Extractor

  • Bulk multi-URL extraction with configurable concurrency (up to 10 parallel tabs)
  • Per-URL status tracking: Queued → Extracting → Done / Error
  • Automatic tab creation and cleanup
  • Schema definition with Pick on Page for each field
  • Click actions (for cookie banners, expanders, popups)
  • Per-URL error logging with retry capability
  • Unified schema across all processed pages
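
The per-URL status flow and the parallel-tab cap can be pictured as a small state table plus a scheduler check. This is a minimal sketch under stated assumptions: the type and function names are hypothetical, and treating a retry as a move from Error back to Queued is a guess rather than documented behavior.

```typescript
type UrlStatus = "Queued" | "Extracting" | "Done" | "Error";

// Allowed moves, following the arrows in the status list above.
// Error -> Queued models a retry and is an assumption of this sketch.
const TRANSITIONS: Record<UrlStatus, UrlStatus[]> = {
  Queued: ["Extracting"],
  Extracting: ["Done", "Error"],
  Done: [],
  Error: ["Queued"],
};

// Return the new status, or throw if the move is not allowed.
function advance(current: UrlStatus, next: UrlStatus): UrlStatus {
  if (!TRANSITIONS[current].includes(next)) {
    throw new Error(`Invalid status change: ${current} -> ${next}`);
  }
  return next;
}

// Pick which queued URLs may start now, given a cap on parallel tabs.
function nextToStart(statuses: Map<string, UrlStatus>, maxParallel: number): string[] {
  const running = Array.from(statuses.values()).filter((s) => s === "Extracting").length;
  const free = Math.max(0, maxParallel - running);
  return Array.from(statuses.entries())
    .filter(([, status]) => status === "Queued")
    .slice(0, free)
    .map(([url]) => url);
}
```

With a cap of 10, `nextToStart(statuses, 10)` returns at most as many queued URLs as there are free tab slots.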

Metadata Extractor

  • JSON-LD structured data parsing
  • Open Graph tag extraction (og:title, og:image, og:description, og:url, og:type)
  • Twitter Card extraction
  • Standard meta tag extraction (title, description, author, canonical, robots)
  • HTML content extraction converted to Markdown via Turndown
  • Table extraction as structured data
  • Link enumeration (href + anchor text)
  • Configurable request delay, timeout, and concurrency
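
To make the Open Graph bullet concrete, here is one way such tags could be collected from a page's meta elements. It is a sketch, not the extension's code: `MetaLike` and `collectOpenGraph` are hypothetical names, and keeping the first occurrence of a repeated property is an assumed policy.

```typescript
// Minimal structural type, so the function accepts real <meta> elements in a
// page context as well as plain stubs.
interface MetaLike {
  getAttribute(name: string): string | null;
}

function collectOpenGraph(metas: MetaLike[]): Record<string, string> {
  const result: Record<string, string> = {};
  for (const meta of metas) {
    const property = meta.getAttribute("property");
    const content = meta.getAttribute("content");
    // Keep only og:* tags that have a content attribute; first occurrence wins.
    if (property && property.startsWith("og:") && content !== null && !(property in result)) {
      result[property] = content.trim();
    }
  }
  return result;
}
```

In a content script this could be called as `collectOpenGraph(Array.from(document.querySelectorAll("meta")))`.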

Text Summarizer

  • Extracts main content from <article> or <main>, falling back to <body>
  • AI-powered summarization via Anthropic Claude API
  • Copy raw text or summary independently
  • Trial usage quota built in
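
The content-selection order in the first bullet can be sketched as a simple fallback chain. The interface and function names are hypothetical, and skipping an element whose text is empty is an assumption of this example.

```typescript
// Structural types: a real Document satisfies RootLike, and so does a test stub.
interface NodeLike {
  textContent: string | null;
}
interface RootLike {
  querySelector(selector: string): NodeLike | null;
}

// Try <article>, then <main>, then <body>; return the first non-empty text.
function extractMainText(root: RootLike): string {
  for (const selector of ["article", "main", "body"]) {
    const node = root.querySelector(selector);
    const text = node?.textContent?.trim() ?? "";
    if (text.length > 0) {
      return text;
    }
  }
  return "";
}
```

In a content script, `extractMainText(document)` would return the text that is then passed on for summarization.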

Image Downloader

  • Detects: <img> tags (including lazy-loaded), <picture>/<source>, CSS background images, video poster frames, canvas (converted to PNG), SVG
  • Dimension filtering: Small / Medium / Large
  • Individual or bulk selection
  • ZIP export via JSZip
  • CORS-aware fetching via background extension proxy
  • Supported formats: JPG, PNG, GIF, SVG, WebP, AVIF

Browser Utilities

  • Right-Click Unlocker: Removes JavaScript-based right-click blocking on any page

History

  • Full job history with status indicators
  • Re-run jobs with original configuration
  • Delete jobs (removes job record and all rows)
  • Live progress indicator for running jobs

Data Table

  • Full-screen grid view with sortable and filterable columns
  • Inline cell editing
  • Row deletion
  • Dataset merge (combine rows from multiple extraction jobs)
  • CSV export
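
Merging datasets with different columns and exporting them as CSV could look roughly like the sketch below. It assumes RFC 4180-style quoting and a header built from the union of all column names; the names `TableRow`, `csvField`, `unifiedColumns`, and `toCsv` are illustrative only.

```typescript
type TableRow = Record<string, string | number | null | undefined>;

// Quote a field when it contains a comma, a double quote, or a line break.
function csvField(value: string | number | null | undefined): string {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Union of column names in first-seen order, so merged datasets with
// different schemas still line up under one header row.
function unifiedColumns(rows: TableRow[]): string[] {
  const columns: string[] = [];
  for (const row of rows) {
    for (const name of Object.keys(row)) {
      if (!columns.includes(name)) {
        columns.push(name);
      }
    }
  }
  return columns;
}

function toCsv(rows: TableRow[]): string {
  const columns = unifiedColumns(rows);
  const lines = [columns.map((name) => csvField(name)).join(",")];
  for (const row of rows) {
    // Columns missing from a row become empty cells.
    lines.push(columns.map((name) => csvField(row[name])).join(","));
  }
  return lines.join("\r\n");
}
```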

Settings

  • Language selector: English, Spanish, French, German, Italian, Chinese (Simplified), Japanese, Russian, Portuguese
  • Storage usage display
  • Data retention policy (auto-delete data older than N days)
  • Manual clearing of old data and full database wipe
  • Integrations UI: Webhook, Airtable, Google Sheets (configuration is saved; outbound sending is planned for v1.1)
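
The retention policy ("auto-delete data older than N days") amounts to a cutoff comparison. A minimal sketch, assuming each job stores a creation timestamp in epoch milliseconds; `StoredJob` and `expiredJobIds` are hypothetical names, not the extension's actual schema.

```typescript
// Assumed record shape; createdAt is epoch milliseconds.
interface StoredJob {
  id: string;
  createdAt: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Return ids of jobs strictly older than the retention window.
function expiredJobIds(
  jobs: StoredJob[],
  retentionDays: number,
  now: number = Date.now(),
): string[] {
  const cutoff = now - retentionDays * DAY_MS;
  return jobs.filter((job) => job.createdAt < cutoff).map((job) => job.id);
}
```

The returned ids would then be deleted together with their rows, matching the History behavior where deleting a job removes the job record and all of its rows.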

Infrastructure

  • Local-first: all data in IndexedDB, nothing uploaded to servers
  • Manifest V3 Chrome extension
  • Background service worker with keep-alive alarms for long-running jobs
  • State persistence across service worker restarts
  • Multi-language UI via i18next (9 languages)

Known Limitations

  • Scheduled runs: Automated recurring extractions are not yet supported. Use History → Re-run manually.
  • Integration outbound sending: Webhook, Airtable, and Google Sheets data transmission is not yet active. Credentials entered now are saved so they can be used once sending is enabled in v1.1.
  • Email Extractor: Marked "Coming Soon" — not available in v1.0.
  • Safari / Firefox: Not supported. Chrome 114+ only.
  • Cross-origin iframes: Content inside sandboxed iframes on different domains cannot be extracted.
  • JavaScript SPAs: Page Extractor works best on server-rendered or static pages. Complex SPAs that require multi-step interaction beyond simple clicks may produce incomplete results.

Upgrade Notes

v1.0 is the first release — no migration from a previous version is required.