Firecrawl CLI
Web scraping, crawling, and structured data extraction — all from the terminal. This is how we power lead enrichment, competitor research, and content extraction across every GTM engagement.
Firecrawl CLI is built by Firecrawl. We use it daily in production. This page documents our frameworks and workflows.
$ npm install -g firecrawl-cli

Why Firecrawl CLI Is in Our Stack
Without Firecrawl CLI
- Scraping requires Puppeteer/Playwright setup, proxy management, and custom parsing logic
- No structured extraction — raw HTML requires manual cleanup before agent consumption
- Crawling requires custom queue management, deduplication, and rate limiting
- No search integration — separate tools for web search vs. content extraction
- Browser automation needs separate infrastructure and session management
With Firecrawl CLI
- One command scrapes any URL into clean markdown, HTML, JSON, or screenshots
- --only-main-content strips nav, footers, ads — agent-ready output
- Built-in crawling with depth, path filters, concurrency, and progress monitoring
- Web search + scraping in one pipeline — the --scrape flag fetches full content from results
- AI agent mode for natural language extraction with structured JSON schema output
Quick Start
1. Install
# Global install
npm install -g firecrawl-cli
# Or one-command setup (installs + authenticates + adds to AI editors)
npx -y firecrawl-cli@latest init --all --browser

2. Authenticate
# Option A: browser-based login (recommended)
firecrawl login --browser
# Option B: direct API key
firecrawl login --api-key fc-YOUR-API-KEY
# Option C: environment variable (recommended for agents)
export FIRECRAWL_API_KEY=fc-YOUR-API-KEY

3. Verify
firecrawl view-config
# Shows auth method, concurrency limits, remaining credits
firecrawl https://example.com --format markdown --only-main-content
# Returns clean markdown content from any URL

Core Commands
Scrape
Extract content from any URL in markdown, HTML, JSON, links, screenshots, or images. Main-content filtering strips nav, footers, and ads automatically.
Crawl
Full-site crawling from a starting URL. Control depth, path includes/excludes, rate limits, and concurrency. Monitor progress in real time.
Map
Discover all URLs on a website without fetching full content. Filter by search terms, include subdomains, deduplicate by query parameters.
Search
Web search with optional result scraping. Filter by sources (web, news, images), categories (GitHub, research, PDF), location, and time range.
Agent
Natural language data gathering — ask for data in plain English and Firecrawl autonomously browses, scrapes, and returns structured results.
Browser
Cloud-based browser automation with Playwright. Launch sessions, execute commands, take snapshots, click elements — full browser control from terminal.
How We Use It — Agent Workflows
Real workflows from our GTM operations. These are the patterns we run across every client engagement.
Competitor Pricing Intelligence
Agent Prompt
“Scrape the pricing pages of our top 3 competitors and extract their plan names, prices, and feature lists into structured JSON.”
Agent maps each competitor site to find pricing URLs, scrapes them with main-content filtering, then uses the agent command with a JSON schema to extract structured pricing data — a competitive intel brief built in minutes.
# Discover pricing page URLs
firecrawl map https://competitor.com --search "pricing" --limit 10

# Scrape clean pricing content
firecrawl https://competitor.com/pricing --format markdown --only-main-content

# Structure data with AI extraction
firecrawl agent "Extract plan names, prices, and features" --schema '{"plans": [{"name": "string", "price": "string", "features": ["string"]}]}'
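Once the agent returns schema-conformant JSON, a few lines of post-processing turn it into a comparison brief. A minimal sketch, assuming the output matches the `--schema` shape above (the sample payload is hypothetical):

```python
import json

# Hypothetical agent output, assumed to match the --schema shape above
raw = '''
{"plans": [
  {"name": "Starter", "price": "$49/mo", "features": ["1 seat", "Email support"]},
  {"name": "Growth", "price": "$199/mo", "features": ["5 seats", "API access"]}
]}
'''

data = json.loads(raw)
for plan in data["plans"]:
    # One comparison row per plan: name, price, feature count
    print(f'{plan["name"]}: {plan["price"]} ({len(plan["features"])} features)')
```

Because the schema pins the output shape, the same loop works across every competitor you run it against.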
Lead Website Analysis
Agent Prompt
“Before my call with Acme Corp, scrape their website and tell me what they do, their tech stack signals, key team members, and any recent news.”
Agent scrapes the prospect homepage and about page, extracts key business intelligence, then searches for recent news — a full pre-call brief generated autonomously.
# Scrape homepage for company overview
firecrawl https://acme.com --format markdown --only-main-content

# Find team/about pages
firecrawl map https://acme.com --search "about team leadership" --limit 5

# Search recent news and scrape results
firecrawl search "Acme Corp news funding" --limit 5 --tbs qdr:m --scrape
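If you save each scrape to a file with `-o`, assembling the pre-call brief is just concatenation with labeled sections. A minimal sketch — the filenames and sample content here are hypothetical stand-ins for saved firecrawl output:

```python
import pathlib
import tempfile

# Hypothetical inputs: markdown files saved with `firecrawl ... -o <file>`
workdir = pathlib.Path(tempfile.mkdtemp())
(workdir / "homepage.md").write_text("# Acme Corp\nWe build widgets.")
(workdir / "news.md").write_text("# Recent News\nAcme raised a Series B.")

# Stitch each source under a labeled heading into one pre-call brief
sections = []
for name in ["homepage.md", "news.md"]:
    body = (workdir / name).read_text()
    sections.append(f"## Source: {name}\n\n{body}")
brief = "# Pre-Call Brief: Acme Corp\n\n" + "\n\n".join(sections)

(workdir / "brief.md").write_text(brief)
```

The labeled sources matter: when the brief feeds an agent, it can cite which page each claim came from.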
Content & SEO Audit
Agent Prompt
“Crawl our blog, extract all article titles and meta descriptions, and flag any pages without proper headings.”
Agent crawls the entire blog subdirectory, extracts content in markdown to parse heading structure, and identifies SEO gaps — an automated content audit.
# Crawl all blog pages
firecrawl crawl https://company.com --include-paths /blog --limit 100 --wait --progress

# Extract content with heading structure
firecrawl https://company.com/blog/post-1 --format markdown,html --only-main-content

# AI-powered SEO analysis
firecrawl agent "Audit heading structure and meta descriptions" --urls https://company.com/blog
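Because markdown output preserves heading levels as `#` prefixes, the heading audit itself is a small script. A minimal sketch over a hypothetical scraped article — it flags a missing H1 and any heading that skips a level:

```python
import re

# Hypothetical scraped article (firecrawl --format markdown output)
article = """Intro paragraph with no top-level heading.

## Section without an H1 above it
Body text.
"""

# Collect (level, text) for every ATX heading in the markdown
headings = re.findall(r"^(#{1,6})\s+(.+)$", article, flags=re.MULTILINE)
levels = [len(hashes) for hashes, _ in headings]

issues = []
if 1 not in levels:
    issues.append("missing H1")
# Flag jumps that skip a level, e.g. H1 -> H3
for prev, cur in zip(levels, levels[1:]):
    if cur - prev > 1:
        issues.append(f"heading jump H{prev} -> H{cur}")

print(issues)  # ['missing H1']
```

Run the same check over every page in the crawl output and you have the flagged-pages list the audit asks for.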
Technographics Enrichment
Agent Prompt
“For this list of 20 prospect domains, scrape each homepage and identify what CRM, marketing tools, and analytics platforms they use.”
Agent scrapes each prospect domain, extracts raw HTML to detect script tags and meta signatures, then structures the tech stack findings — bulk technographic enrichment from the terminal.
# Extract raw HTML for tech stack signals
firecrawl https://prospect.com --format rawHtml,links

# Pull meta tags and page attributes
firecrawl https://prospect.com --format attributes

# AI-powered tech stack detection
firecrawl agent "Identify CRM, analytics, and marketing tools from this website" --urls https://prospect.com
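The raw-HTML route comes down to matching known script signatures. A minimal sketch — the HTML snippet is hypothetical, and the signature map is illustrative (real detection needs far more patterns):

```python
import re

# Hypothetical raw HTML (firecrawl --format rawHtml output)
html = """
<script src="https://js.hs-scripts.com/12345.js"></script>
<script src="https://www.googletagmanager.com/gtag/js?id=G-XXXX"></script>
"""

# Illustrative signature map: tool name -> script URL pattern
SIGNATURES = {
    "HubSpot": r"hs-scripts\.com",
    "Google Analytics": r"googletagmanager\.com",
    "Salesforce": r"force\.com",
}

detected = sorted(
    tool for tool, pattern in SIGNATURES.items()
    if re.search(pattern, html)
)
print(detected)  # ['Google Analytics', 'HubSpot']
```

Loop this over the 20 prospect domains and you get a tech-stack column per lead without any manual inspection.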
Documentation Extraction
Agent Prompt
“Crawl the competitor API docs and extract their endpoint structure, authentication methods, and rate limits into a structured summary.”
Agent maps the docs site structure, crawls key pages with depth limits, then extracts structured API intelligence — turning competitor docs into actionable competitive analysis.
# Discover all documentation URLs
firecrawl map https://docs.competitor.com --limit 200

# Crawl API docs section
firecrawl crawl https://docs.competitor.com/api --max-depth 2 --limit 50 --wait

# Structure API intelligence
firecrawl agent "Extract API endpoints, auth methods, and rate limits" --schema '{"endpoints": [{"path": "string", "method": "string", "description": "string"}], "auth": "string", "rateLimits": "string"}'
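The schema'd output drops straight into a summary table. A minimal sketch, assuming the agent returns JSON matching the `--schema` above (the endpoint data here is a hypothetical sample):

```python
import json

# Hypothetical agent output, assumed to match the --schema shape above
raw = '''
{"endpoints": [
  {"path": "/v1/scrape", "method": "POST", "description": "Scrape a URL"},
  {"path": "/v1/crawl", "method": "POST", "description": "Start a crawl"}
 ],
 "auth": "Bearer token",
 "rateLimits": "100 req/min"}
'''

data = json.loads(raw)
rows = ["| Method | Path | Description |", "| --- | --- | --- |"]
for ep in data["endpoints"]:
    rows.append(f'| {ep["method"]} | {ep["path"]} | {ep["description"]} |')
table = "\n".join(rows)
print(table)
print(f'Auth: {data["auth"]} | Rate limits: {data["rateLimits"]}')
```

The markdown table pastes directly into the competitive analysis doc.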
Real-Time Market Research
Agent Prompt
“Search for the latest news about AI agent frameworks, scrape the top 10 results, and summarize the key trends.”
Agent performs a web search filtered to recent results, scrapes each article for full content, then extracts a trend summary — real-time market intelligence on demand.
# Search and scrape recent articles
firecrawl search "AI agent frameworks 2026" --limit 10 --tbs qdr:w --scrape --scrape-formats markdown

# Filter to news sources only
firecrawl search "AI agent frameworks" --sources news --limit 5

# AI-powered trend analysis
firecrawl agent "Summarize the top 3 trends in AI agent frameworks from these articles" --wait
Commands Reference
firecrawl <url> — Scrape a single URL (default: markdown output)
firecrawl <url> --format markdown,links,json — Multiple output formats (comma-separated)
firecrawl <url> --only-main-content — Strip nav, footers, ads — clean content only
firecrawl <url> --screenshot — Capture a screenshot of the page
firecrawl <url> --wait-for <ms> — Wait for JS rendering before scraping
firecrawl <url> -o output.json --pretty — Save to file with formatted JSON
firecrawl search "<query>" --limit 10 — Search the web and return results
firecrawl search "<query>" --scrape --scrape-formats markdown — Search and scrape full content from results
firecrawl search "<query>" --sources web,news,images — Filter by source type
firecrawl search "<query>" --tbs qdr:w — Time filter: last hour/day/week/month/year
firecrawl search "<query>" --location "New York" --country US — Geo-targeted search
firecrawl map <url> — Discover all URLs on a website
firecrawl map <url> --search "blog" — Filter discovered URLs by search term
firecrawl map <url> --limit 500 --include-subdomains — Include subdomains in URL discovery
firecrawl map <url> --ignore-query-parameters — Deduplicate URLs by removing query params
firecrawl crawl <url> --wait --progress — Crawl with real-time progress monitoring
firecrawl crawl <url> --limit 100 --max-depth 3 — Limit pages and crawl depth
firecrawl crawl <url> --include-paths /blog,/docs — Only crawl specific path prefixes
firecrawl crawl <url> --exclude-paths /admin,/login — Skip specific path prefixes
firecrawl crawl <url> --delay 500 --max-concurrency 5 — Rate control for polite crawling
firecrawl crawl <job-id> — Check status of a running crawl job
firecrawl agent "<natural language task>" --wait — AI-powered autonomous data gathering
firecrawl agent "<task>" --urls <url1>,<url2> — Target specific URLs for extraction
firecrawl agent "<task>" --schema '{"key": "type"}' — Structured JSON output with schema
firecrawl agent "<task>" --model spark-1-pro — Use pro model (default: spark-1-mini)
firecrawl agent "<task>" --max-credits 100 — Set spending limit for the task
firecrawl browser launch-session — Start a cloud browser session
firecrawl browser execute "open <url>" — Navigate to a URL in the session
firecrawl browser execute "snapshot" — Take a snapshot of current page state
firecrawl browser execute "click @e5" — Click an element by reference ID
firecrawl browser execute --python 'await page.goto("...")' — Execute Playwright Python code
firecrawl browser close — End the browser session
firecrawl login --browser — Authenticate via browser
firecrawl view-config — Show current auth and config status
firecrawl credit-usage — View team credit balance and usage
firecrawl logout — Clear stored credentials

Output Formats
Every scrape supports multiple output formats via the --format flag. Combine formats with commas for multi-format output.
markdown — Clean markdown text
html — Processed HTML
rawHtml — Raw page HTML
links — All page links
screenshot — Page screenshot
json — Structured JSON
images — All image URLs
summary — AI-generated summary

How We Integrate It
Firecrawl CLI is a core dependency in our GTM infrastructure. Here’s where it fits.
GTM Flywheel Pipeline
Our Firecrawl SDK integration (@mendable/firecrawl-js) powers the scraping layer in every GTM Flywheel report — multi-page scraping (5 pages per report) with skipDiscovery: true for fast, targeted extraction.
Technographics Detection
Raw HTML extraction feeds our tech stack detection engine — identifying CRMs, analytics platforms, marketing tools, and social links from script tags, meta tags, and page content.
Competitor Benchmarking
Search + scrape pipelines power our competitive intelligence module — finding competitor content, extracting positioning, and mapping feature gaps automatically.
Agent-Ready Content
The --only-main-content flag and markdown output format produce agent-ready content that feeds directly into Claude for analysis — no cleanup step required.
Want agents that scrape, research, and enrich automatically?
We use Firecrawl CLI as part of our full GTM agent stack. From lead enrichment to competitive intelligence, it’s the scraping backbone behind every engagement.
Need web scraping and enrichment wired into your GTM agents? We deploy and manage the full stack.