Agent Tools
Built by Firecrawl · Used in Our GTM Stack

Firecrawl CLI

Web scraping, crawling, and structured data extraction — all from the terminal. This is how we power lead enrichment, competitor research, and content extraction across every GTM engagement.

Firecrawl CLI is built by Firecrawl. We use it daily in production. This page documents our frameworks and workflows.

$ npm install -g firecrawl-cli

6 Core Commands · 8+ Output Formats · AI Agent Mode · MCP Compatible

Why Firecrawl CLI Is in Our Stack

Without Firecrawl CLI

  • Scraping requires Puppeteer/Playwright setup, proxy management, and custom parsing logic
  • No structured extraction — raw HTML requires manual cleanup before agent consumption
  • Crawling requires custom queue management, deduplication, and rate limiting
  • No search integration — separate tools for web search vs. content extraction
  • Browser automation needs separate infrastructure and session management

With Firecrawl CLI

  • One command scrapes any URL into clean markdown, HTML, JSON, or screenshots
  • --only-main-content strips nav, footers, ads — agent-ready output
  • Built-in crawling with depth, path filters, concurrency, and progress monitoring
  • Web search + scraping in one pipeline — --scrape flag fetches full content from results
  • AI agent mode for natural language extraction with structured JSON schema output

Quick Start

1. Install

# Global install
npm install -g firecrawl-cli

# Or one-command setup (installs + authenticates + adds to AI editors)
npx -y firecrawl-cli@latest init --all --browser

2. Authenticate

# Option A: browser-based login (recommended)
firecrawl login --browser

# Option B: direct API key
firecrawl login --api-key fc-YOUR-API-KEY

# Option C: environment variable (recommended for agents)
export FIRECRAWL_API_KEY=fc-YOUR-API-KEY

3. Verify

firecrawl view-config
# Shows auth method, concurrency limits, remaining credits

firecrawl https://example.com --format markdown --only-main-content
# Returns clean markdown content from any URL

Core Commands

Scrape

Extract content from any URL in markdown, HTML, JSON, links, screenshots, or images. Main-content filtering strips nav, footers, and ads automatically.

Crawl

Full-site crawling from a starting URL. Control depth, path includes/excludes, rate limits, and concurrency. Monitor progress in real time.

Map

Discover all URLs on a website without fetching full content. Filter by search terms, include subdomains, deduplicate by query parameters.

Search

Web search with optional result scraping. Filter by sources (web, news, images), categories (GitHub, research, PDF), location, and time range.

Agent

Natural language data gathering — ask for data in plain English and Firecrawl autonomously browses, scrapes, and returns structured results.

Browser

Cloud-based browser automation with Playwright. Launch sessions, execute commands, take snapshots, click elements — full browser control from terminal.

How We Use It — Agent Workflows

Real workflows from our GTM operations. These are the patterns we run across every client engagement.

Competitor Pricing Intelligence

Agent Prompt

Scrape the pricing pages of our top 3 competitors and extract their plan names, prices, and feature lists into structured JSON.

Agent maps each competitor site to find pricing URLs, scrapes them with main-content filtering, then uses the agent command with a JSON schema to extract structured pricing data — a competitive intel brief built in minutes.

1. firecrawl map https://competitor.com --search "pricing" --limit 10
   Discover pricing page URLs

2. firecrawl https://competitor.com/pricing --format markdown --only-main-content
   Scrape clean pricing content

3. firecrawl agent "Extract plan names, prices, and features" --schema '{"plans": [{"name": "string", "price": "string", "features": ["string"]}]}'
   Structure data with AI extraction
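The three steps above batch naturally across a competitor list. A minimal shell sketch, not part of the CLI: the domains are placeholders, and the loop writes the commands to a runner script for review instead of executing them directly, so nothing spends credits until you say so.

```shell
#!/bin/sh
# Batch the pricing-intel steps across a list of competitors.
# Domains below are placeholders; edit before running.
COMPETITORS="competitor-a.com competitor-b.com competitor-c.com"

: > pricing-jobs.sh   # start a fresh runner script
for domain in $COMPETITORS; do
  # Step 1: discover pricing URLs on each site
  echo "firecrawl map https://$domain --search pricing --limit 10" >> pricing-jobs.sh
  # Step 2: scrape the pricing page as clean markdown, one file per competitor
  echo "firecrawl https://$domain/pricing --format markdown --only-main-content -o $domain-pricing.md" >> pricing-jobs.sh
done

cat pricing-jobs.sh   # review, then execute with: sh pricing-jobs.sh
```

Review pricing-jobs.sh first, then run it with `sh pricing-jobs.sh` once the targets look right.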

Lead Website Analysis

Agent Prompt

Before my call with Acme Corp, scrape their website and tell me what they do, their tech stack signals, key team members, and any recent news.

Agent scrapes the prospect homepage and about page, extracts key business intelligence, then searches for recent news — a full pre-call brief generated autonomously.

1. firecrawl https://acme.com --format markdown --only-main-content
   Scrape homepage for company overview

2. firecrawl map https://acme.com --search "about team leadership" --limit 5
   Find team/about pages

3. firecrawl search "Acme Corp news funding" --limit 5 --tbs qdr:m --scrape
   Search recent news and scrape results
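These steps wrap cleanly in a small function so a pre-call brief is one command away. A sketch with hypothetical arguments; the leading echo keeps every step a dry run until you remove it.

```shell
#!/bin/sh
# Sketch: one function per prospect, echoing the three research commands.
# Remove the "echo" prefixes to execute against live credits.
precall_brief() {
  domain="$1"
  company="$2"
  echo "firecrawl https://$domain --format markdown --only-main-content -o $domain-overview.md"
  echo "firecrawl map https://$domain --search \"about team leadership\" --limit 5"
  echo "firecrawl search \"$company news funding\" --limit 5 --tbs qdr:m --scrape"
}

precall_brief acme.com "Acme Corp"
```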

Content & SEO Audit

Agent Prompt

Crawl our blog, extract all article titles and meta descriptions, and flag any pages without proper headings.

Agent crawls the entire blog subdirectory, extracts content in markdown to parse heading structure, and identifies SEO gaps — an automated content audit.

1. firecrawl crawl https://company.com --include-paths /blog --limit 100 --wait --progress
   Crawl all blog pages

2. firecrawl https://company.com/blog/post-1 --format markdown,html --only-main-content
   Extract content with heading structure

3. firecrawl agent "Audit heading structure and meta descriptions" --urls https://company.com/blog
   AI-powered SEO analysis
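Once pages are scraped to markdown, a plain grep pass can pre-filter for missing headings before any AI step. A sketch using two fabricated sample files in place of real crawl output:

```shell
#!/bin/sh
# Sketch: flag scraped markdown files that lack a top-level heading.
# The two sample files below are fabricated stand-ins for real crawl output.
cat > post-1.md <<'EOF'
# How We Scaled Outbound
Meta description and body text here.
EOF

cat > post-2.md <<'EOF'
Body text with no heading at all.
EOF

flagged=""
for f in post-1.md post-2.md; do
  # "^# " matches a markdown H1 at the start of a line
  if ! grep -q '^# ' "$f"; then
    flagged="$flagged $f"
  fi
done
echo "Missing H1:$flagged"
```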

Technographics Enrichment

Agent Prompt

For this list of 20 prospect domains, scrape each homepage and identify what CRM, marketing tools, and analytics platforms they use.

Agent scrapes each prospect domain, extracts raw HTML to detect script tags and meta signatures, then structures the tech stack findings — bulk technographic enrichment from the terminal.

1. firecrawl https://prospect.com --format rawHtml,links
   Extract raw HTML for tech stack signals

2. firecrawl https://prospect.com --format attributes
   Pull meta tags and page attributes

3. firecrawl agent "Identify CRM, analytics, and marketing tools from this website" --urls https://prospect.com
   AI-powered tech stack detection
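The raw HTML from step 1 can be pre-screened for tool signatures before the AI pass. A sketch: the HTML sample is fabricated, and the signature list is illustrative, not exhaustive.

```shell
#!/bin/sh
# Sketch: detect tool signatures in HTML saved from --format rawHtml.
# The HTML below is a fabricated sample; real pages vary widely.
cat > prospect.html <<'EOF'
<script src="https://www.googletagmanager.com/gtag/js"></script>
<script src="https://js.hs-scripts.com/123.js"></script>
EOF

detected=""
# pattern:label pairs for a few common tools (illustrative only)
for sig in "googletagmanager:Google Analytics" "js.hs-scripts.com:HubSpot" "cdn.segment.com:Segment"; do
  pattern=${sig%%:*}
  label=${sig#*:}
  if grep -q "$pattern" prospect.html; then
    detected="$detected$label, "
  fi
done
echo "Detected: ${detected%, }"
```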

Documentation Extraction

Agent Prompt

Crawl the competitor API docs and extract their endpoint structure, authentication methods, and rate limits into a structured summary.

Agent maps the docs site structure, crawls key pages with depth limits, then extracts structured API intelligence — turning competitor docs into actionable competitive analysis.

1. firecrawl map https://docs.competitor.com --limit 200
   Discover all documentation URLs

2. firecrawl crawl https://docs.competitor.com/api --max-depth 2 --limit 50 --wait
   Crawl API docs section

3. firecrawl agent "Extract API endpoints, auth methods, and rate limits" --schema '{"endpoints": [{"path": "string", "method": "string", "description": "string"}], "auth": "string", "rateLimits": "string"}'
   Structure API intelligence
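Inline schemas get unwieldy at this size. One option, sketched below, is keeping the schema in a file and substituting it into the flag; the leading echo keeps it a dry run.

```shell
#!/bin/sh
# Sketch: keep a long extraction schema in a file instead of inline.
cat > api-schema.json <<'EOF'
{
  "endpoints": [
    {"path": "string", "method": "string", "description": "string"}
  ],
  "auth": "string",
  "rateLimits": "string"
}
EOF

# Pass the file contents through command substitution.
# The leading "echo" keeps this a dry run; remove it to execute.
echo firecrawl agent "Extract API endpoints, auth methods, and rate limits" \
  --schema "$(cat api-schema.json)"
```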

Real-Time Market Research

Agent Prompt

Search for the latest news about AI agent frameworks, scrape the top 10 results, and summarize the key trends.

Agent performs a web search filtered to recent results, scrapes each article for full content, then extracts a trend summary — real-time market intelligence on demand.

1. firecrawl search "AI agent frameworks 2026" --limit 10 --tbs qdr:w --scrape --scrape-formats markdown
   Search and scrape recent articles

2. firecrawl search "AI agent frameworks" --sources news --limit 5
   Filter to news sources only

3. firecrawl agent "Summarize the top 3 trends in AI agent frameworks from these articles" --wait
   AI-powered trend analysis
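A small helper can map a human-readable window to the qdr: code used by --tbs. Only qdr:w and qdr:m appear in the examples here; the hour/day/year codes below are assumed to follow the same pattern, so treat them as a sketch.

```shell
#!/bin/sh
# Sketch: map a time window name to the --tbs qdr: code
# (hour/day/week/month/year; only qdr:w and qdr:m are confirmed above).
tbs_for() {
  case "$1" in
    hour)  echo "qdr:h" ;;
    day)   echo "qdr:d" ;;
    week)  echo "qdr:w" ;;
    month) echo "qdr:m" ;;
    year)  echo "qdr:y" ;;
    *)     echo "" ;;
  esac
}

window=$(tbs_for week)
# Dry run; remove the leading echo to execute
echo firecrawl search "AI agent frameworks 2026" --limit 10 --tbs "$window" --scrape --scrape-formats markdown
```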

Commands Reference

scrape

firecrawl <url>
    Scrape a single URL (default: markdown output)
firecrawl <url> --format markdown,links,json
    Multiple output formats (comma-separated)
firecrawl <url> --only-main-content
    Strip nav, footers, ads — clean content only
firecrawl <url> --screenshot
    Capture a screenshot of the page
firecrawl <url> --wait-for <ms>
    Wait for JS rendering before scraping
firecrawl <url> -o output.json --pretty
    Save to file with formatted JSON

search

firecrawl search "<query>" --limit 10
    Search the web and return results
firecrawl search "<query>" --scrape --scrape-formats markdown
    Search and scrape full content from results
firecrawl search "<query>" --sources web,news,images
    Filter by source type
firecrawl search "<query>" --tbs qdr:w
    Time filter: last hour/day/week/month/year
firecrawl search "<query>" --location "New York" --country US
    Geo-targeted search

map

firecrawl map <url>
    Discover all URLs on a website
firecrawl map <url> --search "blog"
    Filter discovered URLs by search term
firecrawl map <url> --limit 500 --include-subdomains
    Include subdomains in URL discovery
firecrawl map <url> --ignore-query-parameters
    Deduplicate URLs by removing query params

crawl

firecrawl crawl <url> --wait --progress
    Crawl with real-time progress monitoring
firecrawl crawl <url> --limit 100 --max-depth 3
    Limit pages and crawl depth
firecrawl crawl <url> --include-paths /blog,/docs
    Only crawl specific path prefixes
firecrawl crawl <url> --exclude-paths /admin,/login
    Skip specific path prefixes
firecrawl crawl <url> --delay 500 --max-concurrency 5
    Rate control for polite crawling
firecrawl crawl <job-id>
    Check status of a running crawl job

agent

firecrawl agent "<natural language task>" --wait
    AI-powered autonomous data gathering
firecrawl agent "<task>" --urls <url1>,<url2>
    Target specific URLs for extraction
firecrawl agent "<task>" --schema '{"key": "type"}'
    Structured JSON output with schema
firecrawl agent "<task>" --model spark-1-pro
    Use pro model (default: spark-1-mini)
firecrawl agent "<task>" --max-credits 100
    Set spending limit for the task

browser

firecrawl browser launch-session
    Start a cloud browser session
firecrawl browser execute "open <url>"
    Navigate to a URL in the session
firecrawl browser execute "snapshot"
    Take a snapshot of current page state
firecrawl browser execute "click @e5"
    Click an element by reference ID
firecrawl browser execute --python 'await page.goto("...")'
    Execute Playwright Python code
firecrawl browser close
    End the browser session

utility

firecrawl login --browser
    Authenticate via browser
firecrawl view-config
    Show current auth and config status
firecrawl credit-usage
    View team credit balance and usage
firecrawl logout
    Clear stored credentials

Output Formats

Every scrape supports multiple output formats via the --format flag. Combine formats with commas for multi-format output.

markdown     Clean markdown text
html         Processed HTML
rawHtml      Raw page HTML
links        All page links
screenshot   Page screenshot
json         Structured JSON
images       All image URLs
summary      AI-generated summary
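Multi-format requests are just a comma-joined list. A small sketch for building the --format value from a shell word list; the leading echo keeps it a dry run.

```shell
#!/bin/sh
# Sketch: build a comma-separated --format value from a shell list.
formats="markdown links screenshot"
format_flag=$(echo $formats | tr ' ' ',')

# Dry run; remove the leading echo to execute
echo firecrawl https://example.com --format "$format_flag" --only-main-content
```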

How We Integrate It

Firecrawl CLI is a core dependency in our GTM infrastructure. Here’s where it fits.

GTM Flywheel Pipeline

Our Firecrawl SDK integration (@mendable/firecrawl-js) powers the scraping layer in every GTM Flywheel report: multi-page scraping of up to 5 pages with skipDiscovery: true for fast, targeted extraction.

Technographics Detection

Raw HTML extraction feeds our tech stack detection engine — identifying CRMs, analytics platforms, marketing tools, and social links from script tags, meta tags, and page content.

Competitor Benchmarking

Search + scrape pipelines power our competitive intelligence module — finding competitor content, extracting positioning, and mapping feature gaps automatically.

Agent-Ready Content

The --only-main-content flag and markdown output format produce agent-ready content that feeds directly into Claude for analysis — no cleanup step required.

Want agents that scrape, research, and enrich automatically?

We use Firecrawl CLI as part of our full GTM agent stack. From lead enrichment to competitive intelligence, it’s the scraping backbone behind every engagement.

Need web scraping and enrichment wired into your GTM agents? We deploy and manage the full stack.