tags: [mcp, crawler, scraper, firecrawl, apify, smithery, runbook]
Web crawler MCPs (3-tier fallback)
Web crawler MCPs (3-tier fallback)
Why
URL → crawl N pages → markdown for AI agent ingestion. The three tiers degrade gracefully: best quality first, then a service fallback, then a no-key local crawler that always works.
Per the dual-bucket MCP rule: keyed services land in Smithery (token never enters the public repo); the no-key local crawler lands in .mcp.json (safe to commit).
Tier 1: firecrawl (Smithery, KEYED — best quality)
JS-rendered scrape + structured crawl with markdown output. Highest quality for SPAs and gated pages.
-
Signup: https://firecrawl.dev/signup — free 500 credits/mo, no card.
-
Copy
FIRECRAWL_API_KEYfrom the dashboard. -
Install via Smithery (paste the key on prompt):
npx -y @smithery/cli@latest install @mendableai/firecrawl-mcp-server --client claude
Tools exposed: firecrawl_scrape, firecrawl_crawl, firecrawl_map, firecrawl_search.
Tier 2: apify (Smithery, KEYED — fallback)
Massive actor catalog (Web Scraper, Cheerio Scraper, Puppeteer, etc.) reachable through one MCP. Use when firecrawl is over quota or a site needs a specific actor (e.g. Instagram, Twitter, Maps).
-
Signup: https://console.apify.com/sign-up — $5 free credit/mo, no card.
-
Copy
APIFY_TOKENfrom Settings → Integrations. -
Install via Smithery (paste the token on prompt):
npx -y @smithery/cli@latest install apify/actors-mcp-server --client claude
Tools exposed: dynamic — depends on which Apify actors you authorise in the Smithery profile.
Tier 3: crawl-md (in-repo, NO KEY — last resort)
Local Node.js MCP using fetch + turndown. No external service, no key, no quota. Slower (no JS rendering, single-threaded BFS), but always works and never costs anything.
- Source:
scripts/crawl-mcp/server.js - Auto-loaded via
.mcp.json→ no install step needed once the repo is cloned. - Tool:
crawl(url, max_pages=50, same_origin=true)→ markdown of all crawled pages. - Hard cap: 200 pages per call. HTML only (skips PDF/images/binaries).
Fallback decision flow
- Try
firecrawl_crawl— best quality, handles JS. - If 402/quota or auth-walled site fails → try
apifyactor (e.g.cheerio-scraperfor static HTML,puppeteer-scraperfor JS). - If both keyed services are out of credits → fall back to
crawl-mdfor static HTML.
Related
keyed-mcp-via-smithery-2026-06-27— Smithery install pattern, key rotation.- Decision: dual-bucket MCP rule (no-key in
.mcp.json, keyed in Smithery).