# Reproduce This: GEO + ARM 100-Site Audit — 2026-04-16

This study measured 100 websites across 29 industries on two independent axes: GEO (infrastructure readiness for AI crawlers and answer engines, scored 0–100 on a 9-dimension rubric) and ARM Probe (actual citation rate across four live AI answer-engine APIs, scored 0–100). Both scores are derived from deterministic, reproducible signals. Anyone with the listed API keys and a shell can re-run Phase 1 and Phase 2 against any site in the cohort and compare dimension-by-dimension. Phase 3 (ARM Probe) requires live API access to all four platforms and will produce scores that drift over time as AI systems update their training and retrieval indexes — the scores here reflect site and AI state as of 2026-04-16.

---

## What You Need

### API Keys

| Key Name | Purpose | Where to Get It |
|---|---|---|
| `SERPER_API_KEY` | Brand SERP lookups (Knowledge Graph, sitelinks, related searches, third-party validation, Wikipedia/gov/edu queries) | serper.dev |
| `OPENAI_API_KEY` | OpenAI web-search probes (`gpt-4o-mini-search-preview`) | platform.openai.com |
| `ANTHROPIC_API_KEY` | Anthropic web-search probes (`claude-haiku-4-5`) | console.anthropic.com |
| `PERPLEXITY_API_KEY` | Perplexity probes (`sonar`) | docs.perplexity.ai |
| `GEMINI_API_KEY` | Gemini grounding probes (`gemini-2.5-flash`) | aistudio.google.com |

### Estimated Cost (per site, pay-as-you-go rates)

| Phase | Cost per site |
|---|---|
| Phase 1 — 8-Pillar scan (HTTP probes, no metered API calls) | ~$0.00 |
| Phase 2 — GEO scoring (Serper SERP queries, ~5 queries/site) | ~$0.005 |
| Phase 3 — ARM Probe (20 queries × 4 platforms) | ~$1.40–$1.72 |
| **100-site total (pay-as-you-go)** | **~$145–$175** |

OpenAI `gpt-4o-mini-search-preview` ≈ $0.03/call; Gemini 2.5 Flash w/ grounding ≈ $0.04/call; Anthropic `claude-haiku-4-5` ≈ $0.01/call; Perplexity sonar ≈ $0.006/call. Serper ≈ $0.001/query.
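
As a quick sanity check, the table's Phase 3 upper bound follows directly from these per-call rates; a minimal sketch (the ~$1.40 lower bound is not derived here):

```python
# Approximate per-call rates quoted above, in USD.
rates = {"openai": 0.03, "gemini": 0.04, "anthropic": 0.01, "perplexity": 0.006}

per_site = 20 * sum(rates.values())  # 20 queries against each of the 4 platforms
print(f"Phase 3 per site: ~${per_site:.2f}")          # ~$1.72, the table's upper bound
print(f"Phase 3, 100 sites: ~${100 * per_site:.0f}")  # ~$172, before Serper costs
```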

---

## Phase 1 — 8-Pillar Scan

Run these against any site. Replace `{site}` with the bare domain (e.g., `example.com`). Use `curl -L` to follow redirects unless the signal specifically tests redirect behavior.

---

### S1 — MCP Server

```bash
curl -sI "https://{site}/.well-known/mcp.json"
```

**Pass rule:** HTTP 200 AND `Content-Type` contains `application/json`.

---

### S2 — llms.txt

```bash
curl -sS "https://{site}/llms.txt" | head -c 200
```

**Pass rule:** HTTP 200 AND response body begins with `#` (Markdown, not an HTML error page). Record byte size and first 60 characters.

---

### S3 — Clean-Room HTML (AI-accessible content)

```bash
# Browser UA
curl -sS -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" \
  "https://{site}/" -o /tmp/browser_body.html

# AI crawler UA
curl -sS -A "GPTBot/1.0" \
  "https://{site}/" -o /tmp/bot_body.html

wc -c /tmp/browser_body.html /tmp/bot_body.html
```

**Pass rule:** `abs(len_browser - len_bot) / max(len_browser, len_bot) < 0.05`. Less than 5% content difference between browser and AI crawler UA. Larger divergence indicates server-side UA detection that hides content from bots.
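
The same check in code, reading the two files the curl commands above wrote; a minimal sketch:

```python
import os

len_browser = os.path.getsize("/tmp/browser_body.html")
len_bot = os.path.getsize("/tmp/bot_body.html")

divergence = abs(len_browser - len_bot) / max(len_browser, len_bot)
print(f"divergence={divergence:.4f} -> {'PASS' if divergence < 0.05 else 'FAIL'}")
```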

---

### S4 — AI Content Feed

Try each path in order; stop at the first 200:

```bash
for path in "/.well-known/ai-content-index.json" "/ai-content-index.json" "/for-ai.txt"; do
  STATUS=$(curl -o /dev/null -sS -w "%{http_code}" "https://{site}${path}")
  echo "${path}: ${STATUS}"
done
```

**Pass rule:** At least one path returns HTTP 200 with non-empty body and a content-type of `application/json` or `text/plain`. A 301 chain that ends in a 404 is a fail.

---

### S5 — JSON-LD Structured Data

```bash
curl -sS -A "Mozilla/5.0" "https://{site}/" \
  | grep -o '<script type="application/ld+json">' | wc -l
```

**Pass rule:** Count ≥ 1. Record the count and distinct `@type` values found across all blocks.
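
The grep above only counts blocks. To also record distinct `@type` values, a sketch that parses each block from a saved copy of the homepage (the file path is an assumption; this reads `@type` recursively, so adjust if you interpret the rubric as top-level `@type` only):

```python
import json
import re

html = open("/tmp/browser_body.html", encoding="utf-8", errors="replace").read()

# Capture each JSON-LD block body; attribute order and quoting may vary.
blocks = re.findall(
    r'<script[^>]*application/ld\+json[^>]*>(.*?)</script>',
    html, re.DOTALL | re.IGNORECASE)

def collect_types(node):
    """Recursively gather @type values from a parsed JSON-LD node."""
    if isinstance(node, dict):
        t = node.get("@type")
        found = set([t] if isinstance(t, str) else t or [])
        for value in node.values():
            found |= collect_types(value)
        return found
    if isinstance(node, list):
        return set().union(*(collect_types(v) for v in node)) if node else set()
    return set()

distinct = set()
for raw in blocks:
    try:
        distinct |= collect_types(json.loads(raw))
    except json.JSONDecodeError:
        pass  # malformed block: counts toward the block count, contributes no types

print(f"blocks={len(blocks)} types={sorted(distinct)}")
```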

---

### S6 — TTFB (3-hit warm-cache protocol)

```bash
TARGET="https://{site}/"
for i in 1 2 3; do
  curl -o /dev/null -sS -w \
    "hit${i}: time_starttransfer=%{time_starttransfer}s time_connect=%{time_connect}s\n" \
    "$TARGET"
done
```

**Pass rule:**
- Discard hit 1 entirely (CDN cold start / priming). Using hit 1 inflates TTFB by 50–200ms and produces false fails on CDN-fronted sites.
- Score on hit 3.
- `compensatedMs = (time_starttransfer_hit3 × 1000) − (time_connect_hit3 × 1000)`
- Pass = `compensatedMs < 200`

This per-request connect-time subtraction isolates server-side response latency from your geographic distance to the nearest PoP. No external calibration site is needed.

---

### S7 — AI Bot Access (robots.txt)

```bash
curl -sS "https://{site}/robots.txt"
```

**Pass rule:** Count how many of these bot names appear in an explicit `User-agent:` rule with `Allow:` or with no `Disallow:` for that agent:

```
GPTBot, ClaudeBot, ChatGPT-User, Anthropic-AI, Google-Extended, PerplexityBot,
Applebot, CCBot, Bytespider, Cohere-AI, Meta-ExternalAgent, Amazonbot, DeepSeekBot,
OAI-SearchBot, facebookexternalhit, LinkedInBot
```

Pass = `ai_bots_allowed_count >= 10`.

A wildcard `User-agent: *` with no Disallow is NOT counted as an explicit allowlist entry for any specific bot. Each bot name must appear explicitly.
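
A sketch of the counting logic, assuming robots.txt was saved to `/tmp/robots.txt`. It matches `User-agent:` names exactly and case-insensitively (substring matching is a possible variant) and applies the explicit-entry rule above:

```python
AI_BOTS = [
    "GPTBot", "ClaudeBot", "ChatGPT-User", "Anthropic-AI", "Google-Extended",
    "PerplexityBot", "Applebot", "CCBot", "Bytespider", "Cohere-AI",
    "Meta-ExternalAgent", "Amazonbot", "DeepSeekBot", "OAI-SearchBot",
    "facebookexternalhit", "LinkedInBot",
]

# Group robots.txt into (user-agent names, rules) blocks.
groups, agents, rules = [], [], []
for line in open("/tmp/robots.txt", encoding="utf-8", errors="replace"):
    line = line.split("#", 1)[0].strip()
    if ":" not in line:
        continue
    field, _, value = line.partition(":")
    field, value = field.strip().lower(), value.strip()
    if field == "user-agent":
        if rules:  # a new group starts once rules have been seen
            groups.append((agents, rules))
            agents, rules = [], []
        agents.append(value)
    else:
        rules.append((field, value))
if agents:
    groups.append((agents, rules))

def allowed(bot):
    for agent_names, group_rules in groups:
        if any(bot.lower() == a.lower() for a in agent_names):  # "*" never matches here
            disallows = [v for f, v in group_rules if f == "disallow" and v]
            allows = [v for f, v in group_rules if f == "allow"]
            return bool(allows) or not disallows
    return False  # never named explicitly

count = sum(allowed(b) for b in AI_BOTS)
print(f"ai_bots_allowed_count={count} -> {'PASS' if count >= 10 else 'FAIL'}")
```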

---

### S8 — HTTP/3

```bash
curl -sI "https://{site}/" | grep -i "alt-svc"
```

**Pass rule:** `alt-svc` header is present and contains `h3`. Example passing value: `alt-svc: h3=":443"; ma=86400`.

---

## Phase 2 — GEO Scoring (9 Dimensions)

GEO is a 100-point weighted composite. Scoring agents in this audit received only `scoring-prompt.md` and one pillar receipt JSON per site — no other project context was loaded. The scoring rubric is deterministic; any agent (or human) with the pillar receipt and the table below should produce the same score.

| Dimension | Max | Derivation |
|---|---|---|
| AI Bot Access | 15 | `ai_bots_allowed_count` from S7: ≥10 = 15, 5–9 = 10, 1–4 = 5, 0 = 0 |
| Structured Data | 12 | Distinct JSON-LD `@type` values on homepage: ≥5 = 12, 3–4 = 9, 1–2 = 6, 0 = 0 |
| AI-Facing Files | 10 | `round((S1_pass + S2_pass + S4_pass) / 3 × 10, 1)` |
| Sitemap | 8 | Valid `sitemap.xml` (HTTP 200 with `<urlset>` or `<sitemapindex>`) AND `Sitemap:` directive in robots.txt = 8; either one alone = 5; neither = 0 |
| Content Density | 15 | Homepage body word count (strip `<script>`, `<style>`, all HTML tags, entity-decode, split on whitespace): ≥3000 = 15, 1500–2999 = 12, 500–1499 = 8, 100–499 = 4, <100 = 0 |
| Citation Data | 12 | Sum of citation-like fields across all JSON-LD blocks: `sameAs` array entries + `citation` + `isBasedOn` + `author.url` + `mainEntityOfPage` + `subjectOf`. Totals: ≥10 = 12, 5–9 = 9, 1–4 = 6, 0 = 0 |
| Tech Perf | 5 | S6 pass = +2.5; S8 pass = +2.5 |
| Freshness | 8 | Fetch sitemap, sample up to 100 `<lastmod>` values, count % within 90 days of 2026-04-16: ≥80% = 8, 50–79% = 6, 20–49% = 4, 1–19% = 2, 0% or no lastmod = 0 |
| Authority | 15 | `brand_authority_40 / 40 × 15` (see SERP brand authority derivation below) |

**geo_score = sum of all 9 dimensions (max 100, target 85)**
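
For reference, a condensed sketch of the composite; the `sig` keys are illustrative stand-ins for the receipt's actual field names, with pass flags as 0/1 and `brand_authority_40` supplied by the sub-score below:

```python
def band(value, bands):
    """Score from the first (threshold, score) band the value meets, else 0."""
    return next((score for threshold, score in bands if value >= threshold), 0)

def geo_score(sig):
    d = {
        "ai_bot_access":   band(sig["ai_bots_allowed_count"], [(10, 15), (5, 10), (1, 5)]),
        "structured_data": band(sig["distinct_jsonld_types"], [(5, 12), (3, 9), (1, 6)]),
        "ai_facing_files": round((sig["s1_pass"] + sig["s2_pass"] + sig["s4_pass"]) / 3 * 10, 1),
        "sitemap":         (8 if sig["sitemap_valid"] and sig["sitemap_in_robots"]
                            else 5 if sig["sitemap_valid"] or sig["sitemap_in_robots"] else 0),
        "content_density": band(sig["body_word_count"], [(3000, 15), (1500, 12), (500, 8), (100, 4)]),
        "citation_data":   band(sig["citation_field_total"], [(10, 12), (5, 9), (1, 6)]),
        "tech_perf":       2.5 * sig["s6_pass"] + 2.5 * sig["s8_pass"],
        "freshness":       band(sig["pct_lastmod_90d"], [(80, 8), (50, 6), (20, 4), (1, 2)]),
        "authority":       sig["brand_authority_40"] / 40 * 15,
    }
    return round(sum(d.values()), 1), d
```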

### Brand Authority sub-score (max 40, feeds Authority dimension)

Run these Serper API queries. Replace `{brand_token}` with the site's brand name (e.g., `"NASA"`, `"AppFolio"`).

```bash
# Brand SERP
curl -sS -X POST "https://google.serper.dev/search" \
  -H "X-API-KEY: <YOUR_SERPER_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"q":"{brand_token}","num":10}'

# Wikipedia check
curl -sS -X POST "https://google.serper.dev/search" \
  -H "X-API-KEY: <YOUR_SERPER_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"q":"\"{brand_token}\" site:wikipedia.org","num":5}'

# Gov/edu check
curl -sS -X POST "https://google.serper.dev/search" \
  -H "X-API-KEY: <YOUR_SERPER_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"q":"\"{brand_token}\" (site:.gov OR site:.edu)","num":10}'
```

**Domain Authority proxy (max 20):** Wikipedia brand entry found = +10; gov/edu query returns ≥3 results = +10.

**Domain Age (max 10):** Look up the RDAP registration date at `rdap.org/domain/{domain}`. If RDAP returns 404, fall back to the first Wayback capture from CDX (`web.archive.org/cdx/search/cdx?url={domain}&limit=1&output=json`). Age bands: ≥20 years = 10, 10–19 = 7, 5–9 = 5, 2–4 = 3, <2 = 1.

**Brand Strength (max 10):** In the top-10 organic results for the brand query: ≥5 results mention the brand in title/snippet AND `peopleAlsoAsk` count ≥3 = 10; ≥3 results mention the brand = 5; else 0.

`brand_authority_40 = DA_proxy + domain_age + brand_strength`
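
A sketch of the assembly with the age banding applied (argument names are illustrative):

```python
def domain_age_points(age_years):
    """Age bands: >=20 -> 10, 10-19 -> 7, 5-9 -> 5, 2-4 -> 3, <2 -> 1."""
    return next((s for t, s in [(20, 10), (10, 7), (5, 5), (2, 3)] if age_years >= t), 1)

def brand_authority_40(has_wikipedia, gov_edu_results, age_years,
                       brand_mentions_top10, people_also_ask):
    da_proxy = (10 if has_wikipedia else 0) + (10 if gov_edu_results >= 3 else 0)
    if brand_mentions_top10 >= 5 and people_also_ask >= 3:
        brand_strength = 10
    elif brand_mentions_top10 >= 3:
        brand_strength = 5
    else:
        brand_strength = 0
    return da_proxy + domain_age_points(age_years) + brand_strength
```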

---

## Phase 3 — ARM Probe

The ARM Probe score measures whether AI answer engines actually cite a site when answering real user queries. It does not measure technical readiness (that is GEO) — it measures outcome.

**Math:** 20 queries × 4 platforms per site = 80 API calls per site. 100 sites = 8,000 total calls.

**Pass rule (per probe):** The site's domain (case-insensitive) appears in any of:
- Any portion of the response text
- Any citation URL or inline link
- Any grounding chunk `title` field (required for Gemini — see caveat below)

`www.{domain}` counts as the same domain. A subdomain test (e.g., `developers.cloudflare.com`) must match that subdomain — a bare `cloudflare.com` citation is not a pass.
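
A sketch of the per-probe check applying these rules to one response's text, citation URLs, and (for Gemini) grounding titles:

```python
from urllib.parse import urlparse

def normalize(host):
    """Treat www.{domain} and {domain} as the same host."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host

def probe_pass(target_domain, response_text, cited_urls, grounding_titles=()):
    target = normalize(target_domain)
    if target in response_text.lower():
        return True
    for url in cited_urls:  # assumes absolute URLs with a scheme
        # Exact host match: a bare parent-domain citation never passes for a
        # subdomain target. (Whether a subdomain citation should pass for a
        # bare-domain target is an interpretation choice.)
        if normalize(urlparse(url).netloc) == target:
            return True
    return any(target in title.lower() for title in grounding_titles)
```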

**Scoring:**
```
per_platform_score = round((pass_count / 20) × 10, 1)   # max 10 per platform
external_probes_total = sum of 4 platform scores          # max 40
probe_aifs_score = serp_visibility_60 + external_probes_total  # max 100
```

`serp_visibility_60` = Knowledge Graph (25) + Sitelink Salience (10) + Third-Party Validation (10) + Related Citations (15), all from Phase 2 SERP signals.

**Bands:** 0–35 = invisible | 36–65 = fragmented | 66–85 = recognized | 86–100 = high_fidelity

---

### Platform API Calls

No system prompt. Single user message = the query verbatim. Temperature default. JSON mode off. Web search / grounding enabled.

#### Perplexity

```bash
curl -X POST https://api.perplexity.ai/chat/completions \
  -H "Authorization: Bearer <YOUR_PERPLEXITY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"model":"sonar","messages":[{"role":"user","content":"<query>"}],"return_citations":true}'
```

Parse: `choices[0].message.content` (text) + `citations` (array of URLs).

#### OpenAI

```bash
curl -X POST https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer <YOUR_OPENAI_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini-search-preview","messages":[{"role":"user","content":"<query>"}]}'
```

Web search is baked into `-search-preview` models — no `tools` array needed. Fall back to `gpt-4o-search-preview` if the mini variant is rejected. Parse `choices[0].message.content`; citations appear inline as URLs.

#### Anthropic

```bash
curl -X POST https://api.anthropic.com/v1/messages \
  -H "x-api-key: <YOUR_ANTHROPIC_API_KEY>" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-haiku-4-5","max_tokens":1024,"messages":[{"role":"user","content":"<query>"}],"tools":[{"type":"web_search_20250305","name":"web_search"}]}'
```

Parse every `content[].text` block AND any `content[].type == "web_search_tool_result"` blocks plus their `citations[].url`. Model choice for this audit: `claude-haiku-4-5` — citation retrieval is mechanical (domain present/absent), no reasoning floor required.
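
A sketch of that walk over a decoded response dict; citation field placement has varied across API versions, so this checks both inline text-block citations and tool-result entries:

```python
def anthropic_surfaces(resp):
    """Collect response text plus every web-search citation URL."""
    texts, urls = [], []
    for block in resp.get("content", []):
        if block.get("type") == "text":
            texts.append(block.get("text", ""))
            for c in block.get("citations") or []:
                if c.get("url"):
                    urls.append(c["url"])
        elif block.get("type") == "web_search_tool_result":
            for item in block.get("content") or []:
                if item.get("url"):
                    urls.append(item["url"])
    return "\n".join(texts), urls
```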

#### Gemini

```bash
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=<YOUR_GEMINI_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"contents":[{"parts":[{"text":"<query>"}]}],"tools":[{"google_search":{}}]}'
```

**Critical:** Parse `candidates[0].content.parts[].text` AND `candidates[0].groundingMetadata.groundingChunks[].web.uri` AND `candidates[0].groundingMetadata.groundingChunks[].web.title`.

Gemini grounding URIs are opaque redirects (`vertexaisearch.cloud.google.com/grounding-api-redirect/...`) that hide the real domain. The actual cited domain appears only in `groundingChunks[].web.title` (e.g., `"Fellowship Square Senior Living — fellowshipsquareseniorliving.org"`). A pass-rule that scans only `uri` will systematically undercount Gemini citations.
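
A sketch of the Gemini parse with the title-scan fix applied, for a decoded response dict; feed the returned `titles` into the grounding-title check of the pass rule above:

```python
def gemini_surfaces(resp):
    """Return (text, uris, titles): all three surfaces the pass rule scans.
    The real cited domain lives in the titles; the uris are opaque
    vertexaisearch redirects."""
    cand = resp["candidates"][0]
    text = "".join(p.get("text", "") for p in cand["content"]["parts"])
    chunks = cand.get("groundingMetadata", {}).get("groundingChunks", [])
    uris = [c["web"].get("uri", "") for c in chunks if "web" in c]
    titles = [c["web"].get("title", "") for c in chunks if "web" in c]
    return text, uris, titles
```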

---

### Query Construction

Each site received exactly 20 queries with this mix:
- **10 brand queries (50%):** `"what is {brand}"`, `"{brand} reviews"`, `"{brand} pricing"`, `"{brand} alternatives"`, `"{brand} vs {competitor}"`, `"{brand} free tier"`, `"{brand} API"`, etc.
- **6 category queries (30%):** `"best {category}"`, `"top {product} for {use case}"`, `"{problem} software"`
- **4 long-tail / comparison queries (20%):** narrow segment comparisons, how-to queries the brand solves, deep documentation queries, pricing comparisons

Queries were generated once per site at audit time and locked in the `_probes.json` sidecar under the `queries` field. Re-runs should use the same query set for day-over-day comparability.

---

## Methodology Notes

**TTFB measurement:** 3 sequential requests to `https://{site}/`, no sleep between them. Discard hit 1 (CDN cold start). Score hit 3. `compensatedMs = rawTTFB_3 − connectTime_3` (both in milliseconds, from curl's `time_starttransfer` and `time_connect` fields on the same request). Pass threshold: `compensatedMs < 200ms`. This approach removes your geographic distance to the PoP without requiring an external calibration target.

**Freshness measurement:** Fetch `https://{site}/sitemap.xml` (follow the `Sitemap:` directive in robots.txt if it points elsewhere). Collect up to 100 `<lastmod>` values from URLs in the sitemap. Count the percentage dated within 90 days of 2026-04-16 (i.e., on or after 2026-01-16). Apply the scoring band in the GEO table above. Sites with no `<lastmod>` tags, an inaccessible sitemap, or a `sitemap.xml` that is only a `<sitemapindex>` with no leaf URL dates score 0.
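
A sketch of the percentage computation over a fetched sitemap body (assumes ISO-8601 `<lastmod>` values; apply the Freshness band from the GEO table to the result):

```python
import re
from datetime import date

AUDIT_DATE = date(2026, 4, 16)

def freshness_pct(sitemap_xml):
    """Percent of sampled <lastmod> values within 90 days of the audit date;
    None when the sitemap has no <lastmod> values (which scores 0)."""
    dates = re.findall(r"<lastmod>\s*(\d{4}-\d{2}-\d{2})", sitemap_xml)[:100]
    if not dates:
        return None
    recent = sum((AUDIT_DATE - date.fromisoformat(d)).days <= 90 for d in dates)
    return 100 * recent / len(dates)
```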

**AI bot allowlist:** Count distinct bot user-agent strings that appear under an explicit `User-agent:` directive in robots.txt with an `Allow: /` or an absence of any `Disallow:` for that agent. A global `User-agent: *` with no Disallow is not counted as an explicit allowlist. The 16 bot names checked are listed in the S7 section above.

**GEO scoring agents:** Each agent received only `scoring-prompt.md` and one pillar receipt JSON. No SSoT, no session memory, no other context. This minimal-context approach means the score is reproducible without any proprietary project knowledge.

**SERP floor:** All 100 sites in this cohort show `serp_visibility_60 = 10` (sitelink salience only; Knowledge Graph and Related Citations are zero for most). This is a known artifact of the query construction used for the SERP block — the KG signal requires an exact brand-query match that many niche or B2B sites don't trigger. The meaningful signal in `probe_aifs_score` is the `external_probes_total` (max 40). Do not treat the SERP floor as evidence that these sites have no brand SERP presence.

---

## Known Limitations and Caveats

**SERP floor artifact:** As noted above, `serp_visibility_60` is effectively floored at 10 for this cohort due to query construction. This does not reflect actual SERP invisibility — it reflects that the KG/related-citations signals require brand queries tuned to each site's exact registered entity name, which was not done at this scale. The probe scores (external_probes_total) are the clean signal.

**Gemini grounding redirect issue:** Gemini grounding URIs are opaque. The pass rule scans `groundingChunks[].web.title`, not `uri`. If you re-run without this fix, you will systematically undercount Gemini passes. This is documented in the API call shape above.

**GEO vs. ARM paradox:** High-authority sites (NASA, NIH, academic institutions) frequently score low on GEO because they lack explicit AI-facing infrastructure signals (no llms.txt, no MCP server, few JSON-LD types, no AI bot allowlist in robots.txt). These same sites score moderate ARM because AI engines already know them from training data and traditional crawling. Low GEO + moderate ARM reflects an "authority floor" that pre-existing reputation provides even without GEO investment. It is not a scoring error.

**Freshness = 0 means sitemap, not necessarily content:** A Freshness score of 0 means the site's sitemap has no `<lastmod>` tags within 90 days, or has no `<lastmod>` tags at all, or the sitemap is inaccessible. It does not prove the site's content is stale — many high-traffic sites simply do not populate `<lastmod>` accurately. Verify by checking the sitemap directly before concluding content freshness is a problem.

**Scores reflect site state on 2026-04-16:** Robots.txt rules, sitemap contents, JSON-LD markup, llms.txt presence, and HTTP headers change over time. A site that scores 0 on S2 today may have added llms.txt by the time you run Phase 1.

**Haiku for Anthropic probes is audit-scoped only:** `claude-haiku-4-5` was used for the Anthropic ARM Probe calls because citation retrieval (domain present/absent in response) has no reasoning floor and Haiku is 3× cheaper than Sonnet for identical outcomes on this task. Do not read this as a general recommendation; it is specific to this mechanical pass/fail probe workload.

**ARM Probe drift:** AI retrieval indexes change. A site that scored 3/20 on Gemini on 2026-04-16 may score differently next week. The scores in this report are a snapshot, not a permanent characterization. Re-run Phase 3 to get current numbers.

---

## How to Dispute a Score

If you believe a site was scored incorrectly:

1. Run the exact curl commands in Phase 1 against that site today. Compare your raw outputs to the receipt JSON for that site in the `audit-receipts-v3/` directory.

2. Apply the scoring rubric from Phase 2 dimension by dimension. If you reach a different score for a dimension, the receipt JSON contains the evidence string that was used — compare it directly.

3. For ARM disputes: re-run the 20 queries from the site's `_probes.json` sidecar (available under the `queries` field) against each platform using the exact API call shapes above. Count passes. If your count differs, the most likely causes are: (a) Gemini title-scan missing (see Gemini note above), (b) the AI system updated its retrieval index between runs, or (c) a query interpretation difference. All three are expected sources of variance.

4. GEO scores are fully deterministic from the receipt. If the receipt evidence matches the rubric, the score is correct as of the audit date. If the site has changed since 2026-04-16, run a fresh Phase 1 to get current evidence.

Note: scores reflect site and AI-engine state as of 2026-04-16. Sites change; AI indexes change. A dispute is most useful if it identifies a scoring logic error (wrong rubric application) rather than a site improvement that occurred after the audit date.

---

## Files in This Package

| File | Contents |
|---|---|
| `geo-survey-mj6pew.html` | Full interactive results page — sortable table, per-site dimension breakdown, ARM band distribution |
| `reproduce_this.md` | This file — methodology, curl commands, scoring rubric, caveats |
| `AUDIT-SUMMARY-2026-04-16-100site.json` | Master scorecard JSON with all 100 sites, all dimensions, all probe scores |
| `NNN_{site_slug}.json` | Per-site pillar receipt (100 files) — raw signal evidence for each of the 8 pillars |
| `NNN_{site_slug}_extended.json` | Per-site extended sidecar (100 files) — GEO 9-dimension scores with evidence strings |
| `NNN_{site_slug}_probes.json` | Per-site ARM Probe sidecar (100 files) — all 80 probe pass/fail results with per-query reasons |
| `MANIFEST-2026-04-16-100site.json` | SHA-256 hashes of all files — verify integrity before reproducing |

All files are in `C:\Users\ROBER\projects\audit-receipts-v3\` (full 100-site run, 2026-04-16).

---

*Audit date: 2026-04-16. 100 sites, 29 industries. Methodology version: `v1-2026-04-16`. Query sets locked in `_probes.json` sidecars for day-over-day comparability.*
