# Reproduce This: GEO + ARM 100-Site Audit — 2026-04-16

This study measured 100 websites across 29 industries on two independent axes: GEO (infrastructure readiness for AI crawlers and answer engines, scored 0–100 on a 9-dimension rubric) and ARM Probe (actual citation rate across four live AI answer-engine APIs, scored 0–100). Both scores are derived from deterministic, reproducible signals. Anyone with the listed API keys and a shell can re-run Phase 1 and Phase 2 against any site in the cohort and compare dimension-by-dimension. Phase 3 (ARM Probe) requires live API access to all four platforms and will produce scores that drift over time as AI systems update their training and retrieval indexes — the scores here reflect site and AI state as of 2026-04-16.

---

## What You Need

### API Keys

| Key Name | Purpose | Where to Get It |
|---|---|---|
| `SERPER_API_KEY` | Brand SERP lookups (Knowledge Graph, sitelinks, related searches, third-party validation, Wikipedia/gov/edu queries) | serper.dev |
| `OPENAI_API_KEY` | OpenAI web-search probes (`gpt-4o-mini-search-preview`) | platform.openai.com |
| `ANTHROPIC_API_KEY` | Anthropic web-search probes (`claude-haiku-4-5`) | console.anthropic.com |
| `PERPLEXITY_API_KEY` | Perplexity probes (`sonar`) | docs.perplexity.ai |
| `GEMINI_API_KEY` | Gemini grounding probes (`gemini-2.5-flash`) | aistudio.google.com |

### Estimated Cost (per site, pay-as-you-go rates)

| Phase | Cost per site |
|---|---|
| Phase 1 — 8-Pillar scan (HTTP probes, no metered API calls) | ~$0.00 |
| Phase 2 — GEO scoring (Serper SERP queries, ~5 queries/site) | ~$0.005 |
| Phase 3 — ARM Probe (20 queries × 4 platforms) | ~$1.40–$1.72 |
| **100-site total (pay-as-you-go)** | **~$145–$175** |

OpenAI `gpt-4o-mini-search-preview` ≈ $0.03/call; Gemini 2.5 Flash w/ grounding ≈ $0.04/call; Anthropic `claude-haiku-4-5` ≈ $0.01/call; Perplexity sonar ≈ $0.006/call. Serper ≈ $0.001/query.
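
As a quick sanity check, the table's Phase 3 upper bound follows directly from these per-call rates; a minimal sketch (the ~$1.40 lower bound is not derived here):

```python
# Approximate per-call rates quoted above, in USD.
rates = {"openai": 0.03, "gemini": 0.04, "anthropic": 0.01, "perplexity": 0.006}

per_site = 20 * sum(rates.values())  # 20 queries against each of the 4 platforms
print(f"Phase 3 per site: ~${per_site:.2f}")          # ~$1.72, the table's upper bound
print(f"Phase 3, 100 sites: ~${100 * per_site:.0f}")  # ~$172, before Serper costs
```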

---

## Phase 1 — 8-Pillar Scan

Run these against any site. Replace `{site}` with the bare domain (e.g., `example.com`). Use `curl -L` to follow redirects unless the signal specifically tests redirect behavior.

---

### S1 — MCP Server

```bash
curl -sI "https://{site}/.well-known/mcp.json"
```

**Pass rule:** HTTP 200 AND `Content-Type` contains `application/json`.

---

### S2 — llms.txt

```bash
curl -sS "https://{site}/llms.txt" | head -c 200
```

**Pass rule:** HTTP 200 AND response body begins with `#` (Markdown, not an HTML error page). Record byte size and first 60 characters.

---

### S3 — Clean-Room HTML (AI-accessible content)

```bash
# Browser UA
curl -sS -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" \
  "https://{site}/" -o /tmp/browser_body.html

# AI crawler UA
curl -sS -A "GPTBot/1.0" \
  "https://{site}/" -o /tmp/bot_body.html

wc -c /tmp/browser_body.html /tmp/bot_body.html
```

**Pass rule:** `abs(len_browser - len_bot) / max(len_browser, len_bot) < 0.05`. Less than 5% content difference between browser and AI crawler UA. Larger divergence indicates server-side UA detection that hides content from bots.
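
The same check in code, reading the two files the curl commands above wrote; a minimal sketch:

```python
import os

len_browser = os.path.getsize("/tmp/browser_body.html")
len_bot = os.path.getsize("/tmp/bot_body.html")

divergence = abs(len_browser - len_bot) / max(len_browser, len_bot)
print(f"divergence={divergence:.4f} -> {'PASS' if divergence < 0.05 else 'FAIL'}")
```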

---

### S4 — AI Content Feed

Try each path in order; stop at the first 200:

```bash
for path in "/.well-known/ai-content-index.json" "/ai-content-index.json" "/for-ai.txt"; do
  STATUS=$(curl -o /dev/null -sS -w "%{http_code}" "https://{site}${path}")
  echo "${path}: ${STATUS}"
done
```

**Pass rule:** At least one path returns HTTP 200 with non-empty body and a content-type of `application/json` or `text/plain`. A 301 chain that ends in a 404 is a fail.

---

### S5 — JSON-LD Structured Data

```bash
curl -sS -A "Mozilla/5.0" "https://{site}/" \
  | grep -o '<script type="application/ld+json">' | wc -l
```

**Pass rule:** Count ≥ 1. Record the count and distinct `@type` values found across all blocks.
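
The grep above only counts blocks. To also record distinct `@type` values, a sketch that parses each block from a saved copy of the homepage (the file path is an assumption; this reads `@type` recursively, so adjust if you interpret the rubric as top-level `@type` only):

```python
import json
import re

html = open("/tmp/browser_body.html", encoding="utf-8", errors="replace").read()

# Capture each JSON-LD block body; attribute order and quoting may vary.
blocks = re.findall(
    r'<script[^>]*application/ld\+json[^>]*>(.*?)</script>',
    html, re.DOTALL | re.IGNORECASE)

def collect_types(node):
    """Recursively gather @type values from a parsed JSON-LD node."""
    if isinstance(node, dict):
        t = node.get("@type")
        found = set([t] if isinstance(t, str) else t or [])
        for value in node.values():
            found |= collect_types(value)
        return found
    if isinstance(node, list):
        return set().union(*(collect_types(v) for v in node)) if node else set()
    return set()

distinct = set()
for raw in blocks:
    try:
        distinct |= collect_types(json.loads(raw))
    except json.JSONDecodeError:
        pass  # malformed block: counts toward the block count, contributes no types

print(f"blocks={len(blocks)} types={sorted(distinct)}")
```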

---

### S6 — TTFB (3-hit warm-cache protocol)

```bash
TARGET="https://{site}/"
for i in 1 2 3; do
  curl -o /dev/null -sS -w \
    "hit${i}: time_starttransfer=%{time_starttransfer}s time_connect=%{time_connect}s\n" \
    "$TARGET"
done
```

**Pass rule:**
- Discard hit 1 entirely (CDN cold start / priming). Using hit 1 inflates TTFB by 50–200ms and produces false fails on CDN-fronted sites.
- Score on hit 3.
- `compensatedMs = (time_starttransfer_hit3 × 1000) − (time_connect_hit3 × 1000)`
- Pass = `compensatedMs < 200`

This per-request connect-time subtraction isolates server-side response latency from your geographic distance to the nearest PoP. No external calibration site is needed.

---

### S7 — AI Bot Access (robots.txt)

```bash
curl -sS "https://{site}/robots.txt"
```

**Pass rule:** Count how many of these bot names appear in an explicit `User-agent:` rule with `Allow:` or with no `Disallow:` for that agent:

```
GPTBot, ClaudeBot, ChatGPT-User, Anthropic-AI, Google-Extended, PerplexityBot,
Applebot, CCBot, Bytespider, Cohere-AI, Meta-ExternalAgent, Amazonbot, DeepSeekBot,
OAI-SearchBot, facebookexternalhit, LinkedInBot
```

Pass = `ai_bots_allowed_count >= 10`.

A wildcard `User-agent: *` with no Disallow is NOT counted as an explicit allowlist entry for any specific bot. Each bot name must appear explicitly.
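
A sketch of the counting logic, assuming robots.txt was saved to `/tmp/robots.txt`. It matches `User-agent:` names exactly and case-insensitively (substring matching is a possible variant) and applies the explicit-entry rule above:

```python
AI_BOTS = [
    "GPTBot", "ClaudeBot", "ChatGPT-User", "Anthropic-AI", "Google-Extended",
    "PerplexityBot", "Applebot", "CCBot", "Bytespider", "Cohere-AI",
    "Meta-ExternalAgent", "Amazonbot", "DeepSeekBot", "OAI-SearchBot",
    "facebookexternalhit", "LinkedInBot",
]

# Group robots.txt into (user-agent names, rules) blocks.
groups, agents, rules = [], [], []
for line in open("/tmp/robots.txt", encoding="utf-8", errors="replace"):
    line = line.split("#", 1)[0].strip()
    if ":" not in line:
        continue
    field, _, value = line.partition(":")
    field, value = field.strip().lower(), value.strip()
    if field == "user-agent":
        if rules:  # a new group starts once rules have been seen
            groups.append((agents, rules))
            agents, rules = [], []
        agents.append(value)
    else:
        rules.append((field, value))
if agents:
    groups.append((agents, rules))

def allowed(bot):
    for agent_names, group_rules in groups:
        if any(bot.lower() == a.lower() for a in agent_names):  # "*" never matches here
            disallows = [v for f, v in group_rules if f == "disallow" and v]
            allows = [v for f, v in group_rules if f == "allow"]
            return bool(allows) or not disallows
    return False  # never named explicitly

count = sum(allowed(b) for b in AI_BOTS)
print(f"ai_bots_allowed_count={count} -> {'PASS' if count >= 10 else 'FAIL'}")
```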

---

### S8 — HTTP/3

```bash
curl -sI "https://{site}/" | grep -i "alt-svc"
```

**Pass rule:** `alt-svc` header is present and contains `h3`. Example passing value: `alt-svc: h3=":443"; ma=86400`.

---

## Phase 2 — GEO Scoring (9 Dimensions)

GEO is a 100-point weighted composite. Scoring agents in this audit received only `scoring-prompt.md` and one pillar receipt JSON per site — no other project context was loaded. The scoring rubric is deterministic; any agent (or human) with the pillar receipt and the table below should produce the same score.

| Dimension | Max | Derivation |
|---|---|---|
| AI Bot Access | 15 | `ai_bots_allowed_count` from S7: ≥10 = 15, 5–9 = 10, 1–4 = 5, 0 = 0 |
| Structured Data | 12 | Distinct JSON-LD `@type` values on homepage: ≥5 = 12, 3–4 = 9, 1–2 = 6, 0 = 0 |
| AI-Facing Files | 10 | `round((S1_pass + S2_pass + S4_pass) / 3 × 10, 1)` |
| Sitemap | 8 | Valid `sitemap.xml` (HTTP 200 with `<urlset>` or `<sitemapindex>`) AND `Sitemap:` directive in robots.txt = 8; either one alone = 5; neither = 0 |
| Content Density | 15 | Homepage body word count (strip `<script>`, `<style>`, all HTML tags, entity-decode, split on whitespace): ≥3000 = 15, 1500–2999 = 12, 500–1499 = 8, 100–499 = 4, <100 = 0 |
| Citation Data | 12 | Sum of citation-like fields across all JSON-LD blocks: `sameAs` array entries + `citation` + `isBasedOn` + `author.url` + `mainEntityOfPage` + `subjectOf`. Totals: ≥10 = 12, 5–9 = 9, 1–4 = 6, 0 = 0 |
| Tech Perf | 5 | S6 pass = +2.5; S8 pass = +2.5 |
| Freshness | 8 | Fetch sitemap, sample up to 100 `<lastmod>` values, count % within 90 days of 2026-04-16: ≥80% = 8, 50–79% = 6, 20–49% = 4, 1–19% = 2, 0% or no lastmod = 0 |
| Authority | 15 | `brand_authority_40 / 40 × 15` (see SERP brand authority derivation below) |

**geo_score = sum of all 9 dimensions (max 100, target 85)**
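
For reference, a condensed sketch of the composite; the `sig` keys are illustrative stand-ins for the receipt's actual field names, with pass flags as 0/1 and `brand_authority_40` supplied by the sub-score below:

```python
def band(value, bands):
    """Score from the first (threshold, score) band the value meets, else 0."""
    return next((score for threshold, score in bands if value >= threshold), 0)

def geo_score(sig):
    d = {
        "ai_bot_access":   band(sig["ai_bots_allowed_count"], [(10, 15), (5, 10), (1, 5)]),
        "structured_data": band(sig["distinct_jsonld_types"], [(5, 12), (3, 9), (1, 6)]),
        "ai_facing_files": round((sig["s1_pass"] + sig["s2_pass"] + sig["s4_pass"]) / 3 * 10, 1),
        "sitemap":         (8 if sig["sitemap_valid"] and sig["sitemap_in_robots"]
                            else 5 if sig["sitemap_valid"] or sig["sitemap_in_robots"] else 0),
        "content_density": band(sig["body_word_count"], [(3000, 15), (1500, 12), (500, 8), (100, 4)]),
        "citation_data":   band(sig["citation_field_total"], [(10, 12), (5, 9), (1, 6)]),
        "tech_perf":       2.5 * sig["s6_pass"] + 2.5 * sig["s8_pass"],
        "freshness":       band(sig["pct_lastmod_90d"], [(80, 8), (50, 6), (20, 4), (1, 2)]),
        "authority":       sig["brand_authority_40"] / 40 * 15,
    }
    return round(sum(d.values()), 1), d
```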

### Brand Authority sub-score (max 40, feeds Authority dimension)

Run these Serper API queries. Replace `{brand_token}` with the site's brand name (e.g., `"NASA"`, `"AppFolio"`).

```bash
# Brand SERP
curl -sS -X POST "https://google.serper.dev/search" \
  -H "X-API-KEY: <YOUR_SERPER_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"q":"{brand_token}","num":10}'

# Wikipedia check
curl -sS -X POST "https://google.serper.dev/search" \
  -H "X-API-KEY: <YOUR_SERPER_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"q":"\"{brand_token}\" site:wikipedia.org","num":5}'

# Gov/edu check
curl -sS -X POST "https://google.serper.dev/search" \
  -H "X-API-KEY: <YOUR_SERPER_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"q":"\"{brand_token}\" (site:.gov OR site:.edu)","num":10}'
```

**Domain Authority proxy (max 20):** Wikipedia brand entry found = +10; gov/edu query returns ≥3 results = +10.

**Domain Age (max 10):** Look up the RDAP registration date at `rdap.org/domain/{domain}`. If RDAP returns 404, fall back to the first Wayback capture from CDX (`web.archive.org/cdx/search/cdx?url={domain}&limit=1&output=json`). Age bands: ≥20 years = 10, 10–19 = 7, 5–9 = 5, 2–4 = 3, <2 = 1.

**Brand Strength (max 10):** In the top-10 organic results for the brand query: ≥5 results mention the brand in title/snippet AND `peopleAlsoAsk` count ≥3 = 10; ≥3 results mention the brand = 5; else 0.

`brand_authority_40 = DA_proxy + domain_age + brand_strength`
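
A sketch of the assembly with the age banding applied (argument names are illustrative):

```python
def domain_age_points(age_years):
    """Age bands: >=20 -> 10, 10-19 -> 7, 5-9 -> 5, 2-4 -> 3, <2 -> 1."""
    return next((s for t, s in [(20, 10), (10, 7), (5, 5), (2, 3)] if age_years >= t), 1)

def brand_authority_40(has_wikipedia, gov_edu_results, age_years,
                       brand_mentions_top10, people_also_ask):
    da_proxy = (10 if has_wikipedia else 0) + (10 if gov_edu_results >= 3 else 0)
    if brand_mentions_top10 >= 5 and people_also_ask >= 3:
        brand_strength = 10
    elif brand_mentions_top10 >= 3:
        brand_strength = 5
    else:
        brand_strength = 0
    return da_proxy + domain_age_points(age_years) + brand_strength
```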

---

## Phase 3 — ARM Probe

The ARM Probe score measures whether AI answer engines actually cite a site when answering real user queries. It does not measure technical readiness (that is GEO) — it measures outcome.

**Math:** 20 queries × 4 platforms per site = 80 API calls per site. 100 sites = 8,000 total calls.

**Pass rule (per probe):** The site's domain (case-insensitive) appears in any of:
- Any portion of the response text
- Any citation URL or inline link
- Any grounding chunk `title` field (required for Gemini — see caveat below)

`www.{domain}` counts as the same domain. A subdomain test (e.g., `developers.cloudflare.com`) must match that subdomain — a bare `cloudflare.com` citation is not a pass.
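
A sketch of the per-probe check applying these rules to one response's text, citation URLs, and (for Gemini) grounding titles:

```python
from urllib.parse import urlparse

def normalize(host):
    """Treat www.{domain} and {domain} as the same host."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host

def probe_pass(target_domain, response_text, cited_urls, grounding_titles=()):
    target = normalize(target_domain)
    if target in response_text.lower():
        return True
    for url in cited_urls:  # assumes absolute URLs with a scheme
        # Exact host match: a bare parent-domain citation never passes for a
        # subdomain target. (Whether a subdomain citation should pass for a
        # bare-domain target is an interpretation choice.)
        if normalize(urlparse(url).netloc) == target:
            return True
    return any(target in title.lower() for title in grounding_titles)
```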

**Scoring:**
```
per_platform_score = round((pass_count / 20) × 10, 1)   # max 10 per platform
external_probes_total = sum of 4 platform scores          # max 40
probe_aifs_score = serp_visibility_60 + external_probes_total  # max 100
```

`serp_visibility_60` = Knowledge Graph (25) + Sitelink Salience (10) + Third-Party Validation (10) + Related Citations (15), all from Phase 2 SERP signals.

**Bands:** 0–35 = invisible | 36–65 = fragmented | 66–85 = recognized | 86–100 = high_fidelity

---

### Platform API Calls

No system prompt. Single user message = the query verbatim. Temperature default. JSON mode off. Web search / grounding enabled.

#### Perplexity

```bash
curl -X POST https://api.perplexity.ai/chat/completions \
  -H "Authorization: Bearer <YOUR_PERPLEXITY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"model":"sonar","messages":[{"role":"user","content":"<query>"}],"return_citations":true}'
```

Parse: `choices[0].message.content` (text) + `citations` (array of URLs).

#### OpenAI

```bash
curl -X POST https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer <YOUR_OPENAI_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini-search-preview","messages":[{"role":"user","content":"<query>"}]}'
```

Web search is baked into `-search-preview` models — no `tools` array needed. Fall back to `gpt-4o-search-preview` if the mini variant is rejected. Parse `choices[0].message.content`; citations appear inline as URLs.

#### Anthropic

```bash
curl -X POST https://api.anthropic.com/v1/messages \
  -H "x-api-key: <YOUR_ANTHROPIC_API_KEY>" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-haiku-4-5","max_tokens":1024,"messages":[{"role":"user","content":"<query>"}],"tools":[{"type":"web_search_20250305","name":"web_search"}]}'
```

Parse every `content[].text` block AND any `content[].type == "web_search_tool_result"` blocks plus their `citations[].url`. Model choice for this audit: `claude-haiku-4-5` — citation retrieval is mechanical (domain present/absent), no reasoning floor required.
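
A sketch of that walk over a decoded response dict; citation field placement has varied across API versions, so this checks both inline text-block citations and tool-result entries:

```python
def anthropic_surfaces(resp):
    """Collect response text plus every web-search citation URL."""
    texts, urls = [], []
    for block in resp.get("content", []):
        if block.get("type") == "text":
            texts.append(block.get("text", ""))
            for c in block.get("citations") or []:
                if c.get("url"):
                    urls.append(c["url"])
        elif block.get("type") == "web_search_tool_result":
            for item in block.get("content") or []:
                if item.get("url"):
                    urls.append(item["url"])
    return "\n".join(texts), urls
```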

#### Gemini

```bash
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=<YOUR_GEMINI_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"contents":[{"parts":[{"text":"<query>"}]}],"tools":[{"google_search":{}}]}'
```

**Critical:** Parse `candidates[0].content.parts[].text` AND `candidates[0].groundingMetadata.groundingChunks[].web.uri` AND `candidates[0].groundingMetadata.groundingChunks[].web.title`.

Gemini grounding URIs are opaque redirects (`vertexaisearch.cloud.google.com/grounding-api-redirect/...`) that hide the real domain. The actual cited domain appears only in `groundingChunks[].web.title` (e.g., `"Fellowship Square Senior Living — fellowshipsquareseniorliving.org"`). A pass-rule that scans only `uri` will systematically undercount Gemini citations.
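
A sketch of the Gemini parse with the title-scan fix applied, for a decoded response dict; feed the returned `titles` into the grounding-title check of the pass rule above:

```python
def gemini_surfaces(resp):
    """Return (text, uris, titles): all three surfaces the pass rule scans.
    The real cited domain lives in the titles; the uris are opaque
    vertexaisearch redirects."""
    cand = resp["candidates"][0]
    text = "".join(p.get("text", "") for p in cand["content"]["parts"])
    chunks = cand.get("groundingMetadata", {}).get("groundingChunks", [])
    uris = [c["web"].get("uri", "") for c in chunks if "web" in c]
    titles = [c["web"].get("title", "") for c in chunks if "web" in c]
    return text, uris, titles
```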

---

### Query Construction

Each site received exactly 20 queries with this mix:
- **10 brand queries (50%):** `"what is {brand}"`, `"{brand} reviews"`, `"{brand} pricing"`, `"{brand} alternatives"`, `"{brand} vs {competitor}"`, `"{brand} free tier"`, `"{brand} API"`, etc.
- **6 category queries (30%):** `"best {category}"`, `"top {product} for {use case}"`, `"{problem} software"`
- **4 long-tail / comparison queries (20%):** narrow segment comparisons, how-to queries the brand solves, deep documentation queries, pricing comparisons

Queries were generated once per site at audit time and locked in the `_probes.json` sidecar under the `queries` field. Re-runs should use the same query set for day-over-day comparability.

---

## Methodology Notes

**TTFB measurement:** 3 sequential requests to `https://{site}/`, no sleep between them. Discard hit 1 (CDN cold start). Score hit 3. `compensatedMs = rawTTFB_3 − connectTime_3` (both in milliseconds, from curl's `time_starttransfer` and `time_connect` fields on the same request). Pass threshold: `compensatedMs < 200ms`. This approach removes your geographic distance to the PoP without requiring an external calibration target.

**Freshness measurement:** Fetch `https://{site}/sitemap.xml` (follow the `Sitemap:` directive in robots.txt if it points elsewhere). Collect up to 100 `<lastmod>` values from URLs in the sitemap. Count the percentage dated within 90 days of 2026-04-16 (i.e., on or after 2026-01-16). Apply the scoring band in the GEO table above. Sites with no `<lastmod>` tags, an inaccessible sitemap, or a `sitemap.xml` that is only a `<sitemapindex>` with no leaf URL dates score 0.
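
A sketch of the percentage computation over a fetched sitemap body (assumes ISO-8601 `<lastmod>` values; apply the Freshness band from the GEO table to the result):

```python
import re
from datetime import date

AUDIT_DATE = date(2026, 4, 16)

def freshness_pct(sitemap_xml):
    """Percent of sampled <lastmod> values within 90 days of the audit date;
    None when the sitemap has no <lastmod> values (which scores 0)."""
    dates = re.findall(r"<lastmod>\s*(\d{4}-\d{2}-\d{2})", sitemap_xml)[:100]
    if not dates:
        return None
    recent = sum((AUDIT_DATE - date.fromisoformat(d)).days <= 90 for d in dates)
    return 100 * recent / len(dates)
```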

**AI bot allowlist:** Count distinct bot user-agent strings that appear under an explicit `User-agent:` directive in robots.txt with an `Allow: /` or an absence of any `Disallow:` for that agent. A global `User-agent: *` with no Disallow is not counted as an explicit allowlist. The 16 bot names checked are listed in the S7 section above.

**GEO scoring agents:** Each agent received only `scoring-prompt.md` and one pillar receipt JSON. No SSoT, no session memory, no other context. This minimal-context approach means the score is reproducible without any proprietary project knowledge.

**SERP floor:** All 100 sites in this cohort show `serp_visibility_60 = 10` (sitelink salience only; Knowledge Graph and Related Citations are zero for most). This is a known artifact of the query construction used for the SERP block — the KG signal requires an exact brand-query match that many niche or B2B sites don't trigger. The meaningful signal in `probe_aifs_score` is the `external_probes_total` (max 40). Do not treat the SERP floor as evidence that these sites have no brand SERP presence.

---

## Known Limitations and Caveats

**SERP floor artifact:** As noted above, `serp_visibility_60` is effectively floored at 10 for this cohort due to query construction. This does not reflect actual SERP invisibility — it reflects that the KG/related-citations signals require brand queries tuned to each site's exact registered entity name, which was not done at this scale. The probe scores (external_probes_total) are the clean signal.

**Gemini grounding redirect issue:** Gemini grounding URIs are opaque. The pass rule scans `groundingChunks[].web.title`, not `uri`. If you re-run without this fix, you will systematically undercount Gemini passes. This is documented in the API call shape above.

**GEO vs. ARM paradox:** High-authority sites (NASA, NIH, academic institutions) frequently score low on GEO because they lack explicit AI-facing infrastructure signals (no llms.txt, no MCP server, few JSON-LD types, no AI bot allowlist in robots.txt). These same sites score moderate ARM because AI engines already know them from training data and traditional crawling. Low GEO + moderate ARM reflects an "authority floor" that pre-existing reputation provides even without GEO investment. It is not a scoring error.

**Freshness = 0 means sitemap, not necessarily content:** A Freshness score of 0 means the site's sitemap has no `<lastmod>` tags within 90 days, or has no `<lastmod>` tags at all, or the sitemap is inaccessible. It does not prove the site's content is stale — many high-traffic sites simply do not populate `<lastmod>` accurately. Verify by checking the sitemap directly before concluding content freshness is a problem.

**Scores reflect site state on 2026-04-16:** Robots.txt rules, sitemap contents, JSON-LD markup, llms.txt presence, and HTTP headers change over time. A site that scores 0 on S2 today may have added llms.txt by the time you run Phase 1.

**Haiku for Anthropic probes is audit-scoped only:** `claude-haiku-4-5` was used for the Anthropic ARM Probe calls because citation retrieval (domain present/absent in response) has no reasoning floor and Haiku is 3× cheaper than Sonnet for identical outcomes on this task. Do not read this as a general recommendation; it is specific to this mechanical pass/fail probe workload.

**ARM Probe drift:** AI retrieval indexes change. A site that scored 3/20 on Gemini on 2026-04-16 may score differently next week. The scores in this report are a snapshot, not a permanent characterization. Re-run Phase 3 to get current numbers.

---

## How to Dispute a Score

If you believe a site was scored incorrectly:

1. Run the exact curl commands in Phase 1 against that site today. Compare your raw outputs to the receipt JSON for that site in the `audit-receipts-v3/` directory.

2. Apply the scoring rubric from Phase 2 dimension by dimension. If you reach a different score for a dimension, the receipt JSON contains the evidence string that was used — compare it directly.

3. For ARM disputes: re-run the 20 queries from the site's `_probes.json` sidecar (available under the `queries` field) against each platform using the exact API call shapes above. Count passes. If your count differs, the most likely causes are: (a) Gemini title-scan missing (see Gemini note above), (b) the AI system updated its retrieval index between runs, or (c) a query interpretation difference. All three are expected sources of variance.

4. GEO scores are fully deterministic from the receipt. If the receipt evidence matches the rubric, the score is correct as of the audit date. If the site has changed since 2026-04-16, run a fresh Phase 1 to get current evidence.

Note: scores reflect site and AI-engine state as of 2026-04-16. Sites change; AI indexes change. A dispute is most useful if it identifies a scoring logic error (wrong rubric application) rather than a site improvement that occurred after the audit date.

---

## Files in This Package

| File | Contents |
|---|---|
| `geo-survey-mj6pew.html` | Full interactive results page — sortable table, per-site dimension breakdown, ARM band distribution |
| `reproduce_this.md` | This file — methodology, curl commands, scoring rubric, caveats |
| `AUDIT-SUMMARY-2026-04-16-100site.json` | Master scorecard JSON with all 100 sites, all dimensions, all probe scores |
| `NNN_{site_slug}.json` | Per-site pillar receipt (100 files) — raw signal evidence for each of the 8 pillars |
| `NNN_{site_slug}_extended.json` | Per-site extended sidecar (100 files) — GEO 9-dimension scores with evidence strings |
| `NNN_{site_slug}_probes.json` | Per-site ARM Probe sidecar (100 files) — all 80 probe pass/fail results with per-query reasons |
| `MANIFEST-2026-04-16-100site.json` | SHA-256 hashes of all files — verify integrity before reproducing |

All files are in `C:\Users\ROBER\projects\audit-receipts-v3\` (full 100-site run, 2026-04-16).

---

*Audit date: 2026-04-16. 100 sites, 29 industries. Methodology version: `v1-2026-04-16`. Query sets locked in `_probes.json` sidecars for day-over-day comparability.*
