The 13-Signal GEO Audit

Eight infrastructure signals you either have or you don’t. Five measurement metrics that grade how well you’ve translated your site for AI. Together, the 13 determine whether AI systems can read your pages, anchor your claims, and decide to cite you.

The audit is reproducible. Every signal and metric is computed from publicly observable data; the source is on the methodology pages.

The 8 Infrastructure Signals

Binary pass/fail. Either the file/header/configuration is there, or it isn’t. AI crawlers check for these on first contact.

1. robots_ai_bots_allowed

robots.txt explicitly allows the AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, etc.). Default-blocked sites are invisible to AI by construction.
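A minimal sketch of this check, using Python's standard urllib.robotparser. It assumes robots.txt has already been fetched as text. The helper name ai_bots_allowed is hypothetical, the agent list mirrors the crawlers named above, and treating the signal as passed only when every agent may fetch "/" is an assumption, not necessarily how the audit scores it.

```python
from urllib.robotparser import RobotFileParser

# AI crawler tokens named in the audit description; extend as needed.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def ai_bots_allowed(robots_txt: str, path: str = "/") -> dict:
    """Map each AI user agent to whether robots.txt lets it fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return {agent: parser.can_fetch(agent, path) for agent in AI_AGENTS}


robots = """User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
verdicts = ai_bots_allowed(robots)
# Assumption: the signal passes only if every listed agent may fetch the path.
print(verdicts, all(verdicts.values()))
```

Note that urllib.robotparser does plain prefix matching on paths and does not implement wildcard patterns such as * or $ inside rules, so a production checker may need a fuller parser.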

2. llms_txt_present

/llms.txt exists and lists the canonical pages an AI should ingest. It works like a sitemap, but curated for AI attention.
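The llms.txt proposal describes a Markdown file with an H1 title followed by sections of link lists. The sketch below extracts those links and applies a simple pass rule. The helper names are hypothetical, and the rule itself (an H1 plus at least one listed page) is an illustrative stand-in for whatever the audit actually checks.

```python
import re

# A Markdown list item that starts with a link: "- [Title](url)".
LINK_RE = re.compile(r"^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)", re.MULTILINE)


def llms_txt_links(text: str) -> list:
    """Return the (title, url) pairs listed in an llms.txt body."""
    return LINK_RE.findall(text)


def llms_txt_passes(text: str) -> bool:
    """Hypothetical pass rule: an H1 title plus at least one listed page."""
    has_title = any(line.startswith("# ") for line in text.splitlines())
    return has_title and bool(llms_txt_links(text))


sample = """# Example Co

> Solar installer serving the Front Range.

## Docs
- [Services](https://example.com/services.md): what we install
- [Pricing](https://example.com/pricing.md)
"""
print(llms_txt_links(sample), llms_txt_passes(sample))
```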

3. llms_full_txt_present

/llms-full.txt exists with full-text dumps of those canonical pages. Lets AI ingest your authoritative content in one hop instead of crawling the whole site.
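One way to picture the one-hop idea: concatenate the canonical pages into a single text body. The layout below (a site-level H1, then one H2 per source URL) and the helper name build_llms_full are assumptions for illustration; the audit description does not specify a format for /llms-full.txt.

```python
def build_llms_full(site_name: str, pages: dict) -> str:
    """Concatenate canonical pages into one llms-full.txt body.

    `pages` maps each canonical URL to its plain-text or Markdown content.
    Assumption: a site H1 followed by one H2 section per URL.
    """
    parts = [f"# {site_name}", ""]
    for url, body in pages.items():
        parts += [f"## {url}", "", body.strip(), ""]
    return "\n".join(parts)


full = build_llms_full("Example Co", {
    "https://example.com/services": "We install residential solar arrays.",
    "https://example.com/pricing": "Pricing depends on roof size and system capacity.",
})
print(full)
```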

4. sitemap_fresh

sitemap.xml resolves with current <lastmod> timestamps. A stale sitemap tells AI the site isn’t maintained, and AI systems deprioritize it for citation.
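A sketch of the freshness test, assuming "current" means the newest <lastmod> in the sitemap falls within a fixed window. The 30-day default borrows the LMR threshold; the actual sitemap_fresh cutoff is not stated, and the helper names are hypothetical. Sitemap index files (<sitemapindex>) are not handled.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from typing import Optional

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_lastmod(value: str) -> datetime:
    """Parse a W3C Datetime <lastmod>; date-only values are read as UTC midnight."""
    dt = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def sitemap_is_fresh(xml_text: str, max_age_days: int = 30,
                     now: Optional[datetime] = None) -> bool:
    """True if the newest <lastmod> in a <urlset> is within max_age_days.

    Assumption: the 30-day window is borrowed from LMR, not stated for this signal.
    """
    now = now or datetime.now(timezone.utc)
    root = ET.fromstring(xml_text)
    stamps = [el.text for el in root.findall("sm:url/sm:lastmod", SITEMAP_NS) if el.text]
    if not stamps:
        return False  # no <lastmod> timestamps at all: treat as stale
    newest = max(parse_lastmod(s) for s in stamps)
    return now - newest <= timedelta(days=max_age_days)
```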

5. jsonld_structured_data

Schema.org JSON-LD is present and valid on every page. AI uses structured data to resolve entities (Person, Organization, Place) without natural-language inference.
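To check the "present and valid" part, a sketch can pull every <script type="application/ld+json"> block out of the HTML and confirm it parses as JSON carrying an @context and either an @type or an @graph. This is a syntactic check only, an assumption about what "valid" means here: it does not validate properties against the Schema.org vocabulary. The class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def jsonld_valid(html: str) -> bool:
    """Assumed rule: at least one block, each parsing to objects with @context and @type/@graph."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    if not extractor.blocks:
        return False
    for raw in extractor.blocks:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            return False
        items = doc if isinstance(doc, list) else [doc]
        for item in items:
            if not (isinstance(item, dict) and "@context" in item
                    and ("@type" in item or "@graph" in item)):
                return False
    return True
```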

6. prerendered_html

The page renders meaningful content in the initial HTML, not via client-side JavaScript. Most AI crawlers don’t execute JS; SPA shells without prerender are invisible.
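A rough prerender test: count the words of visible text in the raw HTML response, before any JavaScript runs, skipping script, style, noscript and template contents. The 50-word floor and the names VisibleWordCounter and is_prerendered are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser


class VisibleWordCounter(HTMLParser):
    """Counts words in text nodes outside script/style/noscript/template elements."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.words += len(data.split())


def is_prerendered(raw_html: str, min_words: int = 50) -> bool:
    """Judge the raw server response, i.e. what a non-JS crawler receives.

    Assumption: min_words=50 is an illustrative cutoff, not the audit's.
    """
    counter = VisibleWordCounter()
    counter.feed(raw_html)
    counter.close()
    return counter.words >= min_words
```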

7. mcp_server_live

/.well-known/mcp.json resolves and the declared MCP endpoints respond. The Model Context Protocol surface lets AI clients query your data live during inference.

8. ai_content_feed

/ai-content-index.json exists and enumerates the machine-fluent payloads (artifact protocol). Gives AI a one-request manifest of everything you want it to ingest.

The 5 Measurement Metrics

Continuous scores with pass thresholds. These grade how well your site reads to AI once the binary signals above are in place.

RR — Relevance (threshold ≥ 0.45)

How closely your bot-served HTML matches your human-served HTML. An RR below 0.45 means AI is reading less than half of the content your human visitors see. Low RR is the silent killer of citation eligibility.
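The exact RR formula is not given here. As an illustrative stand-in, the sketch below scores the share of the human-served page's vocabulary that also appears in the bot-served text; the tokenization, the set-overlap measure, and the name rr_score are all assumptions. It expects visible text already extracted from each HTML version.

```python
import re


def _vocabulary(text: str) -> set:
    """Lowercased word tokens of a page's visible text."""
    return set(re.findall(r"\w+", text.lower()))


def rr_score(bot_text: str, human_text: str) -> float:
    """Illustrative RR (assumed formula): share of human-served vocabulary present in bot-served text."""
    human = _vocabulary(human_text)
    if not human:
        return 1.0  # nothing for the bot version to miss
    return len(human & _vocabulary(bot_text)) / len(human)


human_view = "Acme sells solar panels in Denver and Boulder"
bot_view = "Acme sells solar panels"
print(rr_score(bot_view, human_view))  # 0.5, just above the 0.45 threshold
```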

SGR — Source Grounding Ratio (threshold ≥ 0.25)

The fraction of numeric claims on the page that are anchored to primary sources (citations, schema-marked references, linked receipts). AI extractors classify ungrounded numbers as low-confidence and are less likely to cite them.
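A heuristic sketch of SGR on plain text: a sentence containing a number counts as a numeric claim, and it counts as grounded if the same sentence carries a bracketed citation such as [1] or an inline URL. Both markers, the sentence-level granularity, and the name sgr are assumptions; the audit may also credit schema-marked references that a text-only pass cannot see.

```python
import re

# Hypothetical grounding markers: a bracketed citation like [3] or an inline URL.
SOURCE_RE = re.compile(r"\[\d+\]|https?://\S+")
NUMBER_RE = re.compile(r"\d")


def sgr(text: str) -> float:
    """Illustrative SGR: grounded numeric sentences / all numeric sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Strip citation markers first so "[1]" alone does not make a sentence numeric.
    numeric = [s for s in sentences if NUMBER_RE.search(SOURCE_RE.sub("", s))]
    if not numeric:
        return 1.0  # assumption: a page with no numeric claims has nothing to ground
    grounded = [s for s in numeric if SOURCE_RE.search(s)]
    return len(grounded) / len(numeric)


page_text = ("Revenue grew 40% in 2023 [1]. Churn fell to 5%. "
             "Our team is friendly.")
print(sgr(page_text))  # 0.5: one of two numeric sentences is cited
```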

RTC — Retrieval Token Cost (threshold ≤ 1.00)

Ratio of page chrome (nav, footer, scripts, ads) to useful content. Higher RTC means AI burns its token budget retrieving boilerplate instead of reasoning over the content it would cite. Above 1.00, chrome outweighs content.
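A sketch of RTC that uses text characters as a proxy for tokens: text inside nav, header, footer, aside, form, script and style elements counts as chrome, everything else as content. The tag list, the character proxy, and the names ChromeSplitter and rtc are assumptions; ads, which usually arrive as generic div or iframe markup, are not detected.

```python
from html.parser import HTMLParser

# Assumption: these elements count as chrome; everything else counts as content.
CHROME_TAGS = {"nav", "header", "footer", "aside", "form", "script", "style"}


class ChromeSplitter(HTMLParser):
    """Tallies text characters inside chrome elements vs. everywhere else."""

    def __init__(self):
        super().__init__()
        self.chrome_depth = 0
        self.chrome_chars = 0
        self.content_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in CHROME_TAGS:
            self.chrome_depth += 1

    def handle_endtag(self, tag):
        if tag in CHROME_TAGS and self.chrome_depth:
            self.chrome_depth -= 1

    def handle_data(self, data):
        n = len(data.strip())
        if self.chrome_depth:
            self.chrome_chars += n
        else:
            self.content_chars += n


def rtc(html: str) -> float:
    """Chrome-to-content ratio, using text characters as a stand-in for tokens."""
    splitter = ChromeSplitter()
    splitter.feed(html)
    splitter.close()
    if splitter.content_chars == 0:
        return float("inf")  # all chrome, no content
    return splitter.chrome_chars / splitter.content_chars
```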

RPS — Sitemap Throughput (threshold ≥ 1,000,000 URLs/sec)

How many URLs per second your sitemap delivers under load. Bots crawling at scale give up before fully indexing slow inventories. Anything under 1,000,000/sec means the long tail is invisible.
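A sequential sketch of sitemap throughput: fetch and parse the sitemap several times and divide the number of <loc> URLs delivered by elapsed wall-clock time. The fetch callable and the name sitemap_throughput are hypothetical; a real load test would issue concurrent requests, which this sketch does not.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable

LOC_TAG = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"


def sitemap_throughput(fetch: Callable[[], bytes], rounds: int = 5) -> float:
    """URLs per second across `rounds` sequential fetch-and-parse cycles.

    `fetch` is a hypothetical hook returning the raw sitemap bytes.
    Assumption: sequential timing only; no concurrent load is generated.
    """
    total_urls = 0
    start = time.perf_counter()
    for _ in range(rounds):
        root = ET.fromstring(fetch())
        total_urls += sum(1 for _ in root.iter(LOC_TAG))
    elapsed = time.perf_counter() - start
    return total_urls / elapsed if elapsed > 0 else float("inf")
```

Against a live site, fetch could be a lambda wrapping urllib.request.urlopen(url).read(), with the result compared against the 1,000,000 URLs/sec threshold.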

LMR — Last-Modified Recency (threshold ≤ 30 days)

Median age of your pages, measured from their Last-Modified HTTP headers. Live-retrieval AI deprioritizes content older than 30 days for time-sensitive queries. Stale pages get demoted regardless of quality.
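A sketch of LMR, assuming the raw Last-Modified header value has already been collected for each page. HTTP dates parse with the standard library's email.utils.parsedate_to_datetime, and the median age in days is compared against the 30-day threshold. The name lmr_days is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from statistics import median


def lmr_days(last_modified: list, now: datetime) -> float:
    """Median page age in days, from raw HTTP Last-Modified header values."""
    ages = [(now - parsedate_to_datetime(value)).total_seconds() / 86400
            for value in last_modified]
    return median(ages)


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
headers = [
    "Wed, 01 May 2024 00:00:00 GMT",
    "Sat, 25 May 2024 00:00:00 GMT",
    "Fri, 31 May 2024 00:00:00 GMT",
]
age = lmr_days(headers, now)
print(age, age <= 30)  # 7.0 True
```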

Composite Score

Each pass counts as 1 point. Maximum: 13. The composite places sites into citation bands AI systems treat differently:

13 / 13 Gold Standard
11–12 / 13 Recognized
7–10 / 13 Fragmented
< 7 / 13 Invisible
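The scoring rule above can be written down directly. This sketch encodes the five metric thresholds as stated and maps the point total onto the four bands. The input shapes (a dict of eight signal booleans, a dict of metric values keyed by acronym) and the name composite are assumptions for illustration.

```python
import operator

# Pass rules for the five metrics, copied from the thresholds stated above.
METRIC_RULES = {
    "RR": (operator.ge, 0.45),
    "SGR": (operator.ge, 0.25),
    "RTC": (operator.le, 1.00),
    "RPS": (operator.ge, 1_000_000),
    "LMR": (operator.le, 30),
}

# Band floors, highest first: (minimum points, band name).
BANDS = [(13, "Gold Standard"), (11, "Recognized"), (7, "Fragmented"), (0, "Invisible")]


def composite(signals: dict, metrics: dict) -> tuple:
    """Return (points out of 13, band); input shapes are assumed, not specified by the audit."""
    if len(signals) != 8 or set(metrics) != set(METRIC_RULES):
        raise ValueError("expected 8 signals and the 5 named metrics")
    points = sum(bool(v) for v in signals.values())
    points += sum(op(metrics[name], limit) for name, (op, limit) in METRIC_RULES.items())
    band = next(label for floor, label in BANDS if points >= floor)
    return points, band


signals = dict.fromkeys([
    "robots_ai_bots_allowed", "llms_txt_present", "llms_full_txt_present",
    "sitemap_fresh", "jsonld_structured_data", "prerendered_html",
    "mcp_server_live", "ai_content_feed"], True)
metrics = {"RR": 0.8, "SGR": 0.3, "RTC": 0.6, "RPS": 2_000_000, "LMR": 12}
print(composite(signals, metrics))
```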

Run the audit on your own site at geolocus.ai.