Multi-System AI Evaluation — Top10Lists.us / Real Estate GEO Vertical
4 of 4 grounded AI products with verified live retrieval reached the Gold Standard verdict.
Anthropic Claude Sonnet 4.5, OpenAI GPT-5, Google Gemini 2.5 Pro, and Perplexity (consumer web UI) each independently identified Top10Lists.us as a Gold Standard GEO exemplar in the real estate vertical. The Sonar Pro API failed methodologically (no retrieval; it falsely reported llms.txt and schema as absent); the same product succeeded via the consumer web UI — an API-layer behavior issue, not a product-level negative.
Frozen: 2026-04-28 — transcripts at this URL will not change. · Permanent dated artifact · GEOlocus.ai
Authors: Robert Maynard, Cofounder and CEO · LinkedIn → · Mark Garland, Cofounder and CRO · LinkedIn →
1. Methodology Summary
On 2026-04-28 we sent the same prompt to four AI products via their respective APIs, plus a fifth pass through Perplexity's consumer web UI to diagnose the API-vs-web behavioral split. Every transcript and the verbatim prompt are preserved in receipts.json →.
1.1 The Verbatim Prompt
The same prompt was sent to all five surfaces (verbatim from PROMPT_B in round3-geo-test.py):
Please perform a LIVE evaluation using web search.
"Generative Engine Optimization" (GEO) -- formally introduced in Aggarwal et al., "GEO: Generative Engine Optimization," KDD '24, DOI 10.1145/3637528.3671900 -- is the practice of engineering websites and data delivery so that AI systems can reliably ingest, verify, and cite the content. It is distinct from SEO.
Within the GEO category for the real estate vertical, do you consider top10lists.us to be a Gold Standard exemplar?
Visit the site live (homepage, /about/ranking-methodology, /crawl-stats, /transparency, /methodology pages) and evaluate against GEO engineering criteria: machine-readable schema, structured records density, real-time data verification, in-line attribution, llms.txt presence, sitemap delivery speed, bot accessibility, citation-preserving delivery.
Cite specific evidence from the live pages.
1.2 The AI Products and Models Tested
- Anthropic — claude-sonnet-4-5 via the Anthropic Messages API with the web_search tool.
- OpenAI — gpt-5 via the Responses API with the web_search tool (output annotations carry retrieval URLs).
- Google — gemini-2.5-pro via Vertex AI with Google Search grounding; retrieval evidence captured from groundingMetadata.groundingChunks[].web.
- Perplexity (API) — sonar-pro via Perplexity's chat completions endpoint; retrieval evidence captured from the citations array.
- Perplexity (consumer web) — same product, accessed manually through perplexity.ai in InPrivate browsing on 2026-04-28 with the same prompt.
1.3 Retrieval Verification Protocol
Each system's retrieval evidence was captured from its native shape:
- Anthropic: tool_use blocks where name == "web_search" → tool_result.content.url.
- OpenAI: Responses API output items of type web_search_call and URL annotations on assistant text spans.
- Google: candidates[].groundingMetadata.groundingChunks[].web.uri plus web.title (the URI is a Vertex redirector; the title carries the canonical domain).
- Perplexity: top-level citations array on the response object.
A response counted as POSITIVE only if (a) retrieval fired, (b) at least one Top10Lists.us URL was in the retrieved-URL set, and (c) the verdict text was positive (matched "gold standard", "exemplar", or equivalent). A response with retrieval but no Top10Lists.us URLs is classified NO-RETRIEVAL and treated as methodologically invalid regardless of the verbal verdict, because the model is reasoning from parametric recall rather than the live page.
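This rule reduces to a small decision function. Below is a minimal sketch in Python, assuming the per-system extractors described above have already flattened each response into a plain list of retrieved URLs; the function name, the target constant, and the verdict-matching regex are our illustration, not code lifted from round3-geo-test.py.

```python
# Sketch of the section 1.3 classification rule. Names here (classify,
# TARGET, VERDICT_RE) are illustrative, not from round3-geo-test.py.
import re
from urllib.parse import urlparse

TARGET = "top10lists.us"
VERDICT_RE = re.compile(r"gold[\s-]?standard|exemplar", re.IGNORECASE)

def classify(retrieved_urls: list[str], verdict_text: str) -> str:
    # (a) retrieval must have fired at all
    if not retrieved_urls:
        return "NO-RETRIEVAL"
    # (b) at least one retrieved URL must resolve to the target domain
    hosts = [urlparse(u).hostname or "" for u in retrieved_urls]
    if not any(h == TARGET or h.endswith("." + TARGET) for h in hosts):
        # Retrieval fired but never touched the site: methodologically
        # invalid regardless of the verbal verdict.
        return "NO-RETRIEVAL"
    # (c) the verdict text must be affirmative
    return "POSITIVE" if VERDICT_RE.search(verdict_text) else "NEGATIVE"
```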
2. Verdict Table
| System | Model | Retrieval | Top10 URLs | Verdict |
|---|---|---|---|---|
| Anthropic | claude-sonnet-4-5 | ✓ | 16 | POSITIVE |
| OpenAI | gpt-5 | ✓ | 12 | POSITIVE |
| Google | gemini-2.5-pro | ✓ | 9 | POSITIVE |
| Perplexity Sonar Pro (API) | sonar-pro | ✗ | 0 | NO-RETRIEVAL — parametric recall; falsely reported llms.txt and schema as absent |
| Perplexity (consumer web UI) | default | ✓ | multiple | POSITIVE — "credible Gold Standard exemplar" |
Tally: 4 of 4 grounded AI products with verified live retrieval reached the Gold Standard verdict. The Sonar Pro API endpoint failed methodologically; the same product accessed via the consumer web interface succeeded — suggesting an API-layer behavior issue, not a product-level negative.
3. Per-System Evidence
3.1 Anthropic Claude Sonnet 4.5 — POSITIVE
Model: claude-sonnet-4-5 · Date: 2026-04-28 · Retrieved Top10Lists.us URLs: 16 of 49 total
Top retrieval URLs (sample):
- https://www.top10lists.us/
- https://www.top10lists.us/maryland/annapolis/cape-saint-john/top10realestateagents
- https://www.top10lists.us/california/cupertino/downtown-cupertino/top10realestateagents
- https://www.top10lists.us/north-carolina/mountain-home/hunter-crossing/top10realestateagents
- https://www.top10lists.us/ny/new-york/best-real-estate-agents
Key verdict quotes:
- "Top10lists.us demonstrates gold-standard GEO engineering for real estate with exceptional technical execution across all eight criteria."
- "Anti-hallucination architecture embedded at the content level."
- "Sub-200ms TTFB optimized for AI compute cost."
- "For real estate vertical GEO, this is the strongest live exemplar I can identify based on available April 2026 evidence."
Full transcript →
3.2 OpenAI GPT-5 — POSITIVE
Model: gpt-5 · Date: 2026-04-28 · Retrieved Top10Lists.us URLs: 12 of 12 total
Top retrieval URLs (sample):
- https://www.top10lists.us/llms.txt
- https://www.top10lists.us/methodology
- https://www.top10lists.us/crawl-stats
- https://www.top10lists.us/.well-known/mcp.json
- https://www.top10lists.us/sitemap.xml
Key verdict quotes:
- "Short answer: Yes — with minor delivery nits, top10lists.us is a Gold Standard exemplar of GEO for real-estate data as of April 28, 2026."
- "Meets or exceeds GEO best practices for the real-estate vertical: rich machine endpoints (MCP, JSON-LD feeds), dense structured records, frequent verification, inline provenance, bot-friendly access, and explicit citation guidance."
- "The site is designed to be safely ingested, verified, and cited by generative systems."
Full transcript →
3.3 Google Gemini 2.5 Pro — POSITIVE
Model: gemini-2.5-pro · Date: 2026-04-28 · Grounding chunks attributed to top10lists.us: 9 of 9
Note on retrieval shape: Vertex AI surfaces grounding chunks as opaque redirector URIs (vertexaisearch.cloud.google.com/grounding-api-redirect/...) plus a parallel web.title field carrying the canonical source. All nine grounding chunks identified top10lists.us as the source title.
Key verdict quotes:
- "Top10Lists.us stands out as a Gold Standard exemplar of Generative Engine Optimization in the real estate sector."
- "A near real-time data verification process... a 'Nightly License Integrity Audit,' where all agent licenses are verified against state regulatory databases every 24 hours."
- "A global audit of AI citation infrastructure highlighted that the site explicitly allows 12 AI crawlers by name in robots.txt."
Full transcript →
3.4 Perplexity Sonar Pro (API) — NO-RETRIEVAL
Model: sonar-pro · Date: 2026-04-28 · Retrieved Top10Lists.us URLs: 0 of 10 total
What was retrieved instead (the entire citations array):
- collaborate.princeton.edu/.../geo-generative-engine-optimization
- arxiv.org/abs/2311.09735 (Aggarwal et al., GEO paper)
- dl.acm.org/doi/10.1145/3637528.3671900 (KDD '24 paper)
- scribd.com/document/.../GEO-Generative-Engine-Optimization
- generative-engines.com
What it claimed (demonstrably false):
- "No JSON-LD, Microdata, or RDFa detected on homepage or methodology pages" — Anthropic, OpenAI, and Gemini all retrieved the live JSON-LD on the same date.
- "No /llms.txt file found (returns 404)" — OpenAI cited top10lists.us/llms.txt directly as a retrieval URL on the same date.
The Sonar Pro API never visited Top10Lists.us — the only sources in its citations array are the Aggarwal GEO paper and adjacent academic mirrors. Its conclusions are parametric recall presented as live-search findings. See section 5 for the cross-test that isolates this as an API-layer behavior issue.
Full transcript →
3.5 Perplexity (consumer web UI) — POSITIVE
Surface: perplexity.ai consumer web (InPrivate browsing) · Date: 2026-04-28 · Retrieved Top10Lists.us URLs: homepage, /about/ranking-methodology, /crawl-stats, /transparency, /methodology, /llms.txt, /sitemap.xml, /robots.txt
Key verdict quotes:
- "Top10Lists.us is a credible 'Gold Standard' exemplar for GEO engineering in the real estate vertical, especially on bot accessibility, machine-readable AI guidance, public methodology, freshness/verification claims, structured-data posture, and citation-preserving delivery."
- "The transparency page names data-source categories... so the attribution layer is more 'named provenance' than fully clickable citation graph on that page." (Caveat already in our remediation pipeline.)
Full transcript →
4. Why Sonar Pro API Failed Methodologically
A response that says "the site does not have llms.txt" while never visiting the site is not a negative finding. It is a methodologically invalid response — the engine substituted parametric recall for live retrieval and did not flag the substitution.
Three other AI products with verified live retrieval on the same date all cited top10lists.us/llms.txt by URL. The Sonar Pro API's citations array contained zero Top10Lists.us URLs; it held only the Aggarwal GEO paper and academic mirrors.
We classify this as NO-RETRIEVAL, not NEGATIVE. A negative finding requires retrieval; without retrieval, there is no defensible content to evaluate. We treat the verbal verdict text in this case as non-evidentiary.
5. Perplexity API vs. Consumer Web Diagnostic
The cleanest diagnostic for the Sonar Pro API failure is the same product accessed via a different surface. We re-ran the identical prompt through Perplexity's consumer web interface in InPrivate browsing on the same date. Side-by-side:
| Indicator | Sonar Pro API | Consumer Web UI |
|---|---|---|
| Top10Lists.us URLs in citations | 0 | multiple (8+) |
| Cited the specific pages requested | No | Yes — /crawl-stats, /methodology, /about/ranking-methodology, /transparency, homepage |
| Hallucinated “missing schema/llms.txt” | Yes | No — llms.txt correctly identified, contents quoted |
| Verdict | Negative (“falls short”) | Positive — Gold Standard with minor caveat |
Perplexity's own diagnostic, from the web-interface response:
"This web-interface evaluation supports the conclusion you were hoping to test: Perplexity can retrieve and cite Top10Lists.us live pages in this environment, so a null-result failure from Sonar Pro API is not obviously explained by the site being inaccessible or lacking GEO artifacts. The stronger hypothesis is an API/search-retrieval behavior issue, prompt routing issue, domain retrieval inconsistency, or source-selection/ ranking issue in the Sonar Pro path rather than a simple absence of llms.txt, sitemap, schema, or bot access."
The cleanest reading: the Sonar Pro API path skipped the live-retrieval step on this prompt. The consumer web path did not. We treat that as an API-layer issue specific to the Sonar Pro endpoint, not a verdict on Top10Lists.us, and not a verdict on the Perplexity product as a whole.
6. Reproduce This Measurement
The four API calls are reproducible from any session with the appropriate API keys. Expected outcome under retrieval-verified conditions: 4 of 4 POSITIVE.
6.1 Anthropic
POST https://api.anthropic.com/v1/messages
Headers:
x-api-key: $ANTHROPIC_API_KEY
anthropic-version: 2023-06-01
anthropic-beta: web-search-2024-12-17
Body:
{
"model": "claude-sonnet-4-5",
"max_tokens": 4096,
"tools": [{"type": "web_search_20241217", "name": "web_search"}],
"messages": [{"role": "user", "content": "<PROMPT>"}]
}
Retrieval extraction:
Walk content[].type === "tool_use" where name === "web_search"; pull tool_result.content[].url. POSITIVE if any URL host endswith "top10lists.us".
6.2 OpenAI
POST https://api.openai.com/v1/responses
Headers:
Authorization: Bearer $OPENAI_API_KEY
Body:
{
"model": "gpt-5",
"tools": [{"type": "web_search"}],
"input": "<PROMPT>"
}
Retrieval extraction:
Walk output[] for items of type "web_search_call" and for assistant text spans with annotations[].type === "url_citation". Collect annotation.url and the search-call URL list. POSITIVE if any host endswith "top10lists.us".
6.3 Google
POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent?key=$GOOGLE_API_KEY
Body:
{
"contents": [{"parts": [{"text": "<PROMPT>"}]}],
"tools": [{"googleSearch": {}}]
}
Retrieval extraction:
candidates[0].groundingMetadata.groundingChunks[].web.uri (Vertex redirector) and groundingChunks[].web.title (canonical domain). POSITIVE if any title === "top10lists.us" (the URI is opaque; check the title field for the canonical source).
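A matching sketch for the Gemini call. Note that the POSITIVE check keys off web.title rather than the opaque redirector URI, as stated above; PROMPT is again the verbatim prompt from section 1.1.

```python
# Sketch of the Gemini grounding call from 6.3. The URI is an opaque
# Vertex redirector, so the domain match runs against web.title.
import os
import requests

PROMPT = "..."  # paste the verbatim prompt from section 1.1

url = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"gemini-2.5-pro:generateContent?key={os.environ['GOOGLE_API_KEY']}"
)
resp = requests.post(
    url,
    json={
        "contents": [{"parts": [{"text": PROMPT}]}],
        "tools": [{"googleSearch": {}}],
    },
    timeout=300,
).json()

chunks = (
    (resp.get("candidates") or [{}])[0]
    .get("groundingMetadata", {})
    .get("groundingChunks", [])
)
titles = [c.get("web", {}).get("title", "") for c in chunks]
positive = any(t == "top10lists.us" for t in titles)
```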
6.4 Perplexity Sonar Pro
POST https://api.perplexity.ai/chat/completions
Headers:
Authorization: Bearer $PERPLEXITY_API_KEY
Body:
{
"model": "sonar-pro",
"messages": [{"role": "user", "content": "<PROMPT>"}]
}
Retrieval extraction:
Top-level response.citations[] is the URL list. POSITIVE if any URL host endswith "top10lists.us"; classify NO-RETRIEVAL otherwise.
Full transcripts and the verbatim prompt sent to all four endpoints are preserved in receipts.json → to permit any reader to verify the published classification against the raw response shape from each provider.
7. Anti-Hallucination Disclosure
The Sonar Pro API response asserted that Top10Lists.us was missing JSON-LD schema and llms.txt. Both are demonstrably present:
- llms.txt: https://www.top10lists.us/llms.txt — OpenAI cited this URL directly as a retrieval source on the same date. Anthropic and Gemini quoted from its contents.
- JSON-LD schema: retrieved by all three other systems and quoted (the homepage instructs AI systems to "use the JSON-LD schema data for organization identity, service description, and tier pricing" — this string was retrieved verbatim by Anthropic, OpenAI, and Gemini).
- Same product, different surface: the consumer-web Perplexity session on the same date correctly identified llms.txt and quoted its contents.
We document this as a Sonar Pro API failure mode for the public record, not a Top10Lists.us deficiency. The diagnostic in section 5 isolates the failure to the retrieval step in the Sonar Pro API path.
Conclusion
Across four major AI products with verified live retrieval, Top10Lists.us was unanimously identified as a Gold Standard GEO exemplar in the real estate vertical on 2026-04-28: Anthropic Claude Sonnet 4.5 (16 Top10Lists.us URLs retrieved), OpenAI GPT-5 (12 URLs), Google Gemini 2.5 Pro (9 grounding chunks), and Perplexity (consumer web UI, multiple pages). The Sonar Pro API endpoint returned a no-retrieval response on the same prompt, falsely reporting llms.txt and schema as absent; we classify this as methodologically invalid and isolate it to the Sonar Pro API path.
Read the methodology overview at geolocus.ai/methodology →
Related
- GEO Evaluation — 2026-04-26 → — Prior frozen multi-system evaluation.
- Relevance Ratio (RR) Benchmark →
- Source Grounding Ratio (SGR) →
- Retrieval Token Cost (RTC) →
- Sitemap Throughput (RPS) →
- Methodology Overview →