Crawl Volume — Top10Lists.us, 7-Day AI-Bot Window
463,420 AI-bot crawl events across 27 distinct bot fleets in a single 7-day window (2026-04-19 12:22 UTC → 2026-04-26 12:22 UTC). OpenAI 32%, Google 22%, Meta 18%, SEO 13%, Anthropic 5%, Perplexity 4%. Consumer-triggered (UT) ratio: 6.05%.
Measured from Cloudflare edge middleware request logs — the canonical single-source view of every HTTP request reaching the site. Each request classified by bot user-agent signature against an explicit allowlist; rows deduplicated to one entry per (bot, page, second) tuple to remove burst-noise.
Note — Cloudflare Analytics dashboard is NOT canonical
The Cloudflare Analytics dashboard reports 378,168 events for the same 7-day window, about 18.4% below the middleware-derived 463,420. The dashboard misses certain ClaudeBot and PerplexityBot variants and conflates sources when compositing multiple views. The middleware single-source log is the canonical reference; the dashboard delta is a classification gap, not double-counting in the middleware.
Frozen: 2026-04-26 12:22 UTC — measurements at this URL will not change. · Permanent dated artifact · GEOlocus.ai
Authors: Robert Maynard, Cofounder and CEO · Mark Garland, Cofounder and CRO
1. Methodology
Crawl-volume is measured from Cloudflare edge middleware request logs — the canonical single-source view of every HTTP request reaching the site. Each request is classified by bot user-agent signature against an explicit allowlist (GPTBot, Meta-ExternalAgent, Googlebot, ClaudeBot, PerplexityBot, ChatGPT-User, OAI-SearchBot, etc. — see receipts.json for full taxonomy).
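A minimal sketch of allowlist-based classification, assuming case-insensitive substring matching on the User-Agent header. The token list is an illustrative subset of the bots named in this report; the function name `classify_bot`, the matching rule, and the sample user-agent strings are assumptions for illustration, not the site's actual middleware (the full taxonomy is in receipts.json).

```python
from typing import Optional

# Illustrative subset of the allowlist (hypothetical ordering).
# More specific tokens come first so a generic token cannot shadow them.
BOT_TOKENS = (
    "ChatGPT-User",
    "OAI-SearchBot",
    "GPTBot",
    "Meta-ExternalAgent",
    "GoogleOther",
    "Googlebot",
    "ClaudeBot",
    "PerplexityBot",
)


def classify_bot(user_agent: str) -> Optional[str]:
    """Return the first allowlisted bot token found in the UA, else None."""
    ua = user_agent.lower()
    for token in BOT_TOKENS:
        if token.lower() in ua:
            return token
    return None


if __name__ == "__main__":
    sample = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"
    print(classify_bot(sample))  # GPTBot
    print(classify_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # None
```

Requests that match no token return `None` here; how the real pipeline assigns its `unknown_bot` label is not specified in this report.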
Rows are deduplicated to one entry per (bot, page, second) tuple to remove burst-noise. In the 7-day window, self-duplicate noise totaled 412 groups / 979 extra rows, or 0.21% of total volume.
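The dedup rule and the two noise figures (groups with more than one row, and extra rows removed) can be sketched in a few lines. This is an illustration under assumptions: the function name `dedup_stats` and the sample rows are made up, and the production dedup runs in SQL rather than Python.

```python
from collections import Counter
from datetime import datetime, timezone


def dedup_stats(rows):
    """Collapse rows to one per (bot, path, whole second) and report the noise.

    rows: iterable of (bot, path, datetime) tuples.
    Returns (unique_events, duplicate_groups, extra_rows).
    """
    counts = Counter(
        (bot, path, ts.replace(microsecond=0)) for bot, path, ts in rows
    )
    duplicate_groups = sum(1 for n in counts.values() if n > 1)
    extra_rows = sum(n - 1 for n in counts.values())
    return len(counts), duplicate_groups, extra_rows


if __name__ == "__main__":
    t = datetime(2026, 4, 20, 8, 0, 0, tzinfo=timezone.utc)
    rows = [  # made-up sample rows, not real log data
        ("GPTBot", "/a", t.replace(microsecond=100_000)),
        ("GPTBot", "/a", t.replace(microsecond=900_000)),  # same second: noise
        ("GPTBot", "/b", t),
        ("ClaudeBot", "/a", t),
    ]
    print(dedup_stats(rows))  # (3, 1, 1)
```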
Cloudflare Analytics dashboard counts are NOT used as the canonical source; they undercount certain crawler classes (notably ClaudeBot and PerplexityBot variants whose UA strings the dashboard does not classify under the AI-bot category) and conflate sources when the dashboard composites multiple views.
2. 7-Day Window Result — Frozen 2026-04-26 12:22 UTC
| Metric | Value |
|---|---|
| Total AI-bot crawl events | 463,420 |
| Distinct bot fleets | 27 |
| Self-duplicate noise (groups w/ multiple rows per bot/page/second) | 412 groups; 979 extra rows; 0.21% |
Per-bot top 15 (7-day)
| # | Bot | 7-day crawls |
|---|---|---|
| 1 | GPTBot | 135,950 |
| 2 | Meta-ExternalAgent | 82,469 |
| 3 | Googlebot | 72,553 |
| 4 | SEMrushBot | 49,316 |
| 5 | GoogleOther | 30,927 |
| 6 | ClaudeBot | 23,012 |
| 7 | PerplexityBot | 16,225 |
| 8 | AhrefsBot | 10,612 |
| 9 | OAI-SearchBot | 9,413 |
| 10 | Bingbot | 6,786 |
| 11 | unknown_bot | 6,281 |
| 12 | ByteSpider | 4,877 |
| 13 | PetalBot | 4,411 |
| 14 | TikTokSpider | 2,858 |
| 15 | ChatGPT-User | 2,384 |
By provider
| Provider | 7-day crawls | Share |
|---|---|---|
| OpenAI (GPTBot + OAI-SearchBot + ChatGPT-User) | 147,747 | 32% |
| Google (Googlebot + GoogleOther) | 103,480 | 22% |
| Meta (Meta-ExternalAgent) | 82,469 | 18% |
| SEO (SEMrushBot + AhrefsBot) | 59,928 | 13% |
| Anthropic (ClaudeBot) | 23,012 | 5% |
| Perplexity (PerplexityBot) | 16,225 | 4% |
| Other (Bingbot, ByteSpider, PetalBot, TikTokSpider, unknown_bot, …) | 30,559 | 7% |
| Total | 463,420 | 100% |
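All 27 distinct bot fleets in the window are accounted for: the six named providers cover 10 fleets, and the “Other” catch-all covers the remaining 17 (the five shown in the top-15 table plus 12 long-tail fleets). Together they sum to the canonical 463,420. Shares are rounded to whole percentages, so individual rows may not add to exactly 100%. The full component breakdown of the “Other” bucket is in receipts.json.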
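The provider rollup can be re-derived from the per-bot table as a quick arithmetic check. In this sketch the counts are copied from the tables above, while the names `PER_BOT`, `PROVIDERS`, and `provider_rollup` are illustrative; "Other" is computed as the remainder of the canonical total, which should match the table's 30,559.

```python
# Per-bot counts copied from the top-15 table.
PER_BOT = {
    "GPTBot": 135_950,
    "OAI-SearchBot": 9_413,
    "ChatGPT-User": 2_384,
    "Googlebot": 72_553,
    "GoogleOther": 30_927,
    "Meta-ExternalAgent": 82_469,
    "SEMrushBot": 49_316,
    "AhrefsBot": 10_612,
    "ClaudeBot": 23_012,
    "PerplexityBot": 16_225,
}

# Provider groupings as given in the "By provider" table.
PROVIDERS = {
    "OpenAI": ("GPTBot", "OAI-SearchBot", "ChatGPT-User"),
    "Google": ("Googlebot", "GoogleOther"),
    "Meta": ("Meta-ExternalAgent",),
    "SEO": ("SEMrushBot", "AhrefsBot"),
    "Anthropic": ("ClaudeBot",),
    "Perplexity": ("PerplexityBot",),
}

TOTAL = 463_420  # canonical middleware total


def provider_rollup():
    """Sum per-bot counts by provider; 'Other' is the remainder of the total."""
    rollup = {
        name: sum(PER_BOT[bot] for bot in bots)
        for name, bots in PROVIDERS.items()
    }
    rollup["Other"] = TOTAL - sum(rollup.values())
    return rollup


if __name__ == "__main__":
    for name, count in provider_rollup().items():
        print(f"{name:<11}{count:>8,}{count / TOTAL:>8.1%}")
```

Printing shares to one decimal place makes it easy to see how each whole-percent figure in the table was rounded.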
3. Consumer-Triggered (UT) Breakdown
UT (User-Triggered) crawls are requests originating from a human asking an AI assistant a question — distinct from background-indexing crawls. The UT ratio measures how much of the site's bot traffic is downstream of real consumer intent.
| Bot | 7-day crawls |
|---|---|
| PerplexityBot | 16,225 |
| OAI-SearchBot | 9,413 |
| ChatGPT-User | 2,384 |
| YouBot | 12 |
| Total intent | 28,034 |
Claude-User, Claude-Web, DuckAssistBot, and Gemini-User registered zero hits in this window.
UT ratio = 28,034 / 463,420 = 6.05%
Roughly 1 in 16 of the AI-bot crawl events on Top10Lists.us in this window were directly downstream of a human asking an AI assistant a question.
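The UT ratio is a plain quotient, so it can be checked directly from the table. A small sketch using the figures above; the names `UT_CRAWLS` and `ut_ratio` are illustrative only.

```python
# User-triggered crawl counts copied from the table above.
UT_CRAWLS = {
    "PerplexityBot": 16_225,
    "OAI-SearchBot": 9_413,
    "ChatGPT-User": 2_384,
    "YouBot": 12,
}
TOTAL_CRAWLS = 463_420  # canonical middleware total


def ut_ratio(ut_crawls, total):
    """Share of all crawl events that came from user-triggered bots."""
    return sum(ut_crawls.values()) / total


if __name__ == "__main__":
    ratio = ut_ratio(UT_CRAWLS, TOTAL_CRAWLS)
    print(f"UT total: {sum(UT_CRAWLS.values()):,}")  # UT total: 28,034
    print(f"UT ratio: {ratio:.2%}")                  # UT ratio: 6.05%
    print(f"about 1 in {1 / ratio:.1f}")             # about 1 in 16.5
```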
4. Reproducibility
Canonical SQL against the middleware crawl log table:
    -- Canonical single-source middleware crawl count (7-day window)
    SELECT bot,
           COUNT(DISTINCT (bot, path, date_trunc('second', timestamp))) AS crawls
    FROM bot_crawl_logs
    WHERE source = 'middleware'
      AND timestamp >= '2026-04-19 12:22 UTC'
      AND timestamp <  '2026-04-26 12:22 UTC'
    GROUP BY bot
    ORDER BY crawls DESC;
The COUNT(DISTINCT (bot, path, date_trunc('second', timestamp))) clause performs the dedup. Replace the window literals with any 7-day span to reproduce on a rolling basis. Dropping the bot column, GROUP BY bot, and ORDER BY from the query returns the window total of 463,420.
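For rolling reproduction, the two window literals can be generated from any end timestamp. A minimal sketch, assuming a timezone-aware end time and the same half-open interval as the query (start inclusive, end exclusive); the helper name `seven_day_window` is hypothetical, and the output is formatted at minute resolution to match the literals in the query.

```python
from datetime import datetime, timedelta, timezone


def seven_day_window(end: datetime) -> tuple[str, str]:
    """Return (start, end) literals for the half-open window [end - 7d, end).

    `end` must be timezone-aware; seconds are dropped in the output.
    """
    end_utc = end.astimezone(timezone.utc)
    start_utc = end_utc - timedelta(days=7)
    fmt = "%Y-%m-%d %H:%M UTC"
    return start_utc.strftime(fmt), end_utc.strftime(fmt)


if __name__ == "__main__":
    frozen = datetime(2026, 4, 26, 12, 22, tzinfo=timezone.utc)
    print(seven_day_window(frozen))
    # ('2026-04-19 12:22 UTC', '2026-04-26 12:22 UTC')
```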
Cloudflare Analytics dashboard count for the same window: 378,168. This undercounts the middleware-derived 463,420 because the analytics view misses certain ClaudeBot and PerplexityBot variants and conflates sources when compositing views. Middleware single-source is canonical.
| Source | 7-day count | Delta vs canonical |
|---|---|---|
| Middleware (canonical) | 463,420 | — |
| Cloudflare Analytics dashboard | 378,168 | -85,252 (-18.4%) |
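The delta column follows directly from the two counts. A short check, with the function name `delta_vs_canonical` chosen here for illustration:

```python
MIDDLEWARE = 463_420  # canonical middleware count
DASHBOARD = 378_168   # Cloudflare Analytics dashboard count


def delta_vs_canonical(other: int, canonical: int) -> tuple[int, float]:
    """Return (absolute delta, delta as a fraction of the canonical count)."""
    delta = other - canonical
    return delta, delta / canonical


if __name__ == "__main__":
    delta, frac = delta_vs_canonical(DASHBOARD, MIDDLEWARE)
    print(f"{delta:,} ({frac:.1%})")  # -85,252 (-18.4%)
```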
5. Receipts
Full per-bot raw data, provider rollups, the consumer-triggered (UT) breakdown, the canonical SQL, and the dashboard reconciliation are in the downloadable receipts:
/methodology/crawl-volume/receipts.json
Schema: metric, page, frozen_date, frozen_at_utc, window_start_utc, window_end_utc, canonical_source, totals, per_bot_top_15, by_provider, consumer_triggered_ut, reconciliation, canonical_sql, bot_taxonomy_full, limitations.
6. Limitations
- Single 7-day window. Volume varies week-to-week with seasonal AI-engine indexing cycles. The window is frozen so this URL remains an immutable artifact; refreshes publish at new dated URLs.
- UA-based classification. Spoofed UAs are not currently filtered (estimated <0.5% of total based on reverse-DNS spot checks of claimed Googlebot / GPTBot rows).
- Per-second dedup boundary. Sub-second bursts that spread across the second boundary may double-count (estimated <0.1%); the date_trunc('second', timestamp) dedup catches the dominant burst-noise pattern.
- Dashboard discrepancy is a classification gap. The 18.4% delta between middleware and Cloudflare Analytics is the dashboard missing certain UA strings, not the middleware double-counting. Per-row comparison confirms middleware sees more bots, not the same bots more times.
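The per-second boundary limitation is easiest to see with two concrete timestamps. In this sketch the timestamps are made up, and `second_bucket` is a hypothetical stand-in for date_trunc('second', timestamp):

```python
from datetime import datetime, timedelta, timezone


def second_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the whole second."""
    return ts.replace(microsecond=0)


if __name__ == "__main__":
    first = datetime(2026, 4, 20, 8, 0, 0, 950_000, tzinfo=timezone.utc)
    second = first + timedelta(milliseconds=70)  # 08:00:01.020, 70 ms later
    # The two hits are 70 ms apart but land in different one-second buckets,
    # so a (bot, page, second) dedup keeps both of them.
    print(second_bucket(first) == second_bucket(second))  # False
```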
Conclusion
In a single 7-day window (2026-04-19 to 2026-04-26 UTC), Top10Lists.us received 463,420 AI-bot crawl events across 27 distinct bot fleets — with OpenAI (32%), Google (22%), and Meta (18%) leading the mix, and a consumer-triggered (UT) ratio of 6.05%. The canonical reference is the Cloudflare edge middleware request log; the Cloudflare Analytics dashboard for the same window reports 378,168 because it misses certain ClaudeBot and PerplexityBot variants. Full per-bot data and the dashboard reconciliation are in receipts.json.
Related
- Relevance Ratio (RR) Benchmark: Sub-measure of Content Density.
- Source Grounding Ratio (SGR): Tier-weighted citation density.
- Retrieval Token Cost (RTC): Compute spent per useful char.
- Sitemap Throughput (RPS): Records-per-second to AI crawler.
- Methodology Overview: All GEOlocus.ai methodology pages.