Measuring the Unmeasurable: Pre/Post Experimentation for Brand Visibility in LLMs
When a customer changes their site content, they've always been able to measure the impact on human visitors — cookies, cohorts, clicks, conversions. Now imagine running the same measurement against ChatGPT. Or Perplexity. Or Gemini. You immediately hit a wall: you cannot cookie a bot. The cohort-assignment foundation that every classical A/B testing framework is built on collapses the moment your "user" is an LLM crawler that arrives, scrapes, and vanishes with no persistent identity between visits.
This talk is about what you build when that foundation is gone.
The LLMO Experimentation Engine inside Adobe LLM Optimizer gives enterprise brands a repeatable way to measure whether their AI-search optimizations — improving readability, restructuring pages, fixing crawler-facing rendering gaps — actually translate into more citations, better sentiment, or higher mention frequency on platforms like ChatGPT and Perplexity. The methodology is temporal pre/post: capture a baseline of how your brand is cited across LLM platforms over a time window, ship the change through Optimize at Edge (our edge content-optimization service), capture a matching post-deployment window, compute the delta. No cohorts, no cookies, no randomization — prompts are replicated many times per window to absorb the non-determinism of LLM outputs, and the deltas are surfaced with clear signal-vs-noise framing.
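To make the pre/post arithmetic concrete, here is a minimal sketch of a delta computed over replicated prompt runs. It is not the engine's implementation: the `WindowResult` type, the `pre_post_delta` function, and the 1.96 threshold are all hypothetical, and the two-proportion z-score is only one possible way to frame signal versus noise. It also treats replicated runs as independent, which is a simplification.

```python
import math
from dataclasses import dataclass


@dataclass
class WindowResult:
    """Aggregated outcome of one measurement window (hypothetical shape)."""
    cited: int  # replicated prompt runs in which the brand was cited
    runs: int   # total replicated prompt runs in the window

    @property
    def rate(self) -> float:
        return self.cited / self.runs


def pre_post_delta(baseline: WindowResult, post: WindowResult) -> dict:
    """Compare citation rates between a baseline and a post-deployment window.

    Uses a pooled two-proportion z-score as a rough noise yardstick.
    Assumption: runs are independent, which replicated prompts only approximate.
    """
    delta = post.rate - baseline.rate
    pooled = (baseline.cited + post.cited) / (baseline.runs + post.runs)
    se = math.sqrt(pooled * (1 - pooled) * (1 / baseline.runs + 1 / post.runs))
    z = delta / se if se > 0 else 0.0
    return {
        "baseline_rate": baseline.rate,
        "post_rate": post.rate,
        "delta": delta,
        # Relative lift is undefined when the baseline rate is zero.
        "relative_lift": delta / baseline.rate if baseline.rate > 0 else None,
        "z": z,
        # Illustrative cutoff only (about 95% two-sided).
        "above_noise": abs(z) >= 1.96,
    }


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    print(pre_post_delta(WindowResult(cited=100, runs=500),
                         WindowResult(cited=130, runs=500)))
```

More replications per window shrink the standard error, which is why replication is what makes a modest delta distinguishable from run-to-run variation. A score like this says nothing about ambient drift between windows; it only bounds sampling noise.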
Pre/post comes with a known confound: ambient drift in LLM behavior — platform model updates, training-data refreshes, provider-side product changes — between the baseline window and the post window. We'll cover how we think about that honestly, what the engine measures, what it deliberately does not claim to measure, and when the signal is strong enough to attribute to the customer's change versus the background.
The first concrete application targets a problem every JavaScript-heavy brand site has: the origin serves near-empty HTML to AI crawlers, so whatever optimization the content team writes never reaches the LLMs. With Optimize at Edge acting as the edge-rendering middleware that intercepts AI user-agent traffic and serves fully structured content, the hypothesis is a 15–30% lift in ChatGPT citation rates, which should be testable within roughly four weeks.
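The interception step can be pictured as a routing decision on the request's User-Agent header. The sketch below is an assumption-laden illustration, not how Optimize at Edge is implemented: `AI_CRAWLER_TOKENS`, `is_ai_crawler`, and `choose_variant` are hypothetical names, and the token list is a small sample of publicly documented crawler identifiers rather than the service's actual match list.

```python
# Illustrative sample of AI crawler user-agent tokens (not an exhaustive list,
# and not the list the edge service actually uses).
AI_CRAWLER_TOKENS = (
    "GPTBot",
    "ChatGPT-User",
    "OAI-SearchBot",
    "PerplexityBot",
    "ClaudeBot",
)


def is_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent string contains a known AI crawler token."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)


def choose_variant(user_agent: str) -> str:
    """Decide which response to serve (hypothetical variant names).

    AI crawlers get fully rendered, structured HTML; everyone else gets
    the unmodified origin response.
    """
    return "prerendered" if is_ai_crawler(user_agent) else "origin"


if __name__ == "__main__":
    print(choose_variant("Mozilla/5.0 (compatible; GPTBot/1.0)"))
    print(choose_variant("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```

Matching on substrings keeps the check cheap enough for an edge worker, at the cost of trusting a header that any client can spoof.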
What you'll take away:
- The end-to-end workflow — how edit → optimize → deploy → measure plays out across three systems (SpaceCat, DRS, and Optimize at Edge) and why each boundary is where it is.
- The experimentation layer in depth — deferred baseline collection (no LLM spend on opportunities that never deploy), prompt replication to bound variance, per-experiment data isolation, and the concurrency primitives that keep cron-driven execution honest.
- How partners can extend this for their own AEM and Edge Delivery projects — the engine is not specific to Adobe-generated optimizations; any content transformation you can ship through the edge layer can be measured through this engine.