You check rankings for 200 phrases and think you have a picture of visibility. Meanwhile, Google internally breaks each of those queries into anywhere from a few to a dozen sub-queries that you never see in any tool. AI doesn't assess whether you're in position 3. It assesses whether you close the entire user journey, including questions the user hasn't asked yet.
Classic rank tracking was designed for a world where one query = one result = one position. That world no longer exists. In AI Overviews, AI Mode, Perplexity, ChatGPT Search, and Gemini, the answer is synthesized from multiple sources based on multiple internal queries generated by the model. If you only monitor head keywords, you're measuring the shadow of the phenomenon, not the phenomenon itself. Below, I use specific Google patents to show how to build a representative set of prompts that actually covers the answer surface.

TL;DR
- Google doesn't search for a phrase — it decomposes intent. Patents describe explicit decomposition and the generation of implicit sub-queries and query fan-out (US20250355958A1, EP4713828A1).
- Monitoring must cover 6 types of prompts: head, long-tail questions, comparison, shopping, local, and follow-up / implicit sub-queries.
- The metric shifts from position to completeness. Answer Completeness and Query Reformulation Fit determine whether AI considers your content a source that closes the session.
- You're not building a list of words but a "prompt space": representative rather than exhaustive, mapping entire intent families around a topic.

Context: why position stopped being the unit of measurement
In a generative model, the user doesn't scan ten blue links. They get one answer stitched together from fragments of multiple pages. This means the question "am I visible for phrase X" is poorly framed. The right question is: "was my content selected as material for synthesizing an answer to the family of intents around X — and does it close that intent well enough that the user doesn't need to search further?"
Two patent concepts are key here. Query Reformulation Fit (QRF) describes how well content matches reformulated and related variants of the original query. Answer Completeness describes how completely the answer satisfies the intent — including aspects the user didn't articulate. Together, they shift the center of gravity from "being on the list" to "being the definitive source for the entire session."

How it works: anatomy of query decomposition
Before the model generates an answer, it performs work you don't see in the SERP. Google patents describe this process quite precisely. Let's break it into stages.
1. Explicit decomposition — overt query breakdown
A complex query is divided into components that can be addressed separately. In practice, "Best mirrorless camera for travel under $2000" breaks down into type selection, a use-case criterion, and a budget constraint. Each component can be served by a different source.
2. Implicit sub-queries — unspoken sub-questions
This is the most important and most underestimated mechanism. The system generates questions the user didn't ask but will likely need answered (US20250355958A1). After "which mirrorless camera," it internally raises: "which lens to start with," "does it have stabilization," "what are the downsides," "where to find reviews." Content that answers only the explicitly asked question loses to content that anticipates these hidden branches.
3. Query fan-out — sub-query array
In conversational mode, one query triggers multiple parallel sub-queries that gather material before synthesis (EP4713828A1). The broader and more coherent the fan covered by your content cluster, the greater the chance you'll be selected in multiple "branches" simultaneously.
4. Material selection by meaning, not by phrase
Selection occurs through embedding similarity and learning from query logs and candidate refinement queries (EP4121843B1, US20230127706A1). The system knows which continuations historically follow a given query, and rewards content aligned with those paths, even without exact keyword matching.
Practical consequence: if your monitoring doesn't include implicit sub-queries and fan-out, you're measuring the entrance to the funnel while ignoring its entire depth. That's where citability is decided.
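The idea of matching content to a fan of sub-queries by meaning can be illustrated with a rough sketch. This is a toy, not Google's implementation: bag-of-words cosine similarity stands in for real embeddings, and the sub-queries, passages, the `fan_coverage` helper, and the 0.3 threshold are all assumptions invented for the example.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fan_coverage(sub_queries, passages, threshold=0.3):
    """Return (coverage ratio, sub-queries with no passage at or above threshold)."""
    passage_vecs = [vectorize(p) for p in passages]
    uncovered = []
    for query in sub_queries:
        q_vec = vectorize(query)
        best = max((cosine(q_vec, p) for p in passage_vecs), default=0.0)
        if best < threshold:
            uncovered.append(query)
    covered = len(sub_queries) - len(uncovered)
    ratio = covered / len(sub_queries) if sub_queries else 0.0
    return ratio, uncovered


# Hypothetical fan around "best mirrorless camera for travel under $2000".
fan = [
    "which lens to start with",
    "does it have stabilization",
    "what are the downsides",
]
# Hypothetical passages from your own content cluster.
my_passages = [
    "Which lens to start with: a compact 24-70mm zoom covers most travel scenes.",
    "Does it have stabilization? Yes, the body has in-body image stabilization.",
]
ratio, gaps = fan_coverage(fan, my_passages)
print(f"fan coverage: {ratio:.0%}, gaps: {gaps}")
```

With these made-up inputs, two of the three branches find a matching passage and "what are the downsides" is reported as a gap. A real check would swap the word counts for an actual embedding model, which this sketch deliberately does not name.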

Six types of prompts: constructing a monitoring set
A representative set is not thousands of phrases, but deliberately selected layers covering different phases and shapes of intent. For each priority topic (cluster), build a complete set of the types below.
Head queries — explicit intent, entry point
- Role: broad queries representing the main need. This is the topic's apex and the first presence test.
- Example: "best mirrorless camera 2026"
- What you measure: whether you appear at all as a source in AI Overviews (AIO) / AI Mode for the topic core.
Long-tail questions — explicit deconstruction
- Role: detailed questions answering specific, articulated sub-queries.
- Example: "which mirrorless camera for landscape photography with budget under $2000"
- What you measure: the content's ability to give precise, narrow answers (depth, not just breadth).
Comparison prompts — contextual refinement
- Role: comparisons of products, services, or concepts; decision phase.
- Example: "Sony a7 IV vs Fujifilm X-T5 for travel"
- What you measure: whether content provides material for comparison that AI can synthesize (tables, criteria, differences).
Shopping prompts — transactional intent
- Role: product features, availability, price, purchase decision.
- Example: "where to buy Nikon Z fc with fast shipping"
- What you measure: effectiveness in commercial AIO — product data, availability, transactional signals.
Local prompts — geographic context
- Role: queries with explicit or implicit local intent.
- Example: "camera repair Mokotów reviews"
- What you measure: local presence and accuracy in generative answers.
Follow-up prompts / implicit sub-queries — journey simulation
- Role: the most important and innovative layer. Simulates subsequent and unspoken user questions.
- Example: after "best restaurants in Wrocław" → "which have vegan options near the Market Square?" → "which have a terrace and allow dogs?" → "what are the prices at the recommended ones?"
- What you measure: Answer Completeness and QRF — whether content smoothly closes subsequent steps without a new search.
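One possible way to hold such a monitoring set in code is sketched below. The class and field names (`PromptType`, `Prompt`, `Cluster`, `missing_types`) are hypothetical and chosen only for illustration; the six types are the ones listed above. The sketch assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field
from enum import Enum


class PromptType(Enum):
    HEAD = "head"
    LONG_TAIL = "long-tail"
    COMPARISON = "comparison"
    SHOPPING = "shopping"
    LOCAL = "local"
    FOLLOW_UP = "follow-up"


@dataclass
class Prompt:
    text: str
    type: PromptType
    # For follow-ups: the earlier prompts of the simulated session, in order.
    sequence: list[str] = field(default_factory=list)


@dataclass
class Cluster:
    topic: str
    prompts: list[Prompt] = field(default_factory=list)

    def missing_types(self) -> list[PromptType]:
        """Prompt types that have no prompt yet in this cluster."""
        present = {p.type for p in self.prompts}
        return [t for t in PromptType if t not in present]


cluster = Cluster(
    topic="mirrorless cameras",
    prompts=[
        Prompt("best mirrorless camera 2026", PromptType.HEAD),
        Prompt("Sony a7 IV vs Fujifilm X-T5 for travel", PromptType.COMPARISON),
        Prompt("where to buy Nikon Z fc with fast shipping", PromptType.SHOPPING),
        Prompt(
            "which lens to start with",
            PromptType.FOLLOW_UP,
            sequence=["best mirrorless camera 2026"],
        ),
    ],
)
print([t.value for t in cluster.missing_types()])
```

Here the cluster still lacks long-tail and local prompts, so those two types are reported. Keeping the preceding prompts in `sequence` is one way to store a follow-up as a journey rather than as an isolated phrase.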

How to generate follow-ups and implicit sub-queries (operationally)
The follow-up layer is the one you won't discover with classic keyword research, because it concerns unspoken questions. Four practical sources:
- Brainstorm after each head query. After each core query, list 4–8 logical continuations: "and which one for…," "does it have…," "what are the downsides," "where to check."
- People Also Ask and related searches. Traditional SERP features, but their logic — anticipating the next question — is exactly what fan-out describes (EP4713828A1).
- Logs and candidate refinement queries. Analyze your own internal search logs for query sequences within a session, and Search Console for reformulated variants of the same intent (US20230127706A1).
- Conversational simulation in an LLM. Hold a conversation in ChatGPT/Gemini around the topic and note the supplementary questions the model itself suggests; they approximate the hidden sub-queries.
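For the log-based source, a minimal sketch of extracting follow-up candidates might look like this. It assumes you can export your internal search logs as per-session lists of queries in time order; the `follow_up_candidates` helper, the `min_count` cutoff, and the sample sessions are made up for the illustration.

```python
from collections import Counter, defaultdict


def follow_up_candidates(sessions, min_count=2):
    """Count which query directly follows which within a session.

    `sessions` is a list of query lists, each ordered in time.
    Returns {query: [(next_query, count), ...]}, most frequent first,
    keeping only transitions seen at least `min_count` times.
    """
    transitions = defaultdict(Counter)
    for queries in sessions:
        normalized = [q.strip().lower() for q in queries]
        for current, nxt in zip(normalized, normalized[1:]):
            if current != nxt:  # ignore immediate repeats of the same query
                transitions[current][nxt] += 1
    result = {}
    for query, counts in transitions.items():
        frequent = [(n, c) for n, c in counts.most_common() if c >= min_count]
        if frequent:
            result[query] = frequent
    return result


# Made-up sessions; real input would come from your own search logs.
sessions = [
    ["best mirrorless camera", "which lens to start with"],
    ["Best mirrorless camera", "which lens to start with", "what are the downsides"],
    ["best mirrorless camera", "does it have stabilization"],
]
candidates = follow_up_candidates(sessions)
print(candidates)
```

Only the transition seen in two sessions survives the cutoff, which is the point: recurring continuations are the ones worth promoting into follow-up prompts, while one-off paths stay out of the set.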

Key patents and their practical consequences
Five patents form the skeleton of this methodology. For each: mechanism and specific operational decision for monitoring.
EP4121843B1
Query Reformulation Fit, embedding similarity, learning from query logs. → AI selects content by meaning, not by literal phrase: monitor intents, not keywords.
US20230127706A1
Analysis of query logs and candidate refinement queries. → Subsequent session steps are predictable: build follow-up sequences, not individual terms.
US20250355958A1
Implicit sub-queries, semantic aggregation, entity relationships, Answer Completeness. → Coverage of unspoken questions matters: measure completeness, not position.
US20240362093A1
Query Reformulation Fit, reformulated query matching. → Content must match reformulations: test variants of the same intent.
EP4713828A1
Query fan-out, dynamic generation of sub-queries in conversational mode. → One query = multiple internal sub-queries: cover the entire fan, not just the core.

GEO audit checklist: answer surface coverage
- ☐ For each priority cluster, do I have a complete set of 6 prompt types (head, long-tail, comparison, shopping, local, follow-up)?
- ☐ Does the set contain follow-up sequences, not just isolated queries?
- ☐ Have I identified implicit sub-queries, i.e. questions the user won't ask explicitly but that are part of the intent?
- ☐ Is the content structured (FAQ, question headings, internal linking) so AI can semantically aggregate it?
- ☐ Does the cluster have topical depth and E-E-A-T signals allowing it to be considered a definitive source?
- ☐ Do I measure Answer Completeness, not just the fact of appearance?
- ☐ Do I test reformulated variants of the same intent (Query Reformulation Fit)?
- ☐ Do I use PAA, logs, and LLM simulation to regularly expand the prompt space?

Tensions and gaps that shouldn't be glossed over
Methodological honesty: patents describe how the system works, but don't specify operationally how to build a perfectly representative set of prompts or exactly how Google measures Answer Completeness and QRF in ranking. Set-building and content audit methodologies are therefore operational hypotheses based on interpretation of mechanisms — solid, but requiring empirical verification in the changing AI Search environment. Treat your prompt set as a living artifact: test, measure citability, iterate.
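Since there is no official metric, any completeness score you track is your own proxy. One hedged example is sketched below: for a single follow-up sequence, it records the share of steps where your content was cited and how many steps from the start were cited without a break. The `sequence_completeness` function and the sample results are assumptions for illustration, not something the patents define.

```python
def sequence_completeness(cited_steps):
    """Proxy score for one follow-up sequence.

    `cited_steps` is a list of booleans, one per step in order:
    True if your content was cited in the answer to that step.
    Returns (share of steps cited, length of the unbroken cited run from step 1).
    """
    if not cited_steps:
        return 0.0, 0
    share = sum(cited_steps) / len(cited_steps)
    run = 0
    for cited in cited_steps:
        if not cited:
            break
        run += 1
    return share, run


# Made-up manual check of a four-step restaurant journey.
steps = [
    ("best restaurants in Wrocław", True),
    ("vegan options near the Market Square", True),
    ("terrace and dogs allowed", False),
    ("prices at the recommended places", True),
]
share, run = sequence_completeness([cited for _, cited in steps])
print(f"cited in {share:.0%} of steps, unbroken run: {run}")
```

The unbroken run is reported separately because a sequence that loses the citation at step three did not close the session, even if a later step cites you again.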

Strategic conclusions
The shift from keywords to prompts is not a cosmetic change of tooling but a change in the unit of measurement. You stop asking "what position am I in" and start asking "do I close the intent well enough that AI doesn't need anyone else." The winner is not the one with the most phrases in the top 10, but the one who covers the fullest, most coherent prompt space around the topic, including questions no one has asked yet.
First move this week: choose one key cluster, build a complete set of 6 prompt types for it with at least one full follow-up sequence, then manually check in AI Overviews / Perplexity how many branches cite your content. That's your zero point.
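The zero point can be written down as a small table of citation rates per prompt type. The sketch below assumes you record each manual check as a (prompt type, surface, cited) tuple; the `baseline` helper and the sample observations are hypothetical.

```python
from collections import defaultdict


def baseline(observations):
    """Citation rate per prompt type from hand-recorded checks.

    `observations` is an iterable of (prompt_type, surface, cited) tuples,
    where `cited` is True if your content appeared as a source.
    """
    counts = defaultdict(lambda: [0, 0])  # prompt_type -> [cited, checked]
    for prompt_type, _surface, cited in observations:
        counts[prompt_type][1] += 1
        if cited:
            counts[prompt_type][0] += 1
    return {t: cited / checked for t, (cited, checked) in counts.items()}


# Made-up results of a manual check; replace with your own notes.
observations = [
    ("head", "AI Overviews", True),
    ("head", "Perplexity", False),
    ("comparison", "AI Overviews", True),
    ("comparison", "Perplexity", True),
    ("follow-up", "AI Overviews", False),
    ("follow-up", "Perplexity", False),
]
zero_point = baseline(observations)
weakest = min(zero_point, key=zero_point.get)
print(zero_point, "weakest:", weakest)
```

Re-running the same check after each content iteration against the same prompt set gives a comparable series, and the weakest type indicates where to expand the cluster first.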

Discussion questions
- How many prompt types do you actually monitor today — and which do you cover most poorly?
- How do you measure Answer Completeness in practice, since there's no official metric for it?
- In your industry, do follow-ups or comparison prompts determine citability in AIO?