Google's official guide to optimization for AI Search is clinically correct. And operationally incomplete. Google published it on May 15, 2026 as a calm message to the market: do not invent a separate SEO discipline for AI, classic SEO is enough.
The problem is that the document says what does not need to be done - but it does not show exactly how the system makes decisions. In fact, it cannot say that, because this is a market-facing communication note, not an engineering document. And it is precisely in those engineering details that the interesting parts sit: the ones showing how small omissions can make a very large difference.
The body of Google patents describing mechanisms behind AI Search is made of hundreds of documents - some published as early as 2005. They describe exactly the layers the official guide is silent about: a three-branch decision logic for conflicting sources (US20250245255A1), a minimum corroboration threshold of “more than one document” (US9785686B2), structured data as an LLM adaptation signal (US20260080866A1), and passage ranking as a separate scoring system (US9690847B2).
TL;DR: Google is right that you do not need to invent a separate SEO for AI. The patents add, however, that AI Search works through additional thresholds: source selection, fact corroboration and passage scoring. Classic SEO is the entry ticket - visibility in AI Overview requires something more.
Over the last few days I analyzed Google patents through one lens: what the system actually does with content, where the official documentation stays silent. The result is a map of more than thirty patents arranged into process layers - from user intent, through retrieval, to scoring and factual verification.
Contrast map
The simplest way to read this text is as a descent into the Google Search technology stack. The input layer is intent and query fan-out. The access layer is retrieval, RAG and source cutoff. The verification layer is cross-confirmation, or corroboration. The extraction layer is passage ranking, answer score and the choice of a fragment suitable for an answer. The order of sections still follows Google's official document, but the technical vector is deeper: from a market-facing message to the decision mechanisms underneath.
The eight rows below are the backbone of the entire text and also the reading guide. The left column shows the language of Google's official guide. The right column shows the mechanisms patents name explicitly. Each row returns later as a separate part of the argument.
| What Google emphasizes | What the patents add |
|---|---|
| “Review the specific information” - a black box. | A three-branch decision switch - claims 6/12/18 of US20250245255A1. |
| “First-hand review, unique perspective” - content quality as an abstraction. | Information Gain + Factual Grounding Score + Evidence Base Quality. |
| “Don't seek inauthentic mentions” - a negation. | Minimum 2 independent sources = necessary corroboration condition (US9785686B2 claim 1). |
| “No chunking required” - relief for SEOs. | Passage Ranking has had its own scoring since 2009 (US20090024606A1, US9690847B2). |
| “Structured data not required” - relief. | “Structured Data for Adaptation” - a signal configuring an LLM (US20260080866A1). |
| “Query fan-out” as a new AI technique. | Query augmentation since 2005 (US6941293). |
| “Rooted in core ranking systems” - vaguely. | Topical Authority calculated contextually: topic x POV x location x time (US20110246484A1). |
| No mention of Featured Snippets. | Answer Score mechanism: Featured Snippets = AI Overviews (US10783156B1 -> US20230342411A1). |
These eight rows are not a polemic. They are an inventory. Each row has a patent, a date and a claim. A claim here means a patent claim: a precisely formulated scope of the technical solution the patent protects. That is why a claim carries more weight here than loose interpretation.
What Google published, and what it did not name
Google presents a list of things that do not need to be done: no llms.txt is needed, no chunking is needed, there is no special schema.org markup for AI Search, it is not worth seeking artificial mentions, and there is no need to write for every long-tail variant because AI systems understand synonyms and general meanings.
Each of these statements is true in isolation. Together, however, they create a very specific picture: do nothing extra, the system will handle it.
This exact formula - what does not need to be done - creates the risk of misinterpretation. The phrase “you do not need to do this” does not mean “the system does not use this.” This difference is crucial for SEO strategy, because Google patents describe exactly the mechanisms the official text omits, simplifies or frames negatively - and they name them explicitly.
In the full glossary in section 7 I collect these names in one place. Here, three examples are enough: Confidence Level, Cutoff Prompt and Heading Vector. Each has a patent source and describes a concrete decision the system can make about your content. First, however, let us go through the mechanisms that most strongly change the interpretation of the official guide.
From this point, we go through the key sections of Google's document. I read each in the same structure: first what Google says in the official guide, then what the patents add, and finally what this changes for SEO strategy.
RAG: how Google selects sources and when it refuses to answer
Let us start with the foundation: how the system processes sources before it evaluates anything.
In the official guide, Google describes RAG in three words: retrieve, review, generate. The system retrieves relevant pages, reviews specific information and generates a more helpful answer for the user.
For a publisher, this sounds reassuring. If the page is good, Google will find it, “review” it and use it in the answer. The problem is that the word “review” covers the most important part of the process. Between retrieved sources and a generated answer there is not one simple operation, but a series of decisions.
The patents show this hidden layer. Between “retrieve” and “generate” there is a four-stage pipeline with separate source-selection scoring, a three-branch decision logic for divergent sources, the possibility of fragment fusion and a mechanism in which the system refuses an answer if confidence is too low.
The pipeline has four stages, not two
Patent US20240346256A1 (claim 20) describes the architecture. The query first passes through a neural encoder based on BERT or GPT architecture and is encoded into a low-dimensional dense feature vector. The vector is used for retrieval - cosine comparison with document vectors in the corpus, optionally using Dense Passage Retrieval.
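The retrieval step can be illustrated with a minimal sketch. It assumes the encoder has already produced vectors; the toy 3-dimensional vectors, the `corpus` structure and the function names are my own illustration, not taken from the patent.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, corpus, k=2):
    """Return the k passages whose vectors are closest to the query vector."""
    ranked = sorted(corpus, key=lambda p: cosine(query_vec, p["vec"]), reverse=True)
    return ranked[:k]

# Toy vectors stand in for real encoder output (assumption).
corpus = [
    {"text": "Water lawns early in the morning.", "vec": [0.9, 0.1, 0.0]},
    {"text": "Cardiology covers heart disease.", "vec": [0.0, 0.2, 0.9]},
    {"text": "Mow grass to about 3 inches in summer.", "vec": [0.8, 0.3, 0.1]},
]
query_vec = [1.0, 0.2, 0.0]
top = retrieve(query_vec, corpus)
```

The point of the sketch is only the ranking principle: the page is compared to the query as a vector, not as a list of keywords.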
Then the system builds an augmented_prompt, combining the original query with retrieved fragments, and passes it to an LLM with an instruction to prioritize retrieved sources over training knowledge. At the end - and this phase is omitted by Google's document - the pipeline performs post-generation: normalizing text, segmenting it, matching fragments to sources and conditionally adding attribution links.
This final phase - conditional source attribution - is described by two patents from 2026 (US20260064780A1, WO2026049929A1) under the name dynamic attribution. Attribution is not automatic: it depends on how closely the generated answer fragment matches the retrieved source fragment. Your content may therefore be cited - or not cited - even if it fed retrieval.
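The last two stages, prompt augmentation and conditional attribution, can be sketched as follows. This is a simplified illustration under my own assumptions: the prompt wording, the `threshold` value of 0.6 and the use of `difflib.SequenceMatcher` as the matching measure are hypothetical and are not described in the patents.

```python
from difflib import SequenceMatcher

def build_augmented_prompt(query, passages):
    """Combine the query with retrieved passages and a grounding instruction."""
    sources = "\n".join(f"[{i}] {text}" for i, text in enumerate(passages, start=1))
    return (
        "Prefer the sources below over training knowledge.\n"
        f"Sources:\n{sources}\n"
        f"Question: {query}"
    )

def attribute(answer_segments, sources, threshold=0.6):
    """Attach a source URL to a segment only when the textual match is close enough.

    sources maps URL -> retrieved passage text. The threshold is an assumption.
    """
    result = []
    for segment in answer_segments:
        best_url, best_ratio = None, 0.0
        for url, passage in sources.items():
            ratio = SequenceMatcher(None, segment.lower(), passage.lower()).ratio()
            if ratio > best_ratio:
                best_url, best_ratio = url, ratio
        # Below the threshold the segment stays unattributed.
        result.append((segment, best_url if best_ratio >= threshold else None))
    return result
```

A segment that closely repeats a retrieved passage gets its link. A segment the model phrased far from any source gets `None`, even though the source fed retrieval.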
Grounding, meaning grounding the answer in sources, is a process for reducing LLM hallucinations. The model is not merely supposed to “answer nicely.” It is supposed to answer based on fragments that passed retrieval, selection and later matching to the generated content.
Why does this matter? Because even if your page is in the TOP 10, that is only the beginning of the game. Below I describe the second filter that decides whether the content actually feeds the AI answer.
Source selection has a cutoff mechanism separate from classic ranking
Here comes the first strong proof that being in the TOP 10 SERP is only the first threshold. Patent US20250200100A1 describes a two-step process for selecting sources for grounding.
Step one: classic retrieval returns an initial set with two metrics - relevancy_score and depth_score.
Step two: the system decides which of those documents actually enter answer grounding. And it does this by one of two methods.
- Method A - Generative Cutoff. The system constructs a cutoff_prompt containing the original query and a list of candidates with scores, sends it to a generative model, and the model returns a subset of documents deemed suitable for grounding (claim 2).
- Method B - Probabilistic Cutoff. Relevance and depth scores are passed through a probabilistic function returning a threshold value (claim 4).
This is separate scoring. The patent says it directly: being in the top classic ranking results qualifies you for the second filter, not directly for AI Overview. This patent belongs to the access layer: it does not answer “which document is best in SERP?” but “which document may be allowed into answer grounding?” For SEO, the conclusion is simple: classic ranking is entry into the game, but the game itself has additional rules.
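Method B can be pictured with a short sketch. The patent, as summarized above, only says that relevance and depth scores pass through a probabilistic function. The logistic form, the weights, the bias and the 0.5 threshold below are my assumptions for illustration.

```python
import math

def grounding_probability(relevancy_score, depth_score,
                          w_rel=4.0, w_depth=2.0, bias=-3.0):
    """Map two retrieval scores to a probability (hypothetical logistic model)."""
    z = w_rel * relevancy_score + w_depth * depth_score + bias
    return 1.0 / (1.0 + math.exp(-z))

def probabilistic_cutoff(candidates, threshold=0.5):
    """Keep only the candidates whose probability reaches the cutoff threshold."""
    return [
        c["url"]
        for c in candidates
        if grounding_probability(c["relevancy_score"], c["depth_score"]) >= threshold
    ]

candidates = [
    {"url": "a.example/guide", "relevancy_score": 0.9, "depth_score": 0.8},
    {"url": "b.example/note", "relevancy_score": 0.4, "depth_score": 0.2},
]
grounded = probabilistic_cutoff(candidates)
```

Both documents were retrieved, but only one enters grounding. That is the difference between ranking and the second filter.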
Once it is known which documents entered grounding, the system must decide what to do with them - and this is not one simple operation.
Three-branch decision logic - claims 6, 12, 18
Patent US20250245255A1 - “Neuro-symbolic retrieval augmented generation hybrid” - is one of the strongest pieces of evidence for how far Google has moved beyond the phrase “review information.”
The system first generates at least two preliminary answers (first_preliminary_query_response and second_preliminary_query_response) from different knowledge bases. Each receives a quality_level - a probability assessment based on historical accuracy of similar answers.
Then the decision logic enters - and it is written in three claims: 6, 12 and 18. In simplified form:
if both_quality < threshold:
    integrate_knowledge_bases() -> regenerate_answer()
elif one_below_one_above:
    select_higher_quality()
else:  # both different, but acceptable
    if hybrid(fragments) > max(quality_A, quality_B):
        return fused_fragments
    else:
        return higher_quality
This is not review. It is a switch case with three decision branches - with fusion of answer fragments from different bases if the hybrid exceeds the quality of each component, and with regeneration after knowledge-base integration if both initial answers were weak.
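For readers who prefer something executable, the same three branches can be written as a small Python function. This is a sketch of my simplified reading, not the patent's implementation: `integrate_and_regenerate` and `fuse` are hypothetical hooks, and each preliminary response is modeled as a `(text, quality_level)` pair.

```python
def choose_response(first, second, threshold, integrate_and_regenerate, fuse):
    """Pick a final answer from two preliminary responses.

    first, second: (text, quality_level) tuples.
    integrate_and_regenerate: callable returning a regenerated answer (hypothetical).
    fuse: callable returning (fused_text, fused_quality) (hypothetical).
    """
    (text_a, q_a), (text_b, q_b) = first, second
    better = text_a if q_a >= q_b else text_b

    # Branch 1: both answers are weak, so merge knowledge bases and regenerate.
    if q_a < threshold and q_b < threshold:
        return integrate_and_regenerate()
    # Branch 2: exactly one answer is weak, so take the stronger one.
    if q_a < threshold or q_b < threshold:
        return better
    # Branch 3: both are acceptable, so fuse only if the hybrid beats both.
    fused_text, fused_quality = fuse(text_a, text_b)
    return fused_text if fused_quality > max(q_a, q_b) else better
```

Calling it with two acceptable answers and a `fuse` hook that returns a higher quality yields the fused text. With two weak answers, the regeneration hook decides the result.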
What Google says: Google systems review specific information from retrieved pages to generate a more reliable and helpful answer.
What the patents add: three patent claims (6, 12, 18) with conditional decision logic, fragment fusion and answer regeneration after knowledge-base integration. This is not review. It is a switch case. US20250245255A1.
What this changes: if the system can fuse answers, regenerate them and choose the higher quality_level, optimization does not end at “write a good page.” Content must be simultaneously relevant, verifiable and sufficiently consistent with other sources to survive further decision layers.
But choosing between answers assumes the system has some quality reference point. If it can choose, it must also evaluate which facts it can trust. Here an older but fundamental layer enters: Confidence Level.
Confidence Level and Weighting Process from 2010
Patent US7788293B2 (priority around 2007, publication around 2010) describes a fact scoring mechanism that powers today's AI Overviews. In simple terms: the system does not assume a fact is true just because it found it on one page. First it checks how many sources say the same thing, how trustworthy they are, and how to resolve conflict when sources provide different values.
The five stages are as follows:
- Collection - facts from multiple sources.
- Clustering - grouping around the same attribute of the same entity.
- Confidence Level - a level of certainty that a given fact is true; high when values agree, low when they conflict.
- Weighting Process - weighting sources during conflict: whom to trust more when different documents provide different information.
- Presentation - high-confidence facts displayed, low-confidence facts hidden.
This is from 2010. A patent with a 16-year history. It is also worth looking not only at the publication date but also the patent priority date, because that shows when a given technical line was actually being developed. The official Google document from May 15, 2026 does not name Confidence Level directly.
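The five stages can be compressed into a short sketch. It assumes that confidence is the weighted share of sources agreeing on the winning value. That formula, the `source_weight` table and the 0.7 display threshold are my illustration, not values from the patent.

```python
from collections import defaultdict

def confidence_levels(observations, source_weight):
    """Cluster facts and score the winning value of each cluster.

    observations: (entity, attribute, value, source) tuples (Collection).
    Returns {(entity, attribute): (best_value, confidence)}.
    """
    clusters = defaultdict(lambda: defaultdict(float))
    for entity, attribute, value, source in observations:
        # Clustering by entity + attribute, weighting each vote by source trust.
        clusters[(entity, attribute)][value] += source_weight.get(source, 1.0)
    result = {}
    for key, votes in clusters.items():
        best_value = max(votes, key=votes.get)
        result[key] = (best_value, votes[best_value] / sum(votes.values()))
    return result

def present(levels, min_confidence=0.7):
    """Presentation: show only facts whose confidence reaches the threshold."""
    return {key: value for key, (value, conf) in levels.items() if conf >= min_confidence}

observations = [
    ("Acme", "founded", "1999", "a"),
    ("Acme", "founded", "1999", "b"),
    ("Acme", "founded", "2001", "c"),
    ("Acme", "hq", "Warsaw", "a"),
    ("Acme", "hq", "Krakow", "c"),
]
levels = confidence_levels(observations, {"a": 2.0, "b": 1.0, "c": 1.0})
visible = present(levels)
```

In this toy data the founding year is confirmed strongly enough to be shown, while the conflicting headquarters value stays hidden.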
And if the confidence level is too low, the system does not have to choose the “least bad” answer. It can make a move the official guide does not highlight: refuse to generate the result.
The system can refuse to answer - and this does not appear in Google's document even once
Patent US20240428015A1 defines three concepts the official document does not name:
- Self-evaluation Score - the model evaluates its own confidence in the answer.
- Selective Prediction - the model's ability not to answer.
- Abstention - the system state in which it decides not to return an answer because confidence is low.
Patent US20230342411A1 claim 2 adds a binary decision: if accuracy_score is greater than threshold - display; if not - do_not_display. This is not a cosmetic detail. It changes how we think about visibility: the system does not only choose the best answer, but may decide not to show any answer.
Patent US7953720B1 describes two-stage verification: (1) supported_score > minimum_threshold; (2) supported_score significantly higher than contradictory_score. If either condition is not met - the system refuses.
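The two conditions fit in a few lines. In this sketch "significantly higher" is modeled as a ratio `margin`, which is my assumption, as are the default threshold values.

```python
def should_answer(supported_score, contradictory_score,
                  minimum_threshold=0.5, margin=2.0):
    """Return True only if both verification conditions hold.

    Condition 1: support clears a minimum threshold.
    Condition 2: support clearly dominates contradiction (ratio-based assumption).
    """
    clears_minimum = supported_score > minimum_threshold
    clearly_dominates = supported_score >= margin * contradictory_score
    return clears_minimum and clearly_dominates
```

Strong but contested support fails the second condition, so the system abstains even though the first condition is met.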
Google cannot afford hallucinations in AI Overviews. That is why selective prediction and abstention, meaning refusal to answer, act as a safety valve. For SEO, this means the fight is not only to be “better” than the competitor. The fight is also to prevent the system from escaping into abstention because your data is not precise, consistent or confirmed enough.
The system can refuse. AI Overview does not always appear - and that is not an error or lack of data. It is a designed function: if no candidate answer crosses the quality and confidence threshold, the system refuses. What does this mean for SEO? Success is not being in the results. Success is crossing the threshold function Google keeps in patents, not in documentation.
And system confidence does not come from nowhere. If the system can refuse an answer, we must ask what builds its confidence in the first place. Here begins corroboration - and here the official Google document tells only half the truth about mentions. Data confirmation is exactly the element the official guide treats as anti-manipulation guidance, while patents treat it as a decision mechanism.
Mentions: spam versus real fact confirmation
In the mentions section, Google begins with a warning: do not seek artificial mentions, because they are not as helpful as they might seem, and anti-spam systems can protect results from such manipulation attempts.
This sentence is true and needed. It does not imply, however, that mentions do not matter. It implies only that artificial mentions are a bad signal. Google patents show the other half of the picture: natural, independent confirmations of facts are one of the foundations of cross-confirmation, or corroboration.
That is why this section is not an encouragement to “do mentions.” It is a distinction between two things that are easy to throw into one bucket: spam mentions and natural confirmation of information by independent sources. It is worth giving the floor to Google itself. In a 2015 patent, engineers formulated the problem this document solves:
“What is needed is a method for finding sources relevant to a topic so that information related to that topic can be reliably confirmed or rejected.” - US9208229B2 | 2015
This is a quote from a Google patent itself, not from my analysis. The patent is called “Anchor text summarization for corroboration” and defines a two-stage logic: (1) topically relevant sources -> (2) confirmation or rejection of information. Topical Authority x Corroboration. Google describes this as a “need” - meaning a problem the patent solves.
Meanwhile, Google's official document from May 15, 2026 says, in short: artificial mentions are spam and are blocked by anti-spam systems. That is true. But the document does not state the second part directly: that natural mentions are a hard, measurable signal in the RAG and Knowledge Graph pipeline.
Minimum corroboration = more than one electronic document. Verbatim claim.
Patent US9785686B2 - “Corroborating facts in electronic documents” - describes the fact verification procedure in four steps:
- Define the factual query.
- Create hypothetical_fact - a potential answer.
- Check how many documents in the corpus support it.
- Minimum threshold: the number of supporting documents must exceed one.
Verbatim - what appears in claims 1, 11 and 21 of this patent: the threshold for treating a fact as likely correct must be “more than one electronic document.” In other words: minimum 2 independent sources = necessary condition for corroboration.
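A minimal sketch of that threshold is below. The claim, as quoted above, speaks of "more than one electronic document". Counting distinct domains instead of raw documents is my stricter reading of "independent", and the substring match is a deliberate simplification.

```python
from urllib.parse import urlparse

def is_corroborated(hypothetical_fact, documents):
    """True when the fact is supported by documents from more than one domain.

    documents: dicts with "url" and "text". Distinct-domain counting is an
    assumption layered on top of the "more than one document" threshold.
    """
    fact = hypothetical_fact.lower()
    supporting_domains = {
        urlparse(d["url"]).netloc
        for d in documents
        if fact in d["text"].lower()
    }
    return len(supporting_domains) > 1
```

Two pages on the same site repeating a fact do not pass this check. One page on each of two unrelated sites does.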
For the Polish publishing market, the “more than one electronic document” rule is a turning point. The Polish web has a lower density of independent sources than the English-language market, so the algorithm has a harder task obtaining a high Confidence Level. It works both ways: facts are harder to confirm in niches, but it is easier to build a strong topical position if key facts about a brand, product or methodology are consistently confirmed by independent industry, local and regional domains.
What Google says: do not seek artificial mentions, because they are not as helpful as they may seem.
What the patents add: claim 1 requires more than one electronic document to confirm a fact. Google does not say that mentions do not count. It says that artificial mentions do not count. The patent makes natural confirmations count, and it makes them measurable. US9785686B2 claim 1.
What this changes: visibility strategy in AI Search should not be about buying mentions. It should be about building a presence in which key facts about the brand, person, product or methodology are consistently confirmed in independent, topically relevant sources.
Corroboration Janitor: incoming link anchor text as a fact-confirming signal
Patent US9208229B2 - the same one the epigraph comes from - introduces a component called Corroboration Janitor. The mechanism is simple: anchor text of links pointing to a document from other pages serves as an external signal confirming facts contained in the target document. This connects content analysis with link graph analysis.
Fact confirmation alone is not enough. Google must still know whether a source has authority in exactly the topic, place and context in which the user asks the question.
Topical Authority is contextual, not general
Patent US20110246484A1 - “Authority ranking” - defines authority as a function:
authority_value(source, topic, location, point_of_view, time)
There is no single domain authority. There is specific authority - per:
- Topic (SEO != cardiology - authority does not transfer).
- Location (local != global).
- Point of view (content for doctors != for patients).
- Time (authority evolves).
Input signals named in the patent: number of publications associated with the source in a given topic, number of links to related content, number of articles citing related work.
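To make the idea of contextual authority concrete, here is a small sketch. The five-part context mirrors the function above. The linear combination and its weights are my assumptions, since the text names the input signals but not how they are combined.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AuthorityContext:
    source: str
    topic: str
    location: str
    point_of_view: str
    period: str

def authority_value(signals, context, weights=(0.5, 0.3, 0.2)):
    """Score authority for one exact context.

    signals maps AuthorityContext -> (publications, links, citations).
    A context with no recorded signals scores zero: authority does not transfer.
    """
    publications, links, citations = signals.get(context, (0, 0, 0))
    w_pub, w_link, w_cit = weights
    return w_pub * publications + w_link * links + w_cit * citations

seo = AuthorityContext("example.com", "seo", "PL", "practitioner", "2026")
cardio = AuthorityContext("example.com", "cardiology", "PL", "patient", "2026")
signals = {seo: (10, 20, 5)}
```

The same domain scores high for `seo` and zero for `cardio`, which is the whole point of the per-context definition.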
Once authority is split into topic, location, point of view and time, the next question is: how does the system recognize a source that regularly deserves trust?
Authoritative source = high ranking + frequently selected
Patent US9448992B2 defines an authoritative source in two ways:
- A source consistently ranking high in search results for queries from a given domain.
- A source frequently selected by users in search results.
What this means: authority is a derivative of classic ranking and user behavior. Being regularly high + being regularly selected = authority. There is no separate authority signal fully independent from SERP positions.
This logic does not stop at document ranking. In newer patents, authoritative publications begin to play a deeper role: they become material for model calibration.
Authoritative publications -> golden prompts
Patent US20250077776A1 introduces the concept of golden prompts: authoritative publications (manuals, scientific articles, expert guides) are used to generate prompts that ensure high LLM accuracy. They are filtered and weighted by a generator_publication_selector module based on ranking and credibility.
Operationally: your authoritative publication may directly shape what an LLM learns during fine-tuning. This happens deeper than document retrieval: at the level of model calibration.
At this point we return to the practical SEO question: does confirmation have to take the form of a classic link? Patents show it does not always have to.
Mentions without hyperlinks - measured by three signals
What Google calls inauthentic mentions and discourages is the artificial version of something that exists in patents as three measurable mechanisms. A mention without a hyperlink works because the system does not have to follow a link to notice a relationship. The three mechanisms described in patents measure exactly that relationship:
| Patent | Signal | What it does |
|---|---|---|
| US9727617B1 | Entity-Quote Relationship Score | Quotes connected to entities - popularity, freshness, strength of relationship. |
| US9323721B1 | Quotation Consensus Score | Frequency and form of a quote across many different high-quality sources. |
| US9201876B1 | Co-occurrence Consistency Score | Consistency of co-occurrence of words and entities across documents. |
A mention of your brand in an industry article - even without a link - may therefore function as a signal. Not because a “mention” is a magic shortcut to AI Overview, but because the systems described in patents measure entity relationships, quotation consensus and co-occurrence consistency. Three times.
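A crude proxy for the third signal shows why no hyperlink is needed. This sketch is not the patented score: it simply measures, under my own definition, how often a term appears in documents that mention an entity.

```python
def cooccurrence_consistency(entity, term, documents):
    """Share of documents mentioning `entity` that also mention `term`.

    A hypothetical proxy; plain substring matching, no links involved.
    """
    mentioning = [d.lower() for d in documents if entity.lower() in d.lower()]
    if not mentioning:
        return 0.0
    return sum(term.lower() in d for d in mentioning) / len(mentioning)

documents = [
    "Acme launches a new lawn mower.",
    "Review: the Acme lawn mower in practice.",
    "Acme opens an office in Warsaw.",
    "Best hedge trimmers of the year.",
]
score = cooccurrence_consistency("Acme", "lawn mower", documents)
```

The relationship between the brand and the product category emerges from plain text alone.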
Knowledge Graph: popularity as a truth signal
Patent US20150317367A1 closes the loop. Information popularity - the number of sources confirming it - is an important truth signal used in the automated construction of the Knowledge Graph. This is not spam. This is mathematics. The more independent sources provide the same fact, the higher the probability it enters the structure from which Google generates answers.
Once the system knows which fact is reliable, it still has to choose the page fragment that expresses it best. Here Google is silent about the most important technological bridge: Featured Snippets and AI Overviews are the same scoring direction, only at a different scale.
Featured Snippets and AI Overviews - the same technology line
Answer scoring is the technological bridge between Featured Snippets and AI Overviews - and the best proof that AI Search is not a new planet, but another floor in an old building.
Patents let us name this technological continuity: the research publication “Corroboration of Web Answers,” Featured Snippets patents, the answer_score mechanism and later AI Overviews patents all describe the same direction - answer scoring, not only document ranking.
Passage Ranking as the cornerstone - since 2017
Patent US20170011116A1 - “Generating elements of answer-seeking queries and elements of answers” - introduces the transition from evaluating whole documents to evaluating individual fragments (passages). This is the technological cornerstone that enabled both Featured Snippets and today's RAG / AI Overviews.
Passage-specific signals described in the patent:
- Structured Answer Element Density.
- Query-Answer Element Alignment.
- Passage Coherence and Completeness.
- Document Answer Authority.
These signals describe what the system sees in the fragment itself. The next step is to calculate whether that fragment can actually serve as an answer.
Answer Score: the same formula for Featured Snippets and AI Overviews
Patent US10783156B1 (2020) describes the aggregated score of a fragment:
answer_score = f(query_dependent_score, query_independent_score)
Patent US9940367B1 confirms this formula in claim 19. In this patent line, the evolution is visible: an earlier version focused mainly on query_dependent. The newer version explicitly extracts query_independent - features of the fragment itself and its source, independent of query context.
What this means: the answer scoring formula has not changed since Featured Snippets. What changed is only the number of sources used (one -> many) and the type of output (extract -> synthesize).
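The formula leaves `f` unspecified, so the sketch below assumes a weighted sum with a hypothetical `alpha`. The display threshold follows the binary decision from US20230342411A1 claim 2 discussed earlier, with an illustrative value.

```python
def answer_score(query_dependent_score, query_independent_score, alpha=0.7):
    """Combine both score families; the weighted sum is an assumption about f."""
    return alpha * query_dependent_score + (1 - alpha) * query_independent_score

def select_answer(candidates, display_threshold=0.6):
    """Return the best passage text, or None if nothing clears the threshold."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: answer_score(c["qd"], c["qi"]))
    if answer_score(best["qd"], best["qi"]) > display_threshold:
        return best["text"]
    return None

candidates = [
    {"text": "A", "qd": 0.9, "qi": 0.5},
    {"text": "B", "qd": 0.6, "qi": 0.9},
]
```

With the default threshold the first passage wins. Raising the threshold far enough makes the function return `None`, which is the "no answer shown" case.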
Multi-source extraction with threshold = the heart of AI Overview
Patent US20230342411A1 (2023). A direct descendant of US10783156B1. With two key additions: multi-source extraction and a display threshold.
Claim 2 states it directly: if accuracy_score > threshold, display the answer; otherwise, do not display it.
Scoring signals in this patent:
- Consensus-Verified Factual Consistency.
- Multi-Source Topical Alignment.
- Passage Semantic Cohesion.
- Source Authority & Trustworthiness.
- Query-Passage Semantic Overlap.
What Google says: AI Search is rooted in core ranking systems. The guide does not, however, lead the reader through the Featured Snippets -> AI Overviews line.
What the patents add: US10783156B1 (2020) and US20230342411A1 (2023) are in the same technology line, three years apart. Featured Snippet -> AI Overview. The scoring mechanism did not change - only the size of the source context and the type of output.
What this changes: AI Overview is not a detached being from another planet. It is an extension of answer logic: instead of one fragment from one source, the system can synthesize multiple fragments from multiple sources, still relying on answer scoring, source quality and a display threshold.
Summarized Query Response - a conceptual bridge
Patent US12158917B2 (2024) introduces the concept of a summarized query response as a separate entity - distinct from the cited source fragment. Signals: Content Summarizability, Direct Answer Authority, Information Saliency, and Speakability/Readability Score.
This is the bridge between a simple snippet and a complex AI Overview. The system does not only cut out a fragment; it creates a separate answer entity that can be evaluated for summarizability, directness, readability and usefulness in a generative result.
Document Portions + Query Embedding Similarity
Patent US20250103826A1 (2025) - one of the most direct patents describing the AI Overview mechanism. It explicitly speaks about generating answers with citations based on document portions and Query Embedding Similarity as a measure of semantic similarity.
Signals: Fragment Granularity/Cohesion, Citation Quality/Source Authority, and Multi-Source Integration Potential. AI Overview is passage ranking + multi-source scoring + embedding-based retrieval. Three patents, one mechanism. All publicly available.
In other words: AI Overview does not retrieve the whole page. It retrieves precisely defined fragments - and evaluates their quality separately from the evaluation of the whole document.
Patent timeline - 20 years of continuity
The patents cited in this text span more than 20 years of continuity. The oldest was filed when Google was still a startup. The newest has a 2026 date.
More than twenty years. Everything public. That is why AI Search is not a revolution - it is an evolution with a fairly well-documented history. And that is exactly why Passage Ranking is not a new AI feature, but an old mechanism with a new interface.
Passage Ranking: the system segments the document - you help it do that
Google rightly advises against manual chunking. In the document it reassures publishers: content does not have to be split into small pieces specifically for AI, because Google's systems can understand the nuances of multiple topics on a page and show users the right fragment.
This is good operational advice. It is not worth artificially cutting articles into “bite-sized chunks,” because this may worsen readability. Patents show, however, that on Google's side the document is still divided, evaluated and matched fragment by fragment.
Passage Ranking has had its own scoring since 2009, and the structure of headings, lists and paragraphs helps the system determine where a meaningful passage begins and ends.
Passage uniqueness via shingles - since 2009
Patent US20090024606A1 - “Identifying and Linking Similar Passages in a Digital Text Corpus” - dates from 2009. Mechanism: shingle-based passage matching, meaning identification of unique word sequences (so-called shingles) to group similar fragments in the corpus. That is seventeen years before the “no chunking required” document. A signal called Passage Uniqueness Score has existed since the end of the 2000s.
A second patent from the same year: US7580921B2 - Phrase-Based Indexing. Identification of complete, meaningful phrases in documents and queries as basic indexing units. An early form of semantic chunking - 17 years ago the system did not operate only on individual words, but on meaningful phrases.
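Shingle-based matching is a well-known technique and easy to sketch. The version below uses word shingles of size 3 and Jaccard similarity. Both choices are common defaults I picked for illustration, not parameters taken from US20090024606A1.

```python
def shingles(text, size=3):
    """Return the set of overlapping word sequences (shingles) in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets: shared shingles over all shingles."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

first = shingles("the quick brown fox jumps")
second = shingles("the quick brown fox sleeps")
similarity = jaccard(first, second)
```

A passage that shares most of its shingles with passages elsewhere in the corpus is a near duplicate. A passage with few shared shingles is, in this simplified sense, unique.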
Per-fragment weights with DOM + user engagement signals (2013)
Patent US8538989B1 (2013) introduces the concept that different fragments of the same page may have different weights. The system divides a document into smaller portions and assigns individual weights based on:
- Relevance to the query.
- DOM structure (HTML).
- User interaction data with those fragments.
Signal: Content Portion Relevance. Fragment relevance including DOM + user engagement. The patent is 13 years old.
Selecting content by query segments (2017)
Patent US9690847B2 (2017) - identification of the “topical core” of a query and evaluation of individual content segments independently from the full query. Your page is not evaluated as a monolith. Its segments are evaluated independently.
At this point the most important part for an editor begins: if fragments have separate weights, document structure stops being only UX. It becomes a scoring signal.
Methodological note: in this section I keep only mechanisms that can be anchored in patents. Heading Vector has direct support in US9959315B1, so I describe it in the language of that patent: heading vector, Similarity Score and Answer Score adjustment.
Heading Vector and Similarity Score (US9959315B1)
Patent US9959315B1 describes scoring answer fragments by analyzing headings above a given passage. The system creates a heading vector, compares it with the user's query and uses the similarity score to modify the Answer Score.
In this mechanism, three elements are key:
- Heading Vector - text from one or more headings hierarchically above the candidate answer fragment.
- Similarity Score - a similarity score between the user's query and the text from the heading vector.
- Answer Score Adjustment - a modification of the fragment score if the heading-query similarity crosses a threshold.
From the search engine's point of view, an H2 heading is not just a larger font with a tag in the code. It is an address of context that helps determine which problem a given text fragment belongs to and whether that fragment deserves a higher score as an answer.
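The three elements can be sketched together. This version assumes token overlap (Jaccard) as the similarity measure and a multiplicative `boost`. The measure, the threshold of 0.3 and the boost of 1.2 are all my illustrative assumptions.

```python
def tokens(text):
    """Lowercased word set with simple punctuation removed."""
    return set(text.lower().replace("?", " ").replace(":", " ").split())

def similarity_score(query, heading_vector):
    """Token overlap between the query and all headings above the passage."""
    q = tokens(query)
    h = set().union(*(tokens(heading) for heading in heading_vector))
    return len(q & h) / len(q | h) if q | h else 0.0

def adjusted_answer_score(answer_score, query, heading_vector,
                          threshold=0.3, boost=1.2):
    """Raise the passage score when heading-query similarity crosses the threshold."""
    if similarity_score(query, heading_vector) >= threshold:
        return answer_score * boost
    return answer_score
```

The same paragraph scores higher under "How to water a lawn in summer" than under "Contact" for the query "how to water a lawn", although the paragraph text itself did not change.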
Passage Coherence + Completeness
Patent US20170011116A1, already cited above for its passage-specific signals, contributes four signals relevant to structure:
- Passage Coherence and Completeness - whether the fragment is complete and understandable in isolation.
- Structured Answer Element Density - density of elements answering the question.
- Document Answer Authority - authority of the parent document.
- Answer Type Diversity in Passage - diversity of answer types.
Together these form a hybrid model: each fragment is scored independently, but it also inherits trust from its parent document.
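A hybrid of this kind can be sketched as a weighted sum. The weights, the 0..1 scales and the names `PassageSignals`, `PASSAGE_WEIGHTS` and `INHERITED_SHARE` are all hypothetical; the patent names the signals, not these numbers.

```python
from dataclasses import dataclass

@dataclass
class PassageSignals:
    coherence: float        # is the passage understandable in isolation (0..1)
    answer_density: float   # density of elements answering the question (0..1)
    type_diversity: float   # diversity of answer types in the passage (0..1)

# Hypothetical weights, chosen only for illustration.
PASSAGE_WEIGHTS = {"coherence": 0.5, "answer_density": 0.3, "type_diversity": 0.2}
INHERITED_SHARE = 0.3  # hypothetical share of the final score taken from the parent document

def hybrid_passage_score(signals: PassageSignals, document_authority: float) -> float:
    """Combine the passage's own signals with authority inherited from its document."""
    own = (PASSAGE_WEIGHTS["coherence"] * signals.coherence
           + PASSAGE_WEIGHTS["answer_density"] * signals.answer_density
           + PASSAGE_WEIGHTS["type_diversity"] * signals.type_diversity)
    return (1 - INHERITED_SHARE) * own + INHERITED_SHARE * document_authority

passage = PassageSignals(coherence=0.9, answer_density=0.6, type_diversity=0.4)
print(hybrid_passage_score(passage, document_authority=0.8))
print(hybrid_passage_score(passage, document_authority=0.2))
```

The identical passage scores lower on a weaker document, and a strong document cannot fully rescue an incoherent passage.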
What this means for content structure
All the mechanisms described above say one thing: the system will segment your document by itself. The only question is whether you make it easier. Instead of manually cutting content, it is enough to take care of natural structure the system can use. The patents make clear what the system takes into account:
- Paragraph boundaries + headings (Heading Vector -> Similarity Score).
- A coherent H1 -> H2 -> H3 hierarchy that creates context for the fragment.
- Coherence and completeness of the fragment (Passage Coherence).
- Word co-occurrence (US9201876B1).
- Semantic embeddings, meaning numerical representations of text meaning (US20250103826A1).
Conclusion: instead of chunking for AI - structure the document so the system can cut it cleanly. Clear H2/H3, lists where they are natural, one topic per section, atomic paragraphs. This is not chunking. It is ordinary readability. But it has a measurable impact on passage score.
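To see why clean structure matters, here is a minimal sketch of how any segmenter could cut a document along headings and paragraph boundaries. It assumes Markdown-style `#` headings as a stand-in for H1/H2/H3 and is not taken from any patent; `segment_by_headings` is a hypothetical name.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")

def segment_by_headings(markdown: str) -> list[dict]:
    """Split a Markdown document into passages, each carrying its heading path."""
    path: list[tuple[int, str]] = []   # current stack of (level, title)
    passages: list[dict] = []
    buffer: list[str] = []

    def flush() -> None:
        text = " ".join(buffer).strip()
        if text:
            passages.append({"headings": [title for _, title in path], "text": text})
        buffer.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or a deeper level before pushing the new one.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        elif line.strip():
            buffer.append(line.strip())
        else:
            flush()  # a blank line ends the paragraph
    flush()
    return passages

doc = """# Lawn care
Intro paragraph.

## Fertilizing
Feed in spring.

### Dosage
Use 30 g per square metre.

## Watering
Water early in the morning.
"""
for passage in segment_by_headings(doc):
    print(" > ".join(passage["headings"]), "|", passage["text"])
```

With one topic per section, every passage comes out with an unambiguous heading path. With mixed topics under one heading, the same code still cuts the text, but the context attached to each fragment is wrong.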
Document structure decides how the system segments it. But segmentation is only half the task - because before the system evaluates fragments, it must decide what it is really looking for.
Query fan-out: query expansion has 20 years of patent history
The scoring and source selection layers are now covered. What remains is the step that precedes them: working out what the system is actually looking for. Google describes this step as a new AI technique, yet its patent history goes back to 2005.
The system searches for an answer to an entire set of related information needs. Google describes query fan-out, meaning expanding one query into variants and sub-queries, as an AI Search technique: the model generates a set of parallel, related queries to gather more information and additional search results matching the user's need. In the lawn-care example, fan-out adds questions about herbicides and weed removal.
Practically, it works like this: the user types one short question, but the system rewrites it into several hidden auxiliary questions. For “how to prepare a lawn for summer,” it may search separately for fertilizing, watering, mowing, weeds and lawn diseases. The user sees one answer, but retrieval works on an entire package of intents.
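The lawn example can be sketched as code. The facet table and the toy substring retrieval below are assumptions standing in for whatever model and index a real system uses; only the shape of the process (one query in, several sub-queries out, results merged) comes from the description above.

```python
# Hypothetical facet table standing in for the model that generates sub-queries.
FACETS = {"lawn": ["fertilizing", "watering", "mowing", "weeds", "diseases"]}

def fan_out(query: str) -> list[str]:
    """Return the original query plus one sub-query per known facet."""
    sub_queries = [query]
    for topic, facets in FACETS.items():
        if topic in query.lower().split():
            sub_queries += [f"{topic} {facet}" for facet in facets]
    return sub_queries

def retrieve(query: str, index: dict[str, str]) -> list[str]:
    """Toy retrieval: a document matches when it contains every query word."""
    words = query.lower().split()
    return [doc_id for doc_id, text in index.items()
            if all(word in text.lower() for word in words)]

def fan_out_retrieve(query: str, index: dict[str, str]) -> list[str]:
    """Run retrieval for every sub-query and merge results in first-seen order."""
    seen: dict[str, None] = {}
    for sub_query in fan_out(query):
        for doc_id in retrieve(sub_query, index):
            seen.setdefault(doc_id, None)
    return list(seen)

index = {
    "a": "Fertilizing a lawn: when and how much",
    "b": "Watering schedule for a summer lawn",
    "c": "How to prepare a lawn for summer: a checklist",
    "d": "Pizza dough recipe",
}
print(fan_out("how to prepare a lawn for summer"))
print(fan_out_retrieve("how to prepare a lawn for summer", index))
```

Documents "a" and "b" never match the query the user typed. They enter the result set only through the hidden sub-queries.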
For the reader, this sounds like a feature from the generative AI era. In its current form it is indeed connected to generative models. But the problem itself is older: how to expand a user's query so the system finds a fuller answer, even when the user typed only a shorthand version of intent.
Patents show that this line starts at least in 2005. Query fan-out is therefore a new name and a new execution layer for an old mechanism: query expansion, query augmentation and generation of auxiliary queries.
US6941293 (2005) - the oldest fan-out patent
Patent US6941293 (priority 2005) - “Equivalent descriptions for query intent.” Mechanism: determining equivalent descriptions for an information need. A foundation for all later query expansion mechanisms. Twenty years of patent documentation.
Seed queries + structural templates (2013, 2016)
Patents US8346792B1 (2013) and US9436747B1 (2016) introduce:
- Seed queries - initial queries from which the system derives further ones.
- Structural similarity of documents - e.g. HTML tags.
- Query templates - patterns applied to other documents.
US9436747B1 claim 1: identify the structure of the document portion containing seed-query terms, then generate a new query from text in an analogous structure.
In 2013 the system automatically generated sub-queries using HTML as a template. Not an LLM. A deterministic procedure.
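The claim-1 procedure can be approximated with the standard library alone. This is a loose sketch of the idea (find the structure that holds the seed terms, then reuse text found in the same structure), not a reproduction of the patent; the class and function names are hypothetical and "structure" is reduced to the enclosing HTML tag.

```python
from html.parser import HTMLParser

class TagTextCollector(HTMLParser):
    """Collect (tag, text) pairs for every text node, keyed by its enclosing tag."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []
        self.items: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.items.append((self.stack[-1], text))

def queries_from_structure(seed_query: str, html: str) -> list[str]:
    """Generate new queries from text that sits in the same tag as the seed query."""
    collector = TagTextCollector()
    collector.feed(html)
    seed_terms = set(seed_query.lower().split())

    def contains_seed(text: str) -> bool:
        return seed_terms <= set(text.lower().split())

    # Step 1: which tags hold text containing all seed-query terms?
    template_tags = {tag for tag, text in collector.items if contains_seed(text)}
    # Step 2: other text in the same kind of tag becomes a candidate query.
    return [text for tag, text in collector.items
            if tag in template_tags and not contains_seed(text)]

page = """
<h1>Lawn care</h1>
<h2>Lawn fertilizing</h2><p>Feed the grass in spring.</p>
<h2>Lawn watering</h2><p>Water early in the morning.</p>
<h2>Weed removal</h2><p>Pull weeds before they seed.</p>
"""
print(queries_from_structure("lawn fertilizing", page))
```

No language model is involved: the seed query lands in an `h2`, so the other `h2` texts become sub-queries. That is the sense in which the HTML itself acts as a template.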
Synthetic queries from structured data (2015)
Patent US9128945B1 - generating high-quality synthetic queries and associating them with specific resources or entities. Structured data (Schema.org, microformats) was used directly to generate synthetic queries as early as 2015.
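A rough sketch of what "synthetic queries from structured data" can look like: every populated property of an entity becomes a candidate query. The entity, the `synthetic_queries` function and the "name + property" template are my assumptions for illustration.

```python
def synthetic_queries(entity: dict) -> list[str]:
    """Turn each populated attribute of a Schema.org-style entity into a candidate query."""
    name = entity.get("name")
    if not name:
        return []
    queries = []
    for key, value in entity.items():
        # Skip JSON-LD keywords, the name itself and empty values.
        if key.startswith("@") or key == "name" or value in (None, "", [], {}):
            continue
        # Split camelCase property names: "openingHours" -> "opening hours".
        label = "".join(" " + ch.lower() if ch.isupper() else ch for ch in key).strip()
        queries.append(f"{name} {label}")
    return queries

# Hypothetical entity described with real Schema.org property names.
restaurant = {
    "@type": "Restaurant",
    "name": "Trattoria Roma",
    "openingHours": "Mo-Su 12:00-22:00",
    "priceRange": "$$",
    "telephone": "",
}
print(synthetic_queries(restaurant))
```

An empty property produces no query, so markup that is present but unfilled contributes nothing at this step.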
Trained generative model for query variants (2023)
Patent US11663201B2 - trained generative models for creating query variants. Only in 2023 was the mechanism moved onto LLMs. Earlier (2005-2016) it operated on deterministic rules + structure analysis. Evolution, not revolution.
Why llms.txt is redundant - and what Google does instead
Patent WO2024229375A1 - “Query expansion by prompting large language models” (2024). Use of an LLM to add semantic context or generate sub-queries before sending the query to the search system.
Why does Google say: you do not need llms.txt? Because llms.txt, as imagined by its proponents, is meant to hand models ready, condensed knowledge portions on a tray. Patent WO2024229375A1 shows the reverse process: Google does not wait for the publisher to prepare a separate file for the LLM. Google's own LLM takes the raw user query and adds semantic context or sub-queries before sending the query to the search system.
In short: Google can say “you do not need llms.txt” because an analogous context-ordering mechanism works on its side. It runs not through your additional file, but through query expansion, query decomposition and earlier extraction of document structure.
The contrast is simple. Publicly, the message is that no new machine-readable files or markup are needed to appear in generative AI Search. In the patent, Google itself uses an LLM to add semantic context to queries before retrieval. Functionally, this plays the role llms.txt was meant to play - not as an interface for publishers, but as an internal layer on the search engine side.
What this changes: the point is not to create a file for the LLM. The point is to make the document semantically complete enough to survive auxiliary queries, intent variants and sub-queries the user never typed directly.
Query decomposition + session-based augmentation
Fan-out is not the generation of random variants. In the patent corpus, several supporting mechanisms are visible:
- Modeling hidden intents - signals driving expansion when the query is incomplete.
- Session-based augmentation - earlier queries in the session shape the current expansion.
- User model embeddings - location, history, language as input.
- Query decomposition - breaking a complex query into atomic sub-queries.
Expansion is therefore a systematic result of the user model, session history and query structure, and this procedure has been documented in patents for years. Patent WO2024229375A1 does not belong to the category “new files for LLM,” but to the category “Google itself adds a semantic layer before retrieval.”
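Two of these mechanisms, query decomposition and session-based augmentation, are simple enough to sketch. The heuristics here are deliberately crude assumptions of mine (splitting on "and" and commas, taking the last non-stopword of the previous query as the topic). The patents describe the mechanisms, not these rules.

```python
import re

# Hypothetical stopword list used to find a topic word in the previous query.
STOPWORDS = frozenset({"how", "to", "a", "the", "for", "in", "best"})

def decompose(query: str) -> list[str]:
    """Split a compound query on 'and' or commas into atomic sub-queries."""
    parts = re.split(r"\s*(?:,|\band\b)\s*", query)
    return [part.strip() for part in parts if part.strip()]

def augment_with_session(sub_queries: list[str], session: list[str]) -> list[str]:
    """Append the topic of the most recent session query to sub-queries that lack it."""
    if not session:
        return sub_queries
    context = [word for word in session[-1].lower().split() if word not in STOPWORDS]
    if not context:
        return sub_queries
    topic = context[-1]
    return [q if topic in q.lower().split() else f"{q} {topic}" for q in sub_queries]

session = ["best grass seed for lawn"]
atomic = decompose("when to fertilize and how often to water")
print(atomic)
print(augment_with_session(atomic, session))
```

The typed query never mentions a lawn. The session supplies it, which is why a document has to answer the expanded question, not only the literal one.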
At this point we have all pieces of the puzzle: retrieval, thresholds, corroboration, fragments and query expansion. Before moving to operational conclusions, I collect in one place the terms Google does not use in public communication but which appear in patents. Treat this list as a reference glossary - you can return to it after reading the conclusions.
Glossary: 24 mechanisms Google does not name publicly
In its guide, Google uses language safe for the broad market: good SEO, helpful content, quality systems, no need for new tricks. This is understandable as communication, but technically very general.
Patents use a different language. There we find names of concrete decisions, thresholds and signals the public document omits. You do not have to memorize these terms. Treat the list as a map: if a term appears in your analysis, you know where to look for the source and which layer of the system it describes.
Here are examples of terms you will not find in the official guide, but which appear in Google patents: Confidence Level, Quality Level, Weighting Process, Cutoff Prompt, Self-evaluation Score, Heading Vector, Similarity Score, Authority Value, Document Portions, Structured Data for Adaptation, Corroboration Janitor, Hypothetical Fact, Likely Correct Fact, Authoritative Publication, Golden Prompt, Query Embedding Similarity, Passage Uniqueness Score, Content Portion Relevance, Information Saliency, Speakability Score.
Each of these terms has a patent source. Each describes a concrete algorithmic decision the system makes about your content. Each has a date - some from 2007.
System thresholds and decisions
The first group concerns the hardest layer of AI Search: threshold decisions. This is where the system evaluates whether an answer has enough quality, confidence and source basis to be shown at all. And this is where Google's official document is least precise, because it speaks about quality generally, while patents describe specific thresholds.
| # | Term | Definition | Patent | Year |
|---|---|---|---|---|
| 1 | Confidence Level | Measure of fact confidence based on number/quality of sources. | US7788293B2 | ~2010 |
| 2 | Quality Level | Assessment of the full generated answer. | US20250245255A1 | 2025 |
| 3 | Weighting Process | Trust x recency x completeness, applied when sources conflict. | US7788293B2 | ~2010 |
| 4 | Preliminary Query Response | Initial answer from a specific knowledge base. | US20250245255A1 | 2025 |
| 5 | Cutoff Prompt | Generative AI selects a subset of documents for grounding. | US20250200100A1 | 2025 |
| 6 | Probabilistic Cutoff | Probabilistic function as an alternative selection mode. | US20250200100A1 | 2025 |
| 7 | Self-evaluation Score | Internal model assessment of whether it has enough confidence. | US20240428015A1 | 2024 |
| 8 | Abstention | System decision not to provide an answer. | US20240428015A1 | 2024 |
Authority and corroboration
The second group describes sources of trust. It is not only whether information exists, but who confirms it, in what topic and whether the system sees sufficiently strong consensus. This order matters: after quality thresholds comes the question of what the system uses to build confidence.
| # | Term | Definition | Patent | Year |
|---|---|---|---|---|
| 9 | Authority Value (contextual) | Contextual authority: topic x POV x location x time. | US20110246484A1 | 2011 |
| 10 | Authoritative Source | Highly ranking + often selected for queries. | US9448992B2 | 2016 |
| 11 | Authoritative Publication | Source used to generate golden prompts. | US20250077776A1 | 2025 |
| 12 | Golden Prompt | Prompt from authoritative publication for LLM tuning. | US20250077776A1 | 2025 |
| 13 | Corroboration Janitor | Component using anchor text to confirm facts. | US9208229B2 | 2015 |
| 14 | Hypothetical Fact | Potential answer - candidate for verification. | US9785686B2 | 2017 |
| 15 | Likely Correct Fact | Fact status after crossing the min. 2-document threshold. | US9785686B2 | 2017 |
Document structure and fragments
The third group descends to the level of the document itself. What matters here is how the page is divided, where a passage starts and whether a single fragment can function as an independent answer. This is the natural step after authority: even a trusted source must provide information in a form the system can extract, evaluate and cite.
| # | Term | Definition | Patent | Year |
|---|---|---|---|---|
| 16 | Fact Expansion Module | HTML tags influence the scope of the recognized fact. | US8954412B1 | 2015 |
| 17 | Answer Score | Aggregated score: query_dependent + query_independent. | US10783156B1 | 2020 |
| 18 | Query Independent Score | Evaluation component independent of the query. | US10783156B1 | 2020 |
| 19 | Heading Vector | Text from headings above the answer fragment. | US9959315B1 | 2018 |
| 20 | Similarity Score | Similarity of the query to the heading vector, used to modify the fragment score. | US9959315B1 | 2018 |
| 21 | Passage Uniqueness Score | Uniqueness of a fragment in the whole corpus. | US20090024606A1 | 2009 |
Retrieval, embeddings and model adaptation
The last group connects classic retrieval with the semantic and generative layer. This vocabulary describes how the system selects fragments, compares their meaning with the query and uses structured data to interpret content. That is why it closes the glossary: it shows the place where classic SEO meets the generative model.
| # | Term | Definition | Patent | Year |
|---|---|---|---|---|
| 22 | Document Portions | Fragments retrieved by RAG as context. | US20250103826A1 | 2025 |
| 23 | Query Embedding Similarity | Semantic similarity of query and fragment. | US20250103826A1 | 2025 |
| 24 | Structured Data for Adaptation | Schema.org as a signal configuring the generative model. | US20260080866A1 | 2026 |
Three entries from this glossary deserve special attention.
Confidence Level is a mechanism from 2010 that dictates whether your fact is shown to the user or hidden. In practice, visibility in AI Search starts with data consistency, not merely the quality of one paragraph.
Heading Vector and Similarity Score show that document structure is not only UX. Headings and section boundaries help the system understand where a fragment suitable for an answer starts and ends.
Structured Data for Adaptation comes from a 2026 patent revealing that Schema.org is a signal for generative AI. This is exactly the point at which Google's official document can too easily be simplified into the sentence: “structured data does not matter for AI.”
You now have the map of mechanisms. Time to translate it into decisions.
Operational conclusions
At the end, Google essentially says: keep doing good SEO, create helpful content and do not get pulled into trendy abbreviations like AEO or GEO. It is hard to disagree with this direction.
The patents clarify, however, what “good SEO” means in the AI Search environment. This is not a separate magical discipline or a new label for selling the same service. It is five practical consequences that extend classic SEO with mechanisms described in patent documents. Each conclusion below closes the same loop: Google's public message is correct, but patents add the execution layer.
Conclusion 1: classic ranking is only the entry threshold
Thesis: Google says: do good SEO. The patents clarify: classic ranking is the first access threshold to systems that select sources for AI Overview. The entire narrative about passage ranking and AI Overview is academic if the document does not pass classic ranking.
Anchoring patent: US9690847B2 (selecting content using query-independent scores of query segments) + US20250200100A1 (Cutoff Prompt). Source selection for grounding happens after classic retrieval. Classic ranking = prerequisite, not alternative.
Operationally: stop treating SEO and AI Search (GEO/AEO) as two separate budgets. If the domain does not generate a strong base ranking in the classic index, content does not even reach the layer where the generative model can run the Cutoff Prompt procedure. Classic SEO is the entry ticket. Only after showing that ticket does the algorithm begin evaluating the document's chance of participating in AI Overview.
Do not: treat AEO/GEO as a replacement for classic SEO. It is an extension, not an alternative.
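The "entry ticket" logic is a two-stage filter, sketched below under stated assumptions: the score fields, `top_k` and `max_sources` are hypothetical, and the second stage stands in for whatever selection the Cutoff Prompt actually performs.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    url: str
    rank_score: float       # classic ranking score (hypothetical scale)
    grounding_score: float  # suitability as a grounding source (hypothetical scale)

def select_grounding_sources(docs: list[Doc], top_k: int = 3, max_sources: int = 2) -> list[str]:
    """Stage 1: keep the top_k documents by classic ranking.
    Stage 2: among those only, pick the best grounding sources."""
    shortlisted = sorted(docs, key=lambda d: d.rank_score, reverse=True)[:top_k]
    grounded = sorted(shortlisted, key=lambda d: d.grounding_score, reverse=True)[:max_sources]
    return [d.url for d in grounded]

docs = [
    Doc("a", rank_score=0.9, grounding_score=0.40),
    Doc("b", rank_score=0.8, grounding_score=0.70),
    Doc("c", rank_score=0.7, grounding_score=0.60),
    Doc("d", rank_score=0.2, grounding_score=0.99),  # best passage, but never shortlisted
]
print(select_grounding_sources(docs))
```

Document "d" has the strongest grounding score and still does not appear, because the second stage only ever sees what the first stage let through.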
Conclusion 2: facts must be consistent in at least two sources
Thesis: Google says: do not buy artificial mentions. The patents clarify: natural confirmation of facts in more than one source feeds Confidence Level. Visibility in AI Search therefore begins with factual consistency, not prompt tricks.
Anchoring patent: US9785686B2 claim 1: the threshold for treating a fact as likely correct is “more than one electronic document.” Plus US20110246484A1: Authority Value is contextual. Plus US20150317367A1: popularity of information is an important truth signal for the Knowledge Graph.
Operationally: consistency of data across the web (NAP, meaning name, address and phone, plus dates) is the foundation of Confidence Level. Natural mentions in authoritative industry publications increase confidence_level. Citations of your content in independent sources are input for quotation_consensus_score (US9323721B1).
Do not: buy artificial mentions. Google is right. But do not confuse “do not buy artificial mentions” with “mentions do not count.”
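The “more than one electronic document” threshold is easy to express in code. The sketch below assumes facts arrive as (document, key, value) triples and that values are compared after trivial normalization. Both are my simplifications, and the sample data is invented.

```python
from collections import defaultdict

def likely_correct_facts(claims: list[tuple[str, str, str]],
                         min_documents: int = 2) -> dict[str, str]:
    """claims: (document_id, fact_key, value).
    A value counts as likely correct when at least min_documents distinct documents state it."""
    supporters: dict[tuple[str, str], set[str]] = defaultdict(set)
    for document_id, fact_key, value in claims:
        supporters[(fact_key, value.strip().lower())].add(document_id)

    result: dict[str, str] = {}
    best_support: dict[str, int] = {}
    for (fact_key, value), documents in supporters.items():
        # Keep the value with the widest support among those that clear the threshold.
        if len(documents) >= min_documents and len(documents) > best_support.get(fact_key, 0):
            result[fact_key] = value
            best_support[fact_key] = len(documents)
    return result

# Invented sample data: three sites describing the same business.
claims = [
    ("site-a", "phone", "555-0100"),
    ("site-b", "phone", "555-0100"),
    ("site-c", "phone", "555-0199"),   # inconsistent listing
    ("site-a", "founded", "2011"),     # stated by one document only
]
print(likely_correct_facts(claims))
```

The founding year may be perfectly true, but with a single source it stays a hypothetical fact. The inconsistent phone listing does not just fail to help, it withholds one vote from the correct value.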
Conclusion 3: Schema.org is a signal for LLMs, not only for rich snippets
Thesis: Google says: there is no special schema only for AI Search. The patents clarify: structured data can still help models interpret entities, relationships and attributes. It is an ordered description of material with which AI systems can work more precisely.
Anchoring patent: US20260080866A1 mentions “Structured Data for Adaptation” as a signal configuring a generative model. Plus US9128945B1 (2015): synthetic queries generated directly from structured data. Plus US11475290B2: selection of entity attributes for dynamic knowledge structure building.
Operationally: break the habit of thinking Schema.org is implemented only to get rich results. Patent US20260080866A1 describes structured data as “Structured Data for Adaptation,” a signal helping the generative model better adapt to context. Precisely marked entities + attributes + relationships in Schema act as a technical instruction for interpreting unstructured paragraphs. More simply: Schema.org is the language in which you organize for Google's model who is who, what is what and which relationships matter.
Do not: overload schema with spam (it is still spam and still blocked). Implement precisely.
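For reference, this is what "precisely marked entities + attributes + relationships" looks like in practice. The types and properties (`Organization`, `PostalAddress`, `sameAs`) are real Schema.org vocabulary; the business, URLs and address are placeholders, and `to_json_ld_script` is a hypothetical helper.

```python
import json

# Placeholder entity: every value below is invented for illustration.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Garden Services",
    "url": "https://www.example.com",
    "sameAs": ["https://www.linkedin.com/company/example"],
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "1 Example Street",
        "addressLocality": "Warsaw",
        "addressCountry": "PL",
    },
}

def to_json_ld_script(data: dict) -> str:
    """Wrap a dictionary as a JSON-LD script tag ready to embed in a page."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a value can never close the script element early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'

print(to_json_ld_script(organization))
```

Generating the markup from one data source, rather than typing it by hand per page, is also the simplest way to keep names, addresses and dates identical everywhere they appear.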
Conclusion 4: heading structure determines passage boundaries
Google says: do not manually cut content for AI. The patents clarify: the system segments the document anyway, but it does so based on structure, headings, lists and fragment coherence. The document should be easy for the system to understand, segment and cite.
The patent layer here is tied together by US9959315B1 with Heading Vector, Similarity Score and Answer Score modification, as well as US20170011116A1 with Passage Coherence and Completeness signals and US8538989B1 with fragment weights based on DOM structure and user engagement.
Operationally it means a simple thing: the H1 -> H2 -> H3 hierarchy is a context vector for each passage, and one topic per section helps the system determine answer boundaries and match the fragment to the query. So this is not about cutting content into bite-sized chunks for AI. It is about readable structure - a completely different editorial task.
Conclusion 5: visibility in AI Overview is binary
Google says: create helpful and trustworthy content. The patents clarify: helpfulness passes through thresholds of quality, confidence and accuracy_score. An answer either crosses the threshold and appears in AI Overview, or it does not - these are concrete numerical values and system decisions, not a subjective “good impression.”
US20230342411A1 claim 2 says it directly: if accuracy_score > threshold, the system displays the answer; if not, it does not display it. US20240428015A1 adds Selective Prediction, Self-evaluation Score and Abstention, meaning the possibility of refusing an answer even when candidate sources exist.
Operationally, content is evaluated across several threshold axes at once: confidence_level, quality_level, accuracy_score. Do not calibrate content for AI abstractly. Calibrate it against concrete patent criteria: factual_grounding_score, evidence_base_quality and information_verifiability.
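The binary character of this decision can be sketched as a gate with abstention. The three axis names come from the text above; the threshold values, the `Candidate` class and `answer_or_abstain` are hypothetical, since the patents describe thresholds without publishing numbers.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    text: str
    confidence_level: float
    quality_level: float
    accuracy_score: float

# Hypothetical threshold values, chosen only for illustration.
THRESHOLDS = {"confidence_level": 0.6, "quality_level": 0.5, "accuracy_score": 0.7}

def answer_or_abstain(candidates: list[Candidate]) -> Optional[str]:
    """Return the best candidate that clears every threshold, or None (abstention)."""
    passing = [c for c in candidates
               if all(getattr(c, axis) > limit for axis, limit in THRESHOLDS.items())]
    if not passing:
        return None  # candidates exist, but the system declines to answer
    return max(passing, key=lambda c: c.accuracy_score).text

candidates = [
    Candidate("Answer A", confidence_level=0.9, quality_level=0.9, accuracy_score=0.65),
    Candidate("Answer B", confidence_level=0.7, quality_level=0.6, accuracy_score=0.75),
]
print(answer_or_abstain(candidates))      # Answer B
print(answer_or_abstain(candidates[:1]))  # None
```

Answer A is stronger on two of three axes and still loses: a single axis below its threshold is enough to remove a candidate entirely.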
The five conclusions above are not a new discipline. They are classic SEO with the operating instructions read more carefully - described in public documents with dates and claim numbers.
Closing. The document presented by Google is clinically correct, but operationally incomplete
Google published a document that is clinically correct and operationally incomplete. This is an understandable choice: its function is market communication, not an engineering manual. It says what does not need to be done, and it is right. But it does not say what the system actually does with your content after it finds it.
AEO and GEO as separate disciplines are largely marketing inventions - selling a new label for the same service. But “do good SEO” is advice that only works if you know what exactly the quality, confidence and accuracy_score thresholds measure.
The patents name these mechanisms. From US6941293 (2005) to US20260080866A1 (2026). Twenty years. All public. All with claims.
AI Search is not a new planet. It is another floor of a very old building - with an elevator that stops only on floors classic ranking already knows.

