
How to prepare tables for AI Overviews - semantic HTML for AI Search

Rafał Borowiec

• June 12, 2026 • 9 min read

Visual guide illustrating optimal table preparation for AI Overview and AI Mode integration.

Your best comparison table - the one with prices, specs, test results - is probably invisible to AI Overviews today. Not because the data is weak. Because you dropped it in as a graphic, an Excel screenshot, or a PDF. And Google patents say it plainly: the generative engine doesn't "look at" your table - it parses it. And if there's nothing to parse, there's nothing to quote.

An illustration depicting structured data as a key unlocking an 'AI' door, while unstructured data is excluded.

TL;DR

Three Google patents (US8812435B1, US11249993B2, US11714841B2) describe the same pattern: AI Search extracts facts from semantic HTML structure - attribute-value pairs in tables and lists - not from image pixels. Data in graphics and PDFs requires costly, error-prone OCR, so it realistically drops out of the quotability race. If you want your numbers to land in AI Overviews and AI Mode, migrate them to clean <table> markup with atomic cells and descriptive headers. Technically, it's the cheapest change with the highest potential payoff in GEO.

Illustration of two identical content boxes; one with clear, standardized, machine-readable labels, one without.

The problem: content format has become a ranking factor

For two decades, SEO taught us that what counts is content "readable to humans". In the generative era, that assumption breaks. AI Overviews (AIO) and AI Mode don't link to a page the user will read later - they build an answer from fragments they can extract from your page. And "extract" means something very specific: identify an entity, match it to an attribute, retrieve a value.

This shifts the weight from what you wrote to how you encoded it technically. The same data set - five products, four parameters - is instantly quotable in one version and practically nonexistent in another. The difference isn't quality. It's the markup layer.

A mechanical data distillery processing raw information through three distinct stages into structured knowledge.

What really happens under the hood - three patents, three stages

The tabular data extraction mechanism can be broken down into three consecutive stages. Each has separate patent coverage and each places a different requirement on your HTML.

Stage 1 - Importer extracts attribute-value pairs (US8812435B1, 2014)

The oldest but foundational patent. It describes "importers" that analyze an HTML document and recognize two things: a title pattern (a recurring way of titling pages, e.g., a product page header) and a contextual pattern (a repeating markup pattern, e.g., the sequence <tr><td>…</td><td>…</td></tr>). From documents matching both patterns, the system extracts the object name and its associated facts, where each fact is at minimum an attribute plus a value. These records land in a knowledge repository - the foundation of the Knowledge Graph.

Practical consequence: if you present "250 PLN" without a label, the system has nothing to build a pair from. "Price: 250" - yes. Just "250" - no. And if you code data once as a table and once as a <div> with <span>, you break the contextual pattern - the importer loses the repeating pattern it was trying to recognize.
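To make the "label plus value" requirement concrete, here is a minimal sketch in Python. It is a toy illustration, not the importer from the patent: the class PairExtractor, the function extract_pairs, and the sample markup are all hypothetical names invented for this example, and the only rule it applies is "a row with exactly two non-empty cells is a fact".

```python
from html.parser import HTMLParser


class PairExtractor(HTMLParser):
    """Toy extractor: collects (attribute, value) pairs from two-cell rows."""

    def __init__(self):
        super().__init__()
        self.pairs = []        # extracted (attribute, value) tuples
        self._row = None       # cell texts of the row being read, or None
        self._in_cell = False  # True while inside a <th> or <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._row.append("")
            self._in_cell = True

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            cells = [cell.strip() for cell in self._row]
            # Only a label plus a value forms a usable fact.
            if len(cells) == 2 and all(cells):
                self.pairs.append((cells[0], cells[1]))
            self._row = None


def extract_pairs(html):
    """Return every (attribute, value) pair found in table rows."""
    parser = PairExtractor()
    parser.feed(html)
    return parser.pairs


labeled = ("<table><tr><th>Price</th><td>250</td></tr>"
           "<tr><th>Weight</th><td>1.2</td></tr></table>")
unlabeled = "<table><tr><td>250</td></tr></table>"
div_based = "<div><span>Price</span><span>250</span></div>"

print(extract_pairs(labeled))    # [('Price', '250'), ('Weight', '1.2')]
print(extract_pairs(unlabeled))  # []
print(extract_pairs(div_based))  # []
```

The unlabeled row and the <div> version carry the same number for a human reader, but this extractor gets nothing out of them: one has no attribute, the other has no row pattern to lock onto.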

Stage 2 - Content processor looks for structure in TOP results (US11249993B2, 2022)

This patent moves the mechanism closer to the generative response. The "structured content processor" identifies structured content in a highly-ranking subset of resources, checks whether the query matches the attributes of that set, selects the matched set, and generates a structured fact set from it - i.e., facts from those attributes that matched the query terms.

Two takeaways. First - ranking still matters: the system looks first at top resources, so structure doesn't exempt you from being visible. Second - this is direct justification for the Structured Fact Availability signal: the easier it is for the machine to identify and match a fact, the greater the chance that your page will feed the answer. <th> headers must be precise and unique here, because that's where the query-to-attribute matching happens.
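Why header wording matters can be shown with a deliberately naive matcher. This is a sketch under stated assumptions: the word-overlap rule, the HeaderCollector class, and the match_headers function are hypothetical and far simpler than whatever a production system does. The point is only that a matcher of any kind needs distinctive header text to work with.

```python
from html.parser import HTMLParser


class HeaderCollector(HTMLParser):
    """Collects the text of every <th> cell in a document."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self._in_th = False

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self.headers.append("")
            self._in_th = True

    def handle_data(self, data):
        if self._in_th:
            self.headers[-1] += data

    def handle_endtag(self, tag):
        if tag == "th":
            self._in_th = False


def match_headers(query, html):
    """Return headers that share at least one word with the query (toy rule)."""
    collector = HeaderCollector()
    collector.feed(html)
    query_words = set(query.lower().split())
    return [header.strip() for header in collector.headers
            if query_words & set(header.lower().split())]


precise = ("<table><thead><tr><th>Model</th><th>Price</th>"
           "<th>Battery life</th></tr></thead></table>")
vague = ("<table><thead><tr><th>Model</th><th>Value</th>"
         "<th>Value</th></tr></thead></table>")

print(match_headers("cheapest price", precise))  # ['Price']
print(match_headers("cheapest price", vague))    # []
```

With two columns both labeled "Value", even a smarter matcher has no way to tell which one holds the price.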

Stage 3 - Grid range: the system calculates on your table (US11714841B2, 2023)

The strongest argument for HTML versus image. The patent describes a pipeline that parses a natural language query, locates a grid range (cell range - column, row, or rectangular block) relevant to the query in the table, builds a table summary from the characteristics of that range, and then - and this is key - translates the logical operation into an executable formula on that table and executes it to return a result.

In other words: for the query "average price in 2024" the system can locate the rows where the "Year" column equals 2024, pick the "Price" column, and calculate AVG directly on your cells. This is infeasible on an image or in a PDF without costly, unreliable OCR. That's why numerical data must be uniform in format - just numbers, without "PLN", "zł", or "%" pasted into the cell - otherwise the formula won't execute.
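The "average price in 2024" case can be sketched in a few lines of Python. Everything here is an assumption made for illustration: TableReader and average are hypothetical helpers, the first row is simply treated as the header row, and keeping the unit in the header ("Price (PLN)") is my own convention rather than something the patent prescribes.

```python
from html.parser import HTMLParser
from statistics import mean


class TableReader(HTMLParser):
    """Reads a flat <table> into headers plus a list of row dicts."""

    def __init__(self):
        super().__init__()
        self.headers, self.rows = [], []
        self._cells, self._in_cell = [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag in ("th", "td"):
            self._cells.append("")
            self._in_cell = True

    def handle_data(self, data):
        if self._in_cell:
            self._cells[-1] += data

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._in_cell = False
        elif tag == "tr" and self._cells:
            cells = [cell.strip() for cell in self._cells]
            if not self.headers:
                self.headers = cells  # assumption: first row is the header row
            else:
                self.rows.append(dict(zip(self.headers, cells)))
            self._cells = []


def average(html, value_column, where_column, where_value):
    """Average value_column over rows where where_column equals where_value."""
    reader = TableReader()
    reader.feed(html)
    selected = [row[value_column] for row in reader.rows
                if row[where_column] == where_value]
    # float() raises ValueError on cells such as "250 PLN".
    return mean(float(value) for value in selected)


clean_html = """
<table>
  <thead><tr><th>Year</th><th>Price (PLN)</th></tr></thead>
  <tbody>
    <tr><td>2023</td><td>230</td></tr>
    <tr><td>2024</td><td>250</td></tr>
    <tr><td>2024</td><td>270</td></tr>
  </tbody>
</table>
"""
dirty_html = clean_html.replace("<td>250</td>", "<td>250 PLN</td>")

print(average(clean_html, "Price (PLN)", "Year", "2024"))  # 260.0

try:
    average(dirty_html, "Price (PLN)", "Year", "2024")
except ValueError as error:
    print("formula failed:", error)
```

A single "PLN" pasted into one cell is enough to break the calculation in this sketch, while the clean version answers the query with no extra work.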

A stable foundation juxtaposed with a tested, precarious footbridge spanning an abyss of uncertainty.

Where the evidence ends - tensions and gaps

Methodological honesty: the three patents above have high confidence (HIGH), but describe general AI Search extraction mechanisms - they're not literally patents "about AI Overviews". The only patent directly touching comparisons in AIO (US20250348925A1, 2025) suggests that markup like ComparisonTable / Table facilitates grounding and retrieval - but has LOW confidence. Treat it as contextual reinforcement, not as a foundation.

Strategic conclusion: the foundation is a semantically correct HTML table (HIGH signals). Dedicated structured-data markup for comparison tables is an additional layer whose real impact on AIO quotability still requires empirical testing. Build on what's certain; add schema.org markup as a bonus, not as the core of the strategy.

Transparent, perfectly fitted data puzzle, where each piece is a semantically defined, algorithm-ready cell.

Key patents and GEO signals (cheat sheet)

US8812435B1 (2014) → Structured Fact Availability signal. Importers extract attribute-value pairs from HTML by title and markup patterns. Requirement: consistent, repeating structure + attribute labels.
US11249993B2 (2022) → Structured Fact Availability signal. Content processor looks for structure in top results and matches query to attributes. Requirement: precise, unique <th> headers.
US11714841B2 (2023) → Passage Extractability signal. Grid range + table summary + executable formula on table. Requirement: atomic cells, uniform numbers, minimal merging.
US20250348925A1 (2025, LOW) → Consensus Corroboration signal. ComparisonTable/Table markup as possible reinforcement for comparisons in AIO. Contextual layer, not foundation.

An audit visualization depicting precise arrangement and verification of perfectly fitted blocks forming a cohesive table structure.

Audit checklist - 8 points to check today

Are all material tabular data in semantic HTML (<table>, <thead>, <th>, <tbody>, <tr>, <td>), not as an image, PDF, or styled <div>?
Are column headers (<th>) precise, descriptive, and unique - so that a query can be unambiguously matched to an attribute?
Does each cell (<td>) contain an atomic unit of information - one fact, one value - without concatenating multiple data points?
Are numerical data uniform in format (just numbers, without currency symbols, units, and % signs in the cell), so the system can perform logical operations?
Is the structure flat and readable, with minimal use of colspan/rowspan that obscures grid range boundaries?
Are attribute labels consistent across the site (always "Weight", not sometimes "Weight", sometimes "Mass")?
Do fact blocks have a repeating, predictable markup pattern (e.g., always <tr><td>Attribute</td><td>Value</td></tr>)?
Do page titles with data (H1, <title>) maintain a consistent schema, facilitating association of facts with the object (name)?
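Part of this checklist can be automated. The script below is a rough starting point that covers only three of the eight points (semantic <table> present, unique headers, merged cells). The names TableAudit and audit are hypothetical, and the rules are my own simplifications, for example any colspan or rowspan attribute is counted as a merged cell.

```python
from collections import Counter
from html.parser import HTMLParser


class TableAudit(HTMLParser):
    """Counts tables, header texts, and cells with colspan/rowspan."""

    def __init__(self):
        super().__init__()
        self.tables = 0
        self.headers = []
        self.merged_cells = 0
        self._in_th = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1
        elif tag in ("th", "td"):
            attr_names = {name for name, _ in attrs}
            # Simplification: any colspan/rowspan attribute counts as merging.
            if attr_names & {"colspan", "rowspan"}:
                self.merged_cells += 1
            if tag == "th":
                self.headers.append("")
                self._in_th = True

    def handle_data(self, data):
        if self._in_th:
            self.headers[-1] += data

    def handle_endtag(self, tag):
        if tag == "th":
            self._in_th = False


def audit(html):
    """Return a list of human-readable findings; an empty list means no issues."""
    result = TableAudit()
    result.feed(html)
    headers = [header.strip() for header in result.headers]
    duplicates = sorted(h for h, count in Counter(headers).items() if count > 1)
    findings = []
    if result.tables == 0:
        findings.append("no <table> element found")
    elif not headers:
        findings.append("table has no <th> headers")
    if duplicates:
        findings.append("duplicate headers: " + ", ".join(duplicates))
    if result.merged_cells:
        findings.append(f"{result.merged_cells} cell(s) use colspan/rowspan")
    return findings


good = ("<table><thead><tr><th>Model</th><th>Weight (kg)</th></tr></thead>"
        "<tbody><tr><td>A1</td><td>1.2</td></tr></tbody></table>")
bad = ('<table><tr><th colspan="2">Specs</th></tr>'
       "<tr><th>Value</th><th>Value</th></tr>"
       "<tr><td>A1</td><td>1.2</td></tr></table>")
grid = '<div class="grid"><div>Model</div><div>A1</div></div>'

print(audit(good))  # []
print(audit(bad))   # ['duplicate headers: Value', '1 cell(s) use colspan/rowspan']
print(audit(grid))  # ['no <table> element found']
```

Run it against the rendered HTML of one product page. The remaining points (atomic cells, label consistency across the site, title patterns) still need a human eye.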

A visualization of an implementation plan as a path of gradual data evolution, enhancing its value and AI readiness.

Implementation plan - from cheapest to most technical

Copy & format (cheap). Clean cells of numerical symbols and units, unify attribute labels, add missing headers.
Markup refactor (medium). Replace styled <div>/CSS Grid with real <table> with <thead>/<tbody>. Flatten merged cells.
Format conversion (larger effort). Migrate data trapped in graphics, screenshots, and PDFs to native HTML tables - this is where most "dead" data hides.
Markup as bonus (optional). For comparison tables, consider schema.org/Table; for advanced setups, expose aggregates and metadata as JSON-LD to facilitate grid range mapping.
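The first step of the plan, cleaning cells and unifying labels, is the easiest to script. The sketch below rests on loud assumptions: clean_number, unify_label, and the LABEL_ALIASES mapping are hypothetical, and the number handling assumes Polish-style formatting where a space separates thousands and a comma marks decimals. It will misread English-style values such as "1,299.00".

```python
import re

# Hypothetical alias map: every variant label points to one canonical label.
LABEL_ALIASES = {"Mass": "Weight", "Cost": "Price"}


def unify_label(label):
    """Map a label variant to its canonical form, leaving unknown labels as-is."""
    label = label.strip()
    return LABEL_ALIASES.get(label, label)


def clean_number(cell):
    """Strip currency symbols, units, and % signs, leaving a plain number string.

    Assumes a space as thousands separator and a comma or dot as the
    decimal mark (e.g. "1 299,00 zł"), not English-style "1,299.00".
    """
    text = cell.replace("\u00a0", " ").strip()  # normalize non-breaking spaces
    match = re.search(r"-?\d[\d ]*(?:[.,]\d+)?", text)
    if match is None:
        raise ValueError(f"no number found in {cell!r}")
    return match.group(0).replace(" ", "").replace(",", ".")


print(clean_number("250 PLN"))      # 250
print(clean_number("1 299,00 zł"))  # 1299.00
print(clean_number("12.5%"))        # 12.5
print(unify_label("Mass"))          # Weight
```

When you strip a unit from the cells, keep it somewhere the reader can still see it. Moving it into the column header, as in "Price (PLN)", is one option.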

An empty data facade, depicting a lack of substantial information for quotability.

Three myths costing you quotability

Myth 1: "AI has great OCR, so it will recognize a table from an image." Fact: patents describe extraction based on HTML markup patterns, not on pixel recognition. OCR is an additional, costly, and unreliable step - data in graphics is at best a second-class citizen, and most often simply drops out of the process.

Myth 2: "It's enough that a table looks like a table - CSS Grid will do the job." Fact: appearance is not semantics. Without real <table>/<th>/<td> the importer has no pattern to recognize. CSS Grid or Flexbox without proper markup is just a bunch of <div>s to the machine.

Myth 3: "Every table on the page will be considered anyway." Fact: US11714841B2 talks about identifying a specific grid range relevant to the query - the system selects and analyzes only relevant fragments. Without clear structure and context, your table won't even be considered as a candidate.

A two-stage content sifting process: first filtered by ranking, then selected by structure for AI extraction.

Does adding tables actually increase visibility in AIO and AI Mode?

Short answer: yes, but through a different mechanism than most assume. A table is not a "ranking boost" that elevates your position in the classic ten blue links. A table increases extractability and quotability - i.e., the probability that your fragment will be selected as the source of the generated answer. In the world of AIO and AI Mode, these are two different axes, and it's the latter that determines visibility.

Ranking is the entry ticket, extraction is the win

US11249993B2 shows the sequence: the system first takes a highly-ranking subset of resources, and only then looks for structured content within it to build the answer. This means a table won't replace visibility - you must first be in the top resources. But once you're there, structure determines whether you'll be quoted or just scrolled past. Two pages at similar positions aren't equal: the one with a clean table gives the system a ready-made structured fact set, the other forces it to guess - and loses.

Table as an answer unit, not decoration

US11714841B2 goes further: the system can extract a single grid range from a table and calculate a result on it. This opens visibility to queries you don't even cover with text - "which option is cheapest", "average of column X", "compare A and B". If the data sits in a semantic table, your page can feed an answer to a question you literally didn't write. If it sits in an image - those queries are unavailable to you.

Where to draw the line of honesty

It must be said plainly: the Structured Fact Availability and Passage Extractability signals have strong patent coverage but low confirmed behavioral confidence (STRONG_INFERENCE / behavioralConfidence LOW). Patents prove that the mechanism exists and how it works - they don't prove that Google uses it in production exactly in this form and with this weight. So the cautious thesis is: tables increase the probability of being extracted and quoted; they do not guarantee a visibility jump. This is a change with a high ceiling and low cost, but it requires validation on your own AIO data rather than taking anyone's word for it.

An X-ray visualization of data, revealing its underlying structure and strategic insights for a summary.

Strategic summary

Quotability in AI Search isn't a matter of more content - it's a matter of machine-readable content. Google patents are surprisingly consistent on this: from the attribute-value pair importers of 2014, through the structured content processor of 2022, to formulas executed on a grid range in 2023. All lead to the same recommendation: a semantic HTML table outclasses an image or a PDF, because only it allows the machine to identify, match, and calculate your data.

The best part is that this is one of the cheapest moves in all of GEO. It doesn't require a budget for links or months for content. It requires a markup audit - and the decision to stop hiding your best data in pixels.

Crucial data, like a valuable treasure, trapped behind a transparent, impenetrable barrier for AI algorithms.

What about you?

How much of your key data - price lists, specifications, comparisons - still sits in graphics and PDFs, invisible to AI Overviews? Check one product page today and write in the comments what you found. I'd be happy to discuss the boundaries of what patents actually confirm versus what is still a hypothesis to test.

Frequently asked questions

1. Why don't my comparison tables appear in AI Overviews?

AI Overviews extract data from semantic HTML structure, not from images or PDFs. If your table is a graphic, an Excel screenshot, or a PDF document, Google's system cannot parse it and extract attribute-value pairs. You need to migrate the data into clean HTML tables with <table>, <th>, <td> elements to make them quotable for AI Search.

2. What errors in HTML table structure block data extraction by Google?

The most common errors are: lack of descriptive <th> headers, cells containing multiple data points instead of atomic values, numbers with currency symbols or units pasted directly into the cell, and overuse of colspan/rowspan that obscures grid range boundaries. The system must be able to unambiguously match a query to an attribute and perform logical operations on clean numbers.

3. Is structured data (schema.org) for tables enough instead of proper HTML?

No. The foundation is a semantically correct HTML table, which has high-confidence patent coverage (US8812435B1, US11249993B2, US11714841B2). Schema.org/Table markup for comparison tables is an additional contextual layer with uncertain impact (patent US20250348925A1 has low confidence). Build on clean HTML, and add markup as optional reinforcement.