Patent US8886662B1: Shared suffix analysis

Rafał Borowiec

June 19, 2026 · 8 min read

A practical guide for SEO professionals - how Google builds long-tail suggestions and what to do about it

"Generating word completions based on shared suffix analysis" · Google LLC · priority 2011-07-11 · granted 2014-11-11 · Inventors: Lev Finkelstein, Eddo Kim, Ari Shotland

What you'll find inside

TL;DR - 60 seconds and you'll know if this is for you
Before you implement: confidence level (read carefully)
What the patent actually does - the mechanism in 7 steps
Three non-obvious conclusions (not found in the patent)
Who wins, who loses
Action plan - from cheapest to most advanced
Ranking signals and how to measure them realistically
Ready-to-copy artefacts (JSON-LD, GA4, audit prompt)
Audit questions - run these on your client's account
When to walk away - STOP signals

1. TL;DR - 60 seconds

What the patent describes: an autocomplete method patented by Google that selects query completions not by prefix but by shared suffix - i.e. by the ending of a phrase. Candidates are weighted by popularity from query logs (completion_score).

What it changes for you: not content ranking directly, but which long-tail phrases users formulate in the first place. This shifts the distribution of impressions and CTR towards pages covering popular query endings.

Who wins: e-commerce sites with an extensive product long-tail and FAQ/how-to sites covering frequent suffixes.

Who loses: sites living off unique, rare phrases that don't appear in popularity logs.

One action for today: pull the top suffixes (2-4 terms) from your internal search logs, map the top 20% to existing URLs and identify gaps. It's the cheapest move with the highest return.

2. Before you implement: confidence level

Let's be honest before you spend budget. In this patent I separate mechanism from deployment, because they carry different evidential weight, and I rate novelty separately:

Mechanism = HIGH. How the algorithm works is stated directly in the claims. Claim segmentation OK, evidence coverage 95/100, 6 independent claims + 17 dependent claims. I'm not guessing here - I'm reading.
Deployment = LOW. Whether Google actually uses this in production - we don't know. The reconciler flagged a discrepancy: the LLM layer rated deployment as HIGH, the deterministic layer rated it LOW (deployment evidence 2/5, no public confirmation). In such a conflict we defer to the deterministic layer: LOW.
Novelty = INCREMENTAL (36/100). This is an evolution of pre-LLM autocomplete (priority 2011), not a breakthrough. Closest prior art: US7260568B2, US7487145B1, US7836044.

Operational conclusion: treat this as a long-tail coverage strategy grounded in the mechanism, NOT as a confirmed ranking factor. The good news: all recommendations below constitute solid SEO regardless of whether Google deployed this exact algorithm. You're not betting budget on a hypothesis - you're doing things that stand on their own merits.

3. What the patent actually does - the mechanism in 7 steps

Input: a partial query (what the user is typing), a set of unique queries from logs, and their popularity. Output: a ranked list of suggestions with an assigned completion_score. In between:

Split into prefix and suffix - tokenisation, with handling of an incomplete final term (e.g. the user typed "bes").
First candidate set - queries from logs whose suffix shares at least 3 terms with the partial query's suffix, and whose prefix does NOT contain the partial query's prefix as a subsequence.
Fallback - if the first candidate set is empty, lower the threshold to 2 shared suffix terms.
Suffix similarity score - a measure of suffix alignment, accounting for term order and incomplete-token matches.
Selection and grouping by unique suffixes (de-dup).
Completion score - aggregation: popularity of source queries weighted by suffix similarity, plus normalisation → ranking.
Incomplete token handling + telemetry for further threshold tuning.
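
The steps above can be sketched in a few lines of Python. This is my own simplified reading, not the patent's implementation: the function names, the "last k terms" split, the similarity formula (k divided by the number of typed terms) and the sample log are all assumptions made for illustration.

```python
from collections import defaultdict

def is_subsequence(needle, haystack):
    """True if every term of `needle` appears in `haystack` in order (gaps allowed)."""
    it = iter(haystack)
    return all(term in it for term in needle)

def match_suffix(partial_suffix, candidate):
    """Return the index where the partial suffix starts inside the candidate, or None.
    The last typed term may be incomplete, so it is matched by prefix."""
    k = len(partial_suffix)
    for start in range(len(candidate) - k + 1):
        window = candidate[start:start + k]
        if window[:-1] == partial_suffix[:-1] and window[-1].startswith(partial_suffix[-1]):
            return start
    return None

def suggest(partial_query, query_log, thresholds=(3, 2)):
    """Rank suffix completions; try 3 shared terms first, then fall back to 2."""
    tokens = partial_query.lower().split()
    for k in thresholds:
        if len(tokens) < k:
            continue
        prefix, suffix = tokens[:-k], tokens[-k:]
        scores = defaultdict(float)
        for query, popularity in query_log.items():
            cand = query.lower().split()
            start = match_suffix(suffix, cand)
            if start is None:
                continue
            # Reject candidates whose prefix contains the typed prefix as a subsequence.
            # Assumption: an empty typed prefix rejects nothing.
            if prefix and is_subsequence(prefix, cand[:start]):
                continue
            completion = " ".join(cand[start:])   # group by unique suffix (de-dup)
            similarity = k / len(tokens)          # hypothetical similarity measure
            scores[completion] += popularity * similarity
        if scores:
            total = sum(scores.values())          # normalise so scores sum to 1
            return sorted(((c, s / total) for c, s in scores.items()),
                          key=lambda item: item[1], reverse=True)
    return []

# Made-up sample log: query -> popularity count
QUERY_LOG = {
    "cheap running shoes for asphalt": 120,
    "trail running shoes for asphalt": 80,
    "nike running shoes for asphalt": 50,
    "running shoes for astroturf": 30,
    "best running shoes for flat feet": 200,
}

for completion, score in suggest("nike running shoes for as", QUERY_LOG):
    print(f"{completion}: {score:.2f}")
```

Note what happens to "nike running shoes for asphalt": it is dropped because its beginning matches what the user already typed, while the two differently-prefixed queries pool their popularity into one "shoes for asphalt" suggestion.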

The most important, counter-intuitive detail

The system rejects candidates whose prefix contains the user's query prefix. This is the opposite of classic autocomplete. The effect: instead of simply "complete the beginning", Google proactively suggests phrases that start differently but end similarly - steering the user towards richer intent patterns detected in the suffix. That's why the strategy of "push the start of the phrase with a keyword" is ineffective here - it's the ending that counts.

The hard threshold of 3/2 terms is a deliberate choice by Google: precision over the "fuzzy" matching of models. At the UX-critical moment of suggestion, the system prefers a confident, popular suffix over creative guessing.

4. Three non-obvious conclusions

Conclusion 1 - Google rewards the suffix, not the prefix

Since candidates sharing the same beginning are actively rejected, optimising for "beginning of phrase + keyword" doesn't get captured by this mechanism. Target complete, frequent endings (2-4 terms), e.g. "…under $500 reviews", "…for asphalt", "…in children". These carry the intent signal.

Conclusion 2 - log-based popularity is irreplaceable

Despite the LLM era, the core of the ranking is still query popularity. Semantics provides context, but the final weight goes to what people actually typed. The implication: a phrase with no demand in the logs = low exposure, no matter how "semantically perfect" it is. Proof of demand first, then content production.

Conclusion 3 - this is a game about query distribution, not position #1

The patent changes HOW queries are formulated, so the impact shows up in impressions/CTR distribution, not in a single position jump. Measure by suffix clusters, not by individual keywords - otherwise you'll miss the effect entirely.
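
A minimal sketch of what "measure by suffix clusters" can look like in practice, assuming you have exported GSC query rows as dictionaries with `query`, `impressions` and `clicks` keys. The field names, the two-term suffix and the sample rows are my assumptions, not GSC's export format.

```python
from collections import defaultdict

def suffix_key(query, n=2):
    """Cluster key: the last n terms of the query, lowercased."""
    return " ".join(query.lower().split()[-n:])

def cluster_metrics(rows, n=2):
    """Aggregate impressions and clicks per suffix cluster and derive CTR."""
    totals = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for row in rows:
        key = suffix_key(row["query"], n)
        totals[key]["impressions"] += row["impressions"]
        totals[key]["clicks"] += row["clicks"]
    report = {}
    for key, t in totals.items():
        ctr = t["clicks"] / t["impressions"] if t["impressions"] else 0.0
        report[key] = {"impressions": t["impressions"], "clicks": t["clicks"], "ctr": ctr}
    return report

# Made-up rows for illustration only
rows = [
    {"query": "standing desk under $500", "impressions": 100, "clicks": 5},
    {"query": "gaming chair under $500", "impressions": 300, "clicks": 15},
    {"query": "running shoes for asphalt", "impressions": 200, "clicks": 20},
]
for suffix, metrics in cluster_metrics(rows).items():
    print(suffix, metrics)
```

Run it for two periods and compare the per-cluster numbers: two keywords that each look flat on their own can add up to a visible shift once they share a cluster.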

5. Who wins, who loses

Winners

E-commerce with dedicated pages for long, transactional phrases (extensive product long-tail).
Aggregators, FAQs, how-to guides and recipes leveraging frequent, repeatable suffixes.
Local players with "service + neighbourhood" variants and "near me" phrases (incomplete tokens).

Losers

Sites built on unique, low-popularity phrases - low query popularity = lower exposure.
Sites with weak reputations / low log presence that don't appear as candidates.
Top-of-funnel content produced at scale without checking demand for specific endings.

Impact scale: INCREMENTAL - a single shift is small, but it compounds over time. Most affected surfaces: Google Search (organic, HIGH), AI Overviews (MEDIUM, via input query variant), YouTube Search (MEDIUM).

6. Action plan

Ordered from the cheapest and fastest move to the most advanced. Each step has a verification criterion - you check it off in the audit. Effect estimates are ranges, not guarantees.

Extract suffixes from logs. From your internal search engine and server logs, collect unique queries, group by suffix (2-4 terms), count popularity. Check: CSV with top 500 suffixes + counts, ≥95% of records have non-empty tokens.
Map suffixes to pages. Pair the top 20% of suffixes (by popularity) with existing URLs and flag gaps. Check: ≥80% of top suffixes have an assigned URL or a "missing" flag.
Tune title / H1 / first paragraph on priority pages so they naturally contain popular endings (no keyword stuffing that would undermine E-E-A-T). Check: title and H1 contain the suffix within the first 100 characters for a sample of 50 URLs (≥80%).
Build content hubs around suffix clusters and connect variants with internal linking. Check: hub + links to related URLs for ≥80% of top 50 suffixes.
Create dedicated long-tail landing pages for popular suffixes - but only after confirming demand in the logs. Check: list of new landing pages + documented production process.
Enable anonymised logging of incomplete tokens in the front-end (GDPR-friendly). Check: GA4 event partial_query_captured with prefix distribution.
(Advanced) Recreate suffix_similarity and completion_score on your own logs. Check: top-5 alignment with offline recreation ≥75%.
Set up a completion_score dashboard with an alert on >20% shift in the top-10 suffixes. Check: daily distribution + alert test over a 30-day window.
Measure the effect by suffix groups in GSC (impressions + CTR), not by individual keyword. Check: monthly report with impressions/CTR delta per suffix group (avg position as context only).
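
For the dashboard alert in the list above, here is one possible interpretation as a sketch. I read ">20% shift" as a relative change in a suffix's share of all logged queries between two periods; that reading, the function name and the sample counts are assumptions.

```python
def top_suffix_shifts(previous, current, top_n=10, threshold=0.20):
    """Return (suffix, relative_change) for top-N suffixes of the previous period
    whose share of total queries moved by more than `threshold`."""
    prev_total = sum(previous.values())
    cur_total = sum(current.values())
    top = sorted(previous, key=previous.get, reverse=True)[:top_n]
    alerts = []
    for suffix in top:
        if previous[suffix] == 0:
            continue  # no baseline share to compare against
        prev_share = previous[suffix] / prev_total
        cur_share = current.get(suffix, 0) / cur_total if cur_total else 0.0
        change = (cur_share - prev_share) / prev_share
        if abs(change) > threshold:
            alerts.append((suffix, round(change, 3)))
    return alerts

# Made-up counts per suffix for two consecutive periods
last_month = {"for asphalt": 100, "near me": 100}
this_month = {"for asphalt": 150, "near me": 50}
print(top_suffix_shifts(last_month, this_month))
# → [('for asphalt', 0.5), ('near me', -0.5)]
```

Comparing shares rather than raw counts keeps the alert quiet when overall search volume rises or falls evenly across all suffixes.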

Cheapest move with the highest return: steps 1 + 2. Start there - suffix gap mapping gives you the fastest list of opportunities for low-cost long-tail production.
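
Steps 1 and 2 fit in a short script. The sketch below assumes your queries are already exported as a list of strings and that you maintain a suffix-to-URL dictionary by hand; both, along with the function names and sample data, are hypothetical.

```python
from collections import Counter

def count_suffixes(queries, min_terms=2, max_terms=4):
    """Count every 2-4 term ending that occurs in the logged queries."""
    counts = Counter()
    for query in queries:
        terms = query.lower().split()
        for n in range(min_terms, max_terms + 1):
            if len(terms) >= n:
                counts[" ".join(terms[-n:])] += 1
    return counts

def map_gaps(counts, url_map, top_share=0.20):
    """Pair the top share of suffixes with a URL, or flag them as 'missing'."""
    ranked = [suffix for suffix, _ in counts.most_common()]
    cutoff = max(1, round(len(ranked) * top_share))
    return [(s, counts[s], url_map.get(s, "missing")) for s in ranked[:cutoff]]

# Made-up internal search queries
queries = [
    "shoes for asphalt",
    "boots for asphalt",
    "tyres for asphalt",
    "shoes near me",
]
counts = count_suffixes(queries)
print(map_gaps(counts, url_map={}))
# → [('for asphalt', 3, 'missing')]
```

Every "missing" row in the output is a candidate for step 5, and every row with a URL is a candidate for step 3.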

7. Ranking signals and how to measure them realistically

Three Tier 1 signals drive suggestions. None are directly visible in Ahrefs/SEMrush - you need log-level data. Proxies below.

completion_score (Tier 1)

overall suffix score: similarity × popularity of source queries; controls suggestion position. Proxy: GSC - impressions/CTR delta for suggested queries vs. previous period + internal logs (unique queries) for recreation.

query_popularity_weight (Tier 1)

frequency of candidate queries in logs; weight in completion_score. Proxy: GSC → Queries (impressions) + Ahrefs Organic Keywords (volume delta) + log-level frequency per term sequence.

suffix_similarity_score (Tier 1)

alignment of a candidate's ending with the partial query's suffix (term order, incomplete token). Proxy: log tokenisation (Screaming Frog custom extraction + server logs), metric recreation; comparison with GA4 query_submit.

incomplete_token_match_rate (Tier 2)

share of candidates matched via the incomplete final token's prefix. Proxy: GA4 custom event partial_query_captured + log-level fraction of suggestions matched by token prefix.

time_decay_of_popularity (Tier 2)

freshness of popularity (trends) when computing completion_score. Proxy: GSC with date segmentation + internal time-series frequency; Ahrefs Trending Keywords.
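
If you recreate these signals on your own logs, time_decay_of_popularity needs some concrete formula. The analysis above doesn't give one, so the sketch below swaps in a plain exponential decay with a 30-day half-life; both the formula and the half-life are my assumptions.

```python
from datetime import date

def decayed_popularity(daily_counts, today, half_life_days=30.0):
    """Sum daily query counts, halving the weight of a day for every
    `half_life_days` of age (assumed decay model, not taken from the patent)."""
    total = 0.0
    for day, count in daily_counts.items():
        age_days = (today - day).days
        total += count * 0.5 ** (age_days / half_life_days)
    return total

# Made-up daily counts for one suffix
counts = {date(2024, 1, 31): 100, date(2024, 1, 1): 100}
print(decayed_popularity(counts, today=date(2024, 1, 31)))
# → 150.0
```

The 100 queries from today count in full, the 100 from thirty days ago count as 50. Feed the decayed value into your completion_score recreation in place of the raw count to see how much a trend changes the ranking.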

8. Ready-to-copy artefacts

Replace {URL}, {TITLE}, {AUTHOR} with your own values.

JSON-LD (SearchAction) - paste into `<head>`

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "url": "{URL}",
  "name": "{TITLE}",
  "publisher": {
    "@type": "Organization",
    "name": "{AUTHOR}"
  },
  "potentialAction": {
    "@type": "SearchAction",
    "target": "{URL}/search?q={search_term_string}",
    "query-input": "required name=search_term_string"
  }
}
</script>
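
If you roll this out across many client sites, filling the placeholders by hand invites broken quotes. A minimal sketch that generates the same snippet with Python's standard `json` module; the function name is hypothetical, and the `/search?q=` path simply mirrors the template above, so adjust it to your site's real search URL.

```python
import json

def build_search_jsonld(url, title, author):
    """Render the WebSite + SearchAction snippet with safely escaped values."""
    base = url.rstrip("/")  # avoid a double slash before /search
    data = {
        "@context": "https://schema.org",
        "@type": "WebSite",
        "url": base,
        "name": title,
        "publisher": {"@type": "Organization", "name": author},
        "potentialAction": {
            "@type": "SearchAction",
            # {search_term_string} must stay literal, so no str.format here
            "target": base + "/search?q={search_term_string}",
            "query-input": "required name=search_term_string",
        },
    }
    body = json.dumps(data, indent=2, ensure_ascii=False)
    body = body.replace("</", "<\\/")  # keep a stray "</script>" in a value from closing the tag
    return '<script type="application/ld+json">\n' + body + "\n</script>"

print(build_search_jsonld("https://example.com/", 'Shop "A"', "Example Ltd"))
```

Building the dictionary first and serialising it last means quotes in a site name are escaped for you.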

GA4 / GTM events

suggestion_shown

suggestion list displayed. Parameters: session_id, page_url, partial_query, suggestion_list_ids.

suggestion_clicked

suggestion clicked. Parameters: suggestion_id, clicked_text, position, page_url.

partial_query_captured

anonymous capture of a partial token for suffix analysis. Parameters: anon_user_id, partial_query, timestamp, device_type.

suffix_cluster_update

cluster update in ETL. Parameters: cluster_id, top_suffix_terms, total_query_count.
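
If you send these events server-side instead of through GTM, the GA4 Measurement Protocol expects a JSON body with a `client_id` and an `events` list. The sketch below only builds and validates that body against the parameter lists above; it makes no network call, and the function name and sample values are hypothetical.

```python
import json

# Parameter names per event, copied from the event list above
ALLOWED_PARAMS = {
    "suggestion_shown": {"session_id", "page_url", "partial_query", "suggestion_list_ids"},
    "suggestion_clicked": {"suggestion_id", "clicked_text", "position", "page_url"},
    "partial_query_captured": {"anon_user_id", "partial_query", "timestamp", "device_type"},
    "suffix_cluster_update": {"cluster_id", "top_suffix_terms", "total_query_count"},
}

def build_event(client_id, name, params):
    """Build a Measurement Protocol request body for one event.
    Raises ValueError on unknown events or parameters, so nothing
    unplanned (e.g. personal data) slips into the payload."""
    if name not in ALLOWED_PARAMS:
        raise ValueError(f"unknown event: {name}")
    unknown = set(params) - ALLOWED_PARAMS[name]
    if unknown:
        raise ValueError(f"unexpected parameters for {name}: {sorted(unknown)}")
    return {"client_id": client_id, "events": [{"name": name, "params": dict(params)}]}

payload = build_event(
    "555.123",  # made-up client id
    "suggestion_clicked",
    {"suggestion_id": "s1", "clicked_text": "shoes for asphalt",
     "position": 1, "page_url": "/search"},
)
print(json.dumps(payload, indent=2))
```

The whitelist is the useful part: it keeps the event schema identical across sites, which is what makes suffix-level reporting comparable later.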

Audit prompt (paste into Claude/GPT, substitute `{URL}`)

Evaluate the site {URL} against patent US8886662B1
(shared suffix analysis) and prepare an optimisation report:

1. Does the server/FE log unique queries and partial queries
   (incomplete final term)?
2. Top 100 suffixes (2-4 terms) from logs over the past 90 days?
3. How many have dedicated pages/FAQs and what are their
   impressions/CTR?
4. Do the title/H1/first paragraph address the detected suffixes?
5. Does the telemetry include query_popularity and how is it weighted?
6. What thresholds (>=3 vs >=2 terms) are in use and are they optimal?
7. Safeguards against manipulation (synthetic logs, bots)?
8. Suffix coverage in the pipeline + 5 priority changes for 6 weeks.

Checklist (e.g. for Obsidian / Asana)

[ ] Extract top N suffixes (2-4 terms) from logs over the past 90 days
[ ] Flag incomplete tokens in front-end telemetry
[ ] Map suffix → page; assign content owners to gaps
[ ] Update title/H1 for top 20 suffixes
[ ] Launch A/B test of suggestions (suffix-weighted vs. control)
[ ] Implement GA4 events (suggestion_shown / _clicked / partial_query_captured)
[ ] Weekly monitoring of suffix velocity anomalies
[ ] Publish 10 FAQ pages for fallback 2-term suffixes
[ ] Revise similarity thresholds (>=3 vs >=2) based on A/B results

9. Audit questions - run these on your client's account

If you answer "no" to any of these, you have a gap to close before any content production.

Do the logs record both full and partial (incomplete token) queries? Without this you can't recreate candidate sets.
Do you have a query_popularity metric for unique queries and how often do you aggregate it? No reliable popularity data = incorrect rankings.
Does a suffix map (2-4 terms) → page exist? Without it you won't leverage the autocomplete potential in organic traffic.
Do the title/H1/first paragraphs of priority pages naturally contain suffixes from the logs?
Do you have SearchAction / schema.org implemented for the internal search engine?