In September 2025, a federal court in Washington, D.C. issued a 230-page ruling in the antitrust case against Google. For some, it is a continuation and confirmation of what we learned from the 2023 trial; for others, it may be the first encounter with this type of news. Below, I’ve gathered the most important information.
What we’ve speculated about and tested for years is now down in black and white, in court documents. Let’s dive into the technical details of what really happens “under the hood” at Google.
🔍 Search Index – the foundation of everything
Let’s start with the basics. The court defined the search index precisely:
“Search Index means any databases that store and organize information about websites and their content that is crawled from the web”
Sounds simple? In reality, it’s a monumental undertaking. Google crawls “trillions of web pages” (yes, trillions – not a typo). But not all of them make it into the index.
How does Google decide what to index?
Here comes the first revelation. Court documents reveal that user query data directly influences what Google crawls and how often:
“Query data helps a GSE understand what web pages to crawl and how frequently. And building and maintaining a comprehensive and fresh search index is essential to answering user queries, especially those of the long-tail variety.”
What does this mean in practice?
If users frequently search for information on a specific topic, Google will crawl pages from that industry more often. It’s a feedback loop – popular queries = more frequent crawling = fresher data in the index.
Signals determining crawling
From the documents, we learn that Google uses specific signals to optimize crawling:
Quality signals
- help determine crawling frequency
- high-quality pages = more frequent crawler visits
Popularity signals
- pages frequently clicked by users
- higher update frequency
Spam score
- pages with high spam score are crawled less frequently or not at all
- Google actively filters “spam-heavy or pornographic pages”
“Google assigns a score to the pages it crawls, and it endeavors to exclude from its web search index pages without value to users, such as spam-heavy or pornographic pages.”
Freshness – recency matters
One of the most important fragments of the document:
“‘Freshness,’ or the recency, of information is an important factor in search quality. GSEs ‘need to know how to recrawl sites to make sure that they do at all times have a reasonably fresh copy of the web that you are looking at.'”
Google doesn’t crawl all pages with the same frequency. News sites – daily, and evergreen content – less often. The system is dynamic and learns based on how often a page actually changes its content.
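None of this comes with formulas in the ruling, but the mechanics described above – quality, popularity and spam signals driving crawl frequency, plus the observed change rate driving recrawl timing – can be sketched in a few lines of Python. Everything here (field names, weights, thresholds) is an illustrative assumption, not anything Google disclosed:

```python
from dataclasses import dataclass

@dataclass
class PageStats:
    quality: float      # aggregate quality signal, 0..1 (assumed scale)
    popularity: float   # click/visit-based popularity, 0..1 (assumed scale)
    spam: float         # spam score, 0..1 (higher = worse)
    change_rate: float  # fraction of past crawls where the content had changed

def recrawl_interval_days(p: PageStats,
                          min_days: float = 0.5,
                          max_days: float = 90.0) -> float | None:
    """Toy scheduler: good, popular, frequently-changing pages get short
    recrawl intervals; spammy pages are dropped from crawling entirely."""
    if p.spam > 0.8:
        return None  # excluded, per the ruling's description of spam filtering
    # Weighted priority; the weights are pure placeholders.
    priority = 0.4 * p.quality + 0.3 * p.popularity + 0.3 * p.change_rate
    # Squared to spread the range: high-priority pages get very short intervals.
    return max(min_days, max_days * (1.0 - priority) ** 2)

news = PageStats(quality=0.9, popularity=0.9, spam=0.0, change_rate=0.95)
guide = PageStats(quality=0.8, popularity=0.4, spam=0.0, change_rate=0.05)
print(recrawl_interval_days(news))   # ~0.65 – roughly daily recrawl
print(recrawl_interval_days(guide))  # ~26.7 – revisited every few weeks
```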
🎯 Glue and Navboost – system names you need to know
Now we move to the crucial part: court documents describe two of Google’s internal systems in detail.
Glue – “Super Query Log”
The court describes Glue as:
“Glue is essentially a ‘super query log’ that collects a raft of data about a query and the user’s interaction with the response.”
What exactly does Glue collect?
Documents list four data categories (a data-structure sketch follows the list):
1. Query data:
- query text
- language
- user location
- device type (desktop/mobile)
2. Ranking information:
- top 10 results (“blue links”)
- all triggered SERP features (images, maps, Knowledge Panel, “People also ask”, etc.)
3. SERP interaction data:
- clicks
- hovering
- time spent on SERP
4. Query interpretation:
- spelling corrections
- synonyms
- suggestions (“did you mean”)
- relevant terms from the query
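To make those four categories concrete, here’s a minimal sketch of what a single Glue entry might look like as a data structure. The field names are my assumptions derived from the list above – the ruling does not disclose Glue’s actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class GlueRecord:
    """Hypothetical shape of one 'super query log' entry (field names assumed)."""
    # 1. Query data
    query_text: str
    language: str
    location: str
    device: str                                              # "desktop" / "mobile"
    # 2. Ranking information
    blue_links: list[str] = field(default_factory=list)      # top 10 results
    serp_features: list[str] = field(default_factory=list)   # images, maps, PAA...
    # 3. SERP interaction data
    clicks: list[str] = field(default_factory=list)
    hovers: list[str] = field(default_factory=list)
    seconds_on_serp: float = 0.0
    # 4. Query interpretation
    spell_correction: str | None = None
    synonyms: list[str] = field(default_factory=list)
    salient_terms: list[str] = field(default_factory=list)
```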
Navboost – click memory system
“Navboost is a ‘memorization system’ that aggregates click-and-query data about the web results delivered to the SERP.”
Navboost can be described as a giant table that remembers:
- which results users clicked for specific queries
- how long they stayed on the page
- whether they returned to results (pogo-sticking)
- whether the click was satisfying
Key information from the documents:
“Google trains Navboost on 13 months of user data, which is the equivalent of over 17 years of data received by Bing.”
Navboost uses 13 months of click data. This means your SEO actions from the past year affect today’s ranking.
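The “giant table” is easy to picture in code. Below is a toy aggregator in the same spirit – the (query, URL) key, the satisfaction heuristic and the thresholds are all illustrative assumptions, not Navboost’s real logic:

```python
from collections import defaultdict

class ToyNavboost:
    """Aggregates click-and-query data per (query, url) pair."""
    def __init__(self) -> None:
        self.table: dict[tuple[str, str], dict[str, int]] = defaultdict(
            lambda: {"impressions": 0, "clicks": 0, "satisfied": 0}
        )

    def record(self, query: str, url: str, clicked: bool,
               dwell_seconds: float, returned_to_serp: bool) -> None:
        row = self.table[(query, url)]
        row["impressions"] += 1
        if clicked:
            row["clicks"] += 1
            # Illustrative "good click" heuristic: long dwell, no pogo-sticking.
            if dwell_seconds >= 60 and not returned_to_serp:
                row["satisfied"] += 1

    def boost(self, query: str, url: str) -> float:
        """Satisfied-click rate, usable as a re-ranking signal."""
        row = self.table[(query, url)]
        if row["impressions"] == 0:
            return 0.0
        return row["satisfied"] / row["impressions"]

nb = ToyNavboost()
nb.record("best italian restaurant", "ristorante-roma.example", True, 180, False)
nb.record("best italian restaurant", "trattoria-verde.example", True, 20, True)
print(nb.boost("best italian restaurant", "ristorante-roma.example"))   # 1.0
print(nb.boost("best italian restaurant", "trattoria-verde.example"))   # 0.0
```

Scaled up to 13 months of global traffic, a table like this is precisely the kind of data asset the court found rivals cannot replicate.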
How do Glue and Navboost work together? Practical example
SCENARIO: User searches for “best italian restaurant near me”
STEP 1 – QUERY
User → “best italian restaurant near me”
Google collects: location, device, time

STEP 2 – GLUE RECORDS:
✓ Query text: “best italian restaurant near me”
✓ Location: GPS/IP (e.g., Warsaw, Mokotów)
✓ Device type: mobile (iPhone)
✓ Time: 18:00, Friday
✓ Interface language: Polish

STEP 3 – NAVBOOST CHECKS:
“For similar queries in this location at this time, which restaurants did people click and not return to results?”
Historical data shows:
- Ristorante Roma: 45% CTR, avg. 3 min on page
- Trattoria Verde: 32% CTR, avg. 1.5 min
- Pasta & Basta: 28% CTR, avg. 4 min
STEP 4 – RANKEMBED UNDERSTANDS:
Semantic analysis:
- “best” = user looking for high quality
- “near me” = proximity is key
- “italian” = Italian cuisine (not a pizza place, not a bistro)
Context: Friday 18:00 = probably a dinner reservation
STEP 5 – RESULTS Ranking considers:
- Popularity (from Navboost): Ristorante Roma frequently clicked
- Semantic match (from RankEmbed): page contains appropriate keywords
- Location: distance < 2km
- Time: restaurants open now + accepting reservations
- Spam score: only verified business
STEP 6 – USER CLICKS:
Clicked: “Ristorante Roma” (position #1)
Spent: 4 minutes on page
Action: clicked the phone number (intent fulfilled)
Didn’t return to results = was satisfied

STEP 7 – SYSTEM LEARNING:
Glue + Navboost record:
- Query pattern: “best [cuisine] restaurant near me”
- Location: Mokotów, Warsaw
- Time: Friday, evening
- Winner: Ristorante Roma
- Signal: long time on page + phone click = success

NEXT TIME: When someone in a similar location, at a similar time, searches a similar query:
→ Ristorante Roma will rank higher
→ the signal is reinforced for Italian-restaurant queries
→ the system “knows” this is a good result for this type of intent
🤖 RankEmbed – AI in service of understanding queries
The third key system is RankEmbed (later developed into RankEmbedBERT). It’s a deep learning model:
“RankEmbed and its later iteration RankEmbedBERT are ranking models that rely on two main sources of data: 70 days of search logs plus scores generated by human raters and used by Google to measure the quality of organic search results.”
What does RankEmbed do?
1. Semantic Matching
RankEmbed understands the meaning of the query, not just keywords:
“Embedding based retrieval is effective at semantic matching of docs and queries”
If you search for “how to fix a faucet,” RankEmbed knows that (a toy sketch follows this list):
- “replacing faucet seal” is a relevant result
- “plumber Warsaw” might be relevant
- “history of faucets in Poland” is not relevant
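The ruling doesn’t describe RankEmbed’s internals, but embedding-based semantic matching itself is a standard technique. Here’s a minimal sketch with hand-picked toy vectors – a real system derives embeddings from a trained encoder, not hard-coded numbers:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional embeddings (hand-picked for illustration only).
query_vec = [0.9, 0.1, 0.0]          # "how to fix a faucet"
docs = {
    "replacing a faucet seal":      [0.85, 0.15, 0.05],
    "plumber Warsaw":               [0.50, 0.50, 0.10],
    "history of faucets in Poland": [0.10, 0.05, 0.95],
}
for title, vec in sorted(docs.items(), key=lambda kv: -cosine(query_vec, kv[1])):
    print(f"{cosine(query_vec, vec):.2f}  {title}")
# Ranks the seal-replacement page first and the history page last –
# matching by meaning, not by keyword overlap.
```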
2. Long-tail Queries
Documents emphasize RankEmbed’s particular strength:
“RankEmbed particularly helped Google improve its answers to long-tail queries.”
Long-tail queries are searches that appear rarely (less than 100 times in 12 months). Examples:
- “how to reset bluetooth in honda civic 2019”
- “can you add pumpkin to gingerbread”
- “wordpress custom post type archive pagination broken”
These queries represent over 38% of all searches (a figure the document cites from a Bing study).
3. Efficiency
“RankEmbed is trained on 1/100th of the data used to train earlier ranking models yet provides higher quality search results.”
The system uses 100 times less data than previous models and delivers better results. This demonstrates the power of deep learning.
📊 Ranking signals – what actually counts?
Court documents confirm the existence of hundreds of ranking signals. We didn’t get a complete list, but the documents confirm the key categories:
Top-Level Signals
From Google expert testimony, there are “top-level signals” that aggregate many sub-signals (a toy aggregation sketch follows the list):
1. Quality Score
- mainly based on PageRank
- site authority
- number and quality of links
2. Popularity score
- data from Chrome (user visits)
- number of “anchors” (links to the page)
- visit frequency
3. Spam Score
- identification of low-quality content
- manipulation detection
- filtering pornography and spam
4. Signals from Navboost
- historical click data
- user satisfaction
- time spent on page
5. RankEmbed signals
- semantic matching
- intent matching
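The ruling names these top-level signals but gives no weights or formulas. As a mental model only – every number below is invented – aggregation into one final score could look like this:

```python
def final_score(quality: float, popularity: float, spam: float,
                navboost: float, rankembed: float) -> float:
    """Toy aggregation of top-level signals into one ranking score.
    All weights are placeholders; the real mix is not public."""
    score = (0.30 * quality      # PageRank-based quality, authority, links
             + 0.20 * popularity # Chrome visits, anchors, visit frequency
             + 0.25 * navboost   # historical click satisfaction
             + 0.25 * rankembed) # semantic/intent match
    return score * (1.0 - spam)  # spam score acts as a penalty multiplier

print(final_score(quality=0.8, popularity=0.6, spam=0.0,
                  navboost=0.9, rankembed=0.85))  # 0.7975
```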
User-Side Data – user behavior data
The court defined “User-side Data” as:
“all data that can be obtained from users in the United States, directly through a search engine’s interaction with the user’s Device, including software running on that Device, by automated means.”
What does this specifically mean? Every interaction with results:
- which link you clicked
- how much time you spent on the page
- whether you returned to the results (pogo-sticking)
- how quickly you scrolled the page
- whether you clicked other elements (phone, form)
Long-Tail – where every signal counts
Documents particularly emphasize the significance of data for long-tail queries:
“Google’s scale advantage affords it greater insight into what information users are looking for and which results they find relevant and authoritative.”
If your site serves long-tail well (specific, rare queries), you have an advantage. Why?
Less competition
- fewer pages target these phrases
- easier to stand out
Higher CTR carries more weight
- Google has less historical data
- every click and time on page counts more
Better matching = better ranking
- RankEmbed particularly helps with long-tail
- semantic matching is crucial
💡 Practical conclusions for SEO
1. CTR and time on page matter
Navboost uses 13 months of click data. What you can do:
✓ Optimize Title and Description for CTR
- use numbers (5 ways, 10 examples)
- add year (2025 Guide)
- communicate unique value
✓ Reduce Pogo-Sticking
- fulfill search intent immediately
- use clear headers showing what user will find
✓ Increase engagement
- add interactive elements (calculators, quizzes, mini games)
- break up long texts into sections
- use examples and case studies
2. Content freshness
Documents confirm:
“Freshness, or the recency, of information is an important factor in search quality.”
But not all pages need frequent updates:
Update often (frequent crawling):
- news and current affairs
- pricing and offers
- trends and statistics
- events
Update less often, but still update (stable content):
- evergreen guides
- definitions
- scientific explanations

Fun fact: plenty of media sites still have “live coverage” pages from 20 years ago 🙂
Pro tip: Add a “last updated” date to the title of pages you regularly update. It signals to users that the content is fresh.
3. Long-Tail is still gold
From the documents:
“More than 38.7 percent of searches are for rare queries that are searched less than 100 times”
Long-tail strategy:
✓ Create detailed, specific content
Instead of: “Marketing automation”
Better: “How to configure a welcome email workflow in GetResponse 2025”
✓ Answer specific questions
- analyze “People also ask” in SERP
- check related searches at bottom of SERP
✓ Build expertise in niche
- several very detailed articles > one general
- Google will appreciate if you become an authority in a narrow topic
4. User Intent – understanding is key
RankEmbed uses semantic matching, so:
✓ Analyze SERP before writing
- what does Google already show for your keyword?
- what format dominates? (list, guide, definition)
- what sub-topics appear?
✓ Match format to intent
- Informational → guide, explanation
- Transactional → comparison, review, “best X”
- Commercial → case study, “how to choose”
- Navigational → specific page/brand
✓ Use related terms and synonyms
- RankEmbed understands context
- don’t stuff keywords – write naturally
- use LSI keywords (related terms)
5. Spam Score – don’t risk it
Google actively filters spam. Documents mention:
“Google endeavors to exclude from its web search index pages without value to users, such as spam-heavy or pornographic pages.”
Red flags (avoid):
- keyword stuffing
- hidden content
- doorway pages
- thin content with ads
- auto-generated content without value
- cloaking
Green flags (apply):
- research and data
- build author authority
- clear value proposition
🎓 What the document confirms
Court documents gave us confirmation of what many SEOs have long suspected – which is our job, after all: relying on Google’s documentation while filtering out the spin its corporate employees feed us.
✅ Truths:
User signals matter
- CTR, time on page, pogo-sticking – these aren’t “fake metrics”
- Navboost uses 13 months of click data
- every interaction is recorded and analyzed
AI and machine learning are at the center of the algorithm
- RankEmbed uses deep learning to understand queries
- semantic matching > exact keyword matching
- system learns from user data
Index freshness depends on page popularity
- quality and popularity signals → more frequent crawling
- high-quality pages = more frequent Googlebot visits
- it’s not democratic – leaders get more attention
Long-tail queries are a huge opportunity
- 38%+ of all searches are rare queries
- RankEmbed particularly helps with long-tail
- less competition = easier to win
Google has a gigantic advantage thanks to data
- trillions of pages in index
- 13 months of click data in Navboost
- 70 days of logs in RankEmbed
- Chrome data about actual visits
🚫 Debunked Myths:
MYTH: “Google doesn’t use CTR for ranking” → Fact: Navboost is literally a system based on click-and-query data
MYTH: “Time on page doesn’t matter” → Fact: Glue records “duration on SERP” and user interactions
MYTH: “PageRank no longer matters” → Fact: PageRank is still part of Quality score
MYTH: “AI at Google is just an add-on” → Fact: RankEmbed and deep learning are at the center of the ranking system
🎯 Actions to take today
Based on revealed information, here are specific actions you can take right now:
Short-term tactics:
1. CTR audit in Google Search Console (a query sketch follows this list)
→ find pages with impressions > 1,000 and CTR < 3%
→ rewrite the title and description
→ add numbers, the year, “power words”
→ A/B test if you can

2. Identify and optimize long-tail
→ GSC → Performance → filter queries by “impressions”
→ find long-tail queries at positions 4–10
→ add dedicated sections or new articles
→ use the exact phrases in H2/H3

3. Reduce pogo-sticking
→ add a table of contents at the beginning of long articles
→ use specific headers describing section content
→ add a “TL;DR” or executive summary
→ place the most important info above the fold
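Tactic 1 is easy to automate against a Search Console performance export. A minimal sketch, assuming a CSV export with the usual “Top pages”, “Impressions” and “CTR” columns – column names vary by export and language, so adjust them to your file:

```python
import pandas as pd

# Load a GSC "Pages" performance export (assumed filename and column names).
df = pd.read_csv("gsc_pages_export.csv")

# GSC exports CTR as a string like "2.4%"; convert it to a number.
df["CTR"] = df["CTR"].str.rstrip("%").astype(float)

# Pages with plenty of impressions but weak CTR: title/description candidates.
candidates = df[(df["Impressions"] > 1000) & (df["CTR"] < 3.0)]
print(candidates.sort_values("Impressions", ascending=False)
                [["Top pages", "Impressions", "CTR"]]
                .head(20))
```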
Medium-term strategies:
1. Content refresh
→ identify the top 20% of pages delivering traffic to the site
→ create an update calendar (every 3–6 months)
→ add the update date in a visible place
→ update statistics, examples, photos

2. Semantic SEO review
→ for each keyword, do the research:
– what related terms does the top 10 use?
– what questions appear in “People Also Ask”?
– what sub-topics does the competition cover?
→ add these elements to your articles
→ don’t stuff – write naturally, but comprehensively

3. User engagement optimization
→ add interactive elements:
– calculators, charts
– quizzes
– checklists (to print/download)
– tables and comparisons
→ measure engagement: scroll depth, time on page, pageviews per session
Long-term projects:
Technical SEO for better crawling, indexing and EEAT support, including:
- branding SEO – brand presence analysis in Google
- technical author optimization
- structured data
- sitemaps, RSS, robots.txt, loading time
- structure and content optimization
- multimedia optimization
- etc.
📚 Summary: a new era of transparency
We know that:
- Navboost uses 13 months of click data for ranking
- Glue collects every aspect of user interaction with SERP
- RankEmbed uses AI for semantic matching and is particularly effective for long-tail
- quality, popularity and spam scores determine crawling frequency
- user signals (CTR, time on page) are not a myth – they’re at the center of the algorithm
These are not speculations. These are not tests. This is sworn testimony from Google vice presidents, confirmed by a federal court.
🔗 Sources
United States v. Google LLC, Case No. 20-cv-3010 (APM), Memorandum Opinion (D.D.C. Sept. 2, 2025)