The burgeoning landscape of generative artificial intelligence, dominated by platforms like ChatGPT, Gemini, and Grok, currently lacks definitive guidelines for how content creators can ensure their material appears within AI-generated answers. This absence of official directives leaves businesses and individuals in a state of uncertainty, prompting a reliance on independent research to decipher the evolving mechanics of AI citation. While Microsoft’s recent "AEO and GEO" guide offered common-sense advice, it did not provide actionable strategies for AI visibility. Fortunately, two recent academic studies are beginning to shed light on this complex issue, offering valuable insights into optimizing content for inclusion in AI-powered search results and summaries.
The emergence of AI-driven search features, such as Google’s AI Overviews and AI Mode and other conversational AI interfaces, signals a shift in how users access information. Instead of merely presenting a list of links, these platforms synthesize information and provide direct answers, often accompanied by citations to the source material. Understanding how these citations are generated is increasingly important for maintaining online visibility and establishing authority in an AI-centric digital ecosystem. The studies under review, conducted by independent researchers, examine the quantitative side of AI citation and provide empirical data that can inform future content strategies.
One study, attributed to a researcher named "Kevin," analyzed the citation patterns of Grok and ChatGPT and found a wide gap in citation volume. Grok delivered an average of 33 citations per query, compared with an average of just 1.5 for ChatGPT. The difference suggests that the platforms take different approaches to source attribution, possibly reflecting their underlying architectures and training methods. Kevin also observed that approximately 70% of citations within Google’s AI Overviews and Gemini responses included a URL text fragment, #:~:text=, which points the browser to the exact passage in the source document that was used to formulate the answer. This text fragment matters because it lets users pinpoint the precise information the AI drew on, which improves transparency and makes source verification easier.
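Because the cited passage travels inside the link itself, it can be recovered from a citation URL with a few lines of code. The following is a minimal sketch, not the method used in either study; the function name extract_text_fragments and the example URL are illustrative assumptions. It decodes each text= directive as a whole and does not split the optional "start,end" range form.

```python
from urllib.parse import urlsplit, unquote


def extract_text_fragments(url: str) -> list:
    """Return the decoded text= directives found in a URL's fragment.

    Hypothetical helper for illustration. A fragment directive starts
    with ':~:' and may follow an ordinary element id, for example
    '#section:~:text=quoted%20words'.
    """
    fragment = urlsplit(url).fragment
    marker = fragment.find(":~:")
    if marker == -1:
        return []  # no fragment directive in this URL

    # Several directives can be chained with '&'.
    directives = fragment[marker + len(":~:"):].split("&")
    texts = []
    for directive in directives:
        if directive.startswith("text="):
            # Percent-decoding turns '%20' back into spaces, and so on.
            texts.append(unquote(directive[len("text="):]))
    return texts


if __name__ == "__main__":
    # Example URL is made up for demonstration purposes.
    print(extract_text_fragments(
        "https://example.com/page#:~:text=Cited%20sentence%20here"
    ))
```

Running a batch of citation URLs through a helper like this yields the quoted passages, which can then be compared against the page text.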
A second study, conducted by "Daniel," further investigated the citation behaviors of Google’s AI Overviews and Gemini. While both platforms exhibit similarities in their citation selection, Daniel’s research highlighted key distinctions and commonalities that offer actionable intelligence for content creators. The collective findings from these two studies provide a foundational understanding of how AI platforms currently select and present source material, albeit with the caveat that this field is rapidly evolving and subject to ongoing algorithmic changes.
Optimizing Citations: Strategic Placement and Content Structure
A recurring theme emerging from both studies is the importance of content placement and structure. The research strongly suggests that AI platforms exhibit a discernible preference for citing information that appears early in a web page.
The Closer to the Top, the Better
Both Kevin’s and Daniel’s studies found that generative AI platforms tend to cite passages located in the upper portion of a web page. Kevin’s research indicated that 44.3% of ChatGPT’s citations came from the first 30% of a page’s text. Neither study explains why, but the pattern is consistent with models giving more weight to the most prominent sections of a page.
Daniel’s study corroborated this observation with slightly more pronounced results for Google’s AI Overviews and Gemini: 74.8% of citations on these platforms fell in the first half of a page, and 46.1% fell within the first 30%. The other AI platforms in Daniel’s study did not link to specific sentences, so they figure less prominently in his analysis, but the trend for AI Overviews and Gemini is clear.
The clear takeaway from these empirical findings is that content creators must ensure that the most crucial information, answers to key questions, or solutions to prominent problems are presented prominently within the first third of their web pages. This strategic placement increases the likelihood of that content being recognized and cited by AI models. For businesses, this means re-evaluating their website structure and content hierarchy to prioritize core messages and essential information at the outset of each page. This could involve front-loading valuable insights, concise summaries, or direct answers to anticipated user queries.
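To check where a given sentence sits on a page, its position can be expressed as a fraction of the page's text. The sketch below assumes position is measured in characters from the start of the extracted page text; the studies do not say exactly how they measured it, and the function names and the 0.30 default are illustrative choices that mirror the 30% figure quoted above.

```python
from typing import Optional


def relative_position(page_text: str, sentence: str) -> Optional[float]:
    """Fraction of the page (0.0 = top, 1.0 = bottom) where the sentence starts.

    Assumption: position is counted in characters of plain page text.
    Returns None when the page is empty or the sentence is not found.
    """
    if not page_text:
        return None
    index = page_text.find(sentence)
    if index == -1:
        return None
    return index / len(page_text)


def in_top_fraction(page_text: str, sentence: str, fraction: float = 0.30) -> bool:
    """True when the sentence starts within the top `fraction` of the page."""
    position = relative_position(page_text, sentence)
    return position is not None and position < fraction


if __name__ == "__main__":
    # Synthetic 100-character page with the key sentence starting at offset 20.
    page = "A" * 20 + "TARGET" + "B" * 74
    print(relative_position(page, "TARGET"))
    print(in_top_fraction(page, "TARGET"))
```

A page's key answer that lands well past the 0.30 mark is a candidate for moving closer to the top.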
Emphasize Brevity and Clarity: The Power of "Atomic Facts"
Beyond placement, the studies also shed light on the format of content that AI platforms are more likely to cite. Daniel’s research introduced the concept of "atomic facts," which he defines as "a self-contained, single-claim sentence that makes sense on its own." The concept helps explain which sentences these platforms extract and cite.
Daniel’s analysis of AI Overviews and Gemini revealed specific characteristics of cited sentences:

- Average length: Cited sentences in AI Overviews and Gemini averaged 16 words. This suggests a preference for concise, digestible pieces of information.
- Sentence structure: A remarkable 85% of cited sentences were found to be single clauses. This indicates that AI models are more likely to extract and cite straightforward statements rather than complex, multi-part sentences.
- Clarity and independence: The emphasis on "atomic facts" implies that AI systems are designed to identify and leverage statements that are clear, unambiguous, and can stand alone without requiring extensive contextualization.
In essence, these findings advocate for a writing style that is direct, to the point, and free from unnecessary jargon or convoluted phrasing. Long, rambling introductions, tangential discussions, and irrelevant asides are likely to be passed over in favor of clear, factual statements. The implication for content creators is to prioritize clarity and conciseness, so that each sentence delivers a distinct piece of information.
To aid in this optimization effort, a new free tool has been developed that allows users to track the number of "atomic facts" present on a given web page. This tool can serve as a valuable resource for content creators looking to assess and improve the AI-friendliness of their existing and new content. By analyzing the density of self-contained, single-claim sentences, creators can refine their writing to better align with the apparent preferences of generative AI models.
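For a rough sense of what such a count involves, the share of short, single-clause sentences on a page can be estimated with simple heuristics. The sketch below is not the tool mentioned above and does not reproduce Daniel's methodology. The 20-word ceiling, the list of clause markers, and all function names are assumptions chosen for illustration, and a pattern match of this kind will misjudge some sentences.

```python
import re

# Assumed signals of a multi-clause sentence: semicolons, colons, or a
# comma followed by a common conjunction. This is a crude heuristic.
CLAUSE_MARKERS = re.compile(
    r"[;:]|,\s+(?:and|but|or|which|because|although|while)\b",
    re.IGNORECASE,
)


def split_sentences(text: str) -> list:
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def looks_atomic(sentence: str, max_words: int = 20) -> bool:
    """True for short sentences with no clause marker (max_words is assumed)."""
    word_count = len(sentence.split())
    return 0 < word_count <= max_words and not CLAUSE_MARKERS.search(sentence)


def atomic_fact_density(text: str) -> float:
    """Share of sentences in the text that pass the looks_atomic heuristic."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    atomic = sum(1 for sentence in sentences if looks_atomic(sentence))
    return atomic / len(sentences)


if __name__ == "__main__":
    sample = (
        "Grok cites many sources. "
        "ChatGPT cites few sources, but it still answers the question "
        "because its training data is large."
    )
    print(atomic_fact_density(sample))
```

A low score on a page's opening section is a hint that its key claims are buried in long, compound sentences.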
Divergent Paths: "No Google Overlap" in Citations
While AI Overviews and Gemini appear to share some commonalities in their approach to citation selection, Daniel’s study revealed a notable lack of overlap in the specific domains they cite. He found that only a marginal 4.5% of domains cited by AI Overviews were also cited by Gemini, and conversely, only 13.2% of Gemini’s cited domains appeared in AI Overviews.
This finding is significant. It suggests that while the underlying principles of source selection may be similar, favoring prominent, well-structured content, the specific systems that choose which sources to attribute can produce distinct outcomes. The overlap is small rather than literally zero, but the "no Google overlap" pattern implies that appearing in one AI-generated answer does not guarantee inclusion in another, even when both come from Google. This underscores the need for a multifaceted approach to content optimization rather than a single strategy meant to capture every AI citation opportunity.
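The two overlap percentages differ because each is measured against a different total: the shared domains are divided by the number of domains cited by the platform in question. The sketch below shows that calculation on made-up URLs. The function names and the choice to strip a leading "www." are assumptions, and the study's own domain normalization is not described.

```python
from urllib.parse import urlsplit


def cited_domains(urls) -> set:
    """Collect the distinct hostnames from a list of citation URLs.

    Assumption: a leading 'www.' is ignored so that www.a.com and a.com
    count as the same domain.
    """
    domains = set()
    for url in urls:
        host = urlsplit(url).hostname  # already lowercased, None if absent
        if host:
            if host.startswith("www."):
                host = host[len("www."):]
            domains.add(host)
    return domains


def overlap_share(source: set, other: set) -> float:
    """Share of `source` domains that also appear in `other`."""
    if not source:
        return 0.0
    return len(source & other) / len(source)


if __name__ == "__main__":
    # Made-up citation lists for two hypothetical platforms.
    platform_a = cited_domains([
        "https://www.a.com/x", "https://b.com/y",
        "https://c.com", "https://d.com/z",
    ])
    platform_b = cited_domains(["https://a.com/other", "https://e.com"])
    print(overlap_share(platform_a, platform_b))  # 1 shared of 4 domains
    print(overlap_share(platform_b, platform_a))  # 1 shared of 2 domains
```

Read this way, and assuming both reported figures count the same set of shared domains, 4.5% against 13.2% would imply that AI Overviews cited roughly three times as many distinct domains as Gemini in Daniel's sample.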
Beyond Citations: The Broader Context of AI Visibility
It is crucial to acknowledge that the aforementioned studies focus exclusively on citations, which are explicit references to source material within AI-generated answers. They do not encompass the broader concept of general visibility, which includes unlinked references or instances where a brand or piece of information is implicitly utilized by the AI without a direct citation.
Optimizing for this broader visibility is a more complex endeavor and likely involves ensuring that a brand’s name, key terms, and relevant information are well-represented and consistently present within the vast datasets that train these AI models. This aspect of AI visibility is closely related to traditional search engine optimization (SEO) principles, but with an added layer of complexity due to the opaque nature of AI training data.
Being "well-represented" in training data can be interpreted as having a strong and authoritative presence across the internet, with content that is frequently referenced, highly ranked in traditional search results, and recognized as a reliable source of information. This reinforces the enduring importance of creating high-quality, authoritative, and consistently updated content, regardless of the evolving AI landscape.
Implications and Future Outlook
The research into AI citation optimization arrives at a critical juncture. As generative AI continues to integrate more deeply into search experiences and information consumption, understanding how to be "seen" and "cited" by these platforms is paramount for businesses, publishers, and content creators. The findings suggest a shift towards valuing content that is not only informative but also strategically structured for algorithmic consumption.
The lack of official guidelines from AI providers creates a challenging environment. However, the emerging research provides a valuable roadmap. The emphasis on placing key information at the beginning of content, crafting concise and self-contained sentences, and ensuring clarity and accuracy are actionable strategies that can be implemented immediately.
The implications of this research extend beyond simple SEO adjustments. It prompts a re-evaluation of content creation workflows, editorial guidelines, and even website design principles. In the future, we may see the development of specialized AI optimization tools and services that help content creators navigate this new frontier.
Microsoft’s initial foray with its "AEO and GEO" guide, while basic, signals a growing awareness within major tech companies of the need for such guidance. As AI technologies mature and their impact on information access becomes more pronounced, it is likely that more comprehensive and nuanced best practices will emerge. Until then, embracing the insights from independent research, such as the studies by Kevin and Daniel, will be the most effective way to adapt and thrive in the evolving digital information ecosystem. The ability to be accurately and appropriately cited by generative AI could become a significant differentiator in online visibility and credibility.
