AI hallucination: Unless you live an unplugged life, you’ve probably heard this term in the same breath as generative AI tools like ChatGPT. If you’ve wondered what “hallucinations” are, how they happen – and, most importantly, how to mitigate them – keep reading.

An AI hallucination occurs when a large language model (LLM) used to generate output from an AI chatbot, search tool, or conversational AI application produces something inaccurate, false, or misleading. 

Things go awry with generative AI for various reasons. These include errors in the AI’s understanding or generation processes, overconfidence in its learned knowledge, training on data that doesn’t accurately represent the kind of tasks the AI is performing, training on old or inaccurate content, and a lack of human feedback or oversight of system outputs.

Understanding what generative AI hallucinations are and why they happen is important for companies that want to use artificial intelligence effectively and with confidence. A generative AI tool is only as good as its training data, its AI algorithms and generative model, and the search engine that retrieves the right, relevant content for it.


Hallucinations are a known GenAI headache that occurs with AI tools like OpenAI’s ChatGPT and Google’s Bard. For example, ChatGPT famously made up fake legal quotes and citations that were used in an actual court case. The unfortunate lawyers who submitted the false information were fined $5,000.

And Amazon’s Q, big tech’s most recent introduction into the AI chatbot space, debuted with a thud when, three days after its release, Amazon employees warned it was prone to “severe hallucinations” and privacy concerns.

An AI chatbot like ChatGPT or Jasper AI operates thanks to the integration of various technologies including LLMs for text generation and comprehension and natural language processing for interpreting and processing human language. Underpinning these functionalities are neural networks, which provide the computational framework for handling and analyzing the data. 

There are inherent limitations in the way generative AI chatbots process and interpret information that cause hallucinations. These may manifest as:

  • False assumptions – LLMs make assumptions based on patterns they’ve seen during training and these assumptions make them prone to hallucinations. This happens because they learn to associate words in specific contexts (e.g., “Paris” is always associated with “France”). The LLM might incorrectly assume this pattern applies in all contexts, leading to errors when the context deviates from the norm.

  • Misinterpreting user intention – LLMs analyze user inputs based on word patterns and context, but lack a true understanding of human intentions or nuances. This can lead to situations where the model misinterprets the query, especially in complex, ambiguous, or nuanced scenarios, resulting in responses that don’t align with the user’s actual intent.

  • Biases caused by training data – This is a big one. LLMs learn from huge datasets that often contain cultural, linguistic, and political biases. The responses they generate naturally reflect these biases, producing unfair, false, and skewed information.

  • Overconfidence – LLMs often seem incredibly confident in their responses. They’re designed to respond without doubt or emotion, generating text based on statistical likelihoods without the ability to cross-reference or verify information. This leads to assertions like Bard’s confident claim that the James Webb Space Telescope took the “very first images of exoplanets” (it didn’t). These models don’t have a mechanism to express uncertainty or seek clarification.

  • An inability to reason – Unlike humans, LLMs don’t have true reasoning skills. They use pattern recognition and statistical correlations to generate a response. Without logical deduction or an understanding of causal relationships, they’re susceptible to generating nonsense, especially in situations that require a deep understanding of concepts or logical reasoning.
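The overconfidence problem above can be illustrated with a toy decoding step. The sketch below is plain Python with made-up candidate tokens and logits (not output from any real model): greedy decoding asserts the highest-probability token flatly, even when the margin over the runner-up is tiny, which is exactly the kind of near-tie a system should treat as uncertainty.

```python
import math

def softmax(logits):
    """Convert raw model scores into a probability distribution."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up next-token scores for the prompt "The capital of Australia is"
candidates = ["Sydney", "Canberra", "Melbourne"]
logits = [2.1, 2.0, 0.5]  # the wrong answer narrowly outscores the right one

probs = softmax(logits)
best = max(range(len(candidates)), key=lambda i: probs[i])

# Greedy decoding states the top token with no expression of doubt,
# even though it beats the runner-up by only a few percentage points.
print(f"Answer: {candidates[best]} (p = {probs[best]:.2f})")
```

Real decoders add sampling and temperature on top of this, but the core issue is the same: the distribution is collapsed into one assertive answer.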

In an interview with the New York Times, OpenAI CEO Sam Altman acknowledged that generative pre-trained transformers (GPTs) like ChatGPT are bad at reasoning and that this is why they make stuff up. Per Altman:

"I would say the main thing they’re bad at is reasoning. And a lot of the valuable human things require some degree of complex reasoning. They’re good at a lot of other things — like, GPT-4 is vastly superhuman in terms of its world knowledge. It knows more than any human has ever known. On the other hand, again, sometimes it totally makes stuff up in a way that a human would not.”

Unifying your content creates an expansive, searchable knowledge base that ensures the AI language model’s outputs are comprehensive. It also requires segmenting documents before routing them to the GenAI model, as we note in our eBook, GenAI Headaches: The Cure for CIOs:

This involves maximizing the segmentation of documents as part of the grounding context process before they are routed to the GenAI model. The LLM then provides a relevant response from these divided chunks of information, based specifically on your organization’s knowledge. This contextualization plays an essential role in the security of generative responses. It ensures that the results of the AI system are relevant, accurate, consistent and safe.
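As a rough sketch of that segmentation step (our own illustration, not Coveo’s actual chunking logic), the function below splits a document into overlapping word windows so no passage loses its surrounding context at a chunk boundary before the chunks are routed to the GenAI model:

```python
def chunk_document(text, max_words=120, overlap=20):
    """Split a document into overlapping word-window chunks.

    The overlap preserves context that would otherwise be cut off
    at a chunk boundary.
    """
    words = text.split()
    if len(words) <= max_words:
        return [" ".join(words)]
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks
```

Production pipelines usually chunk on semantic boundaries (headings, paragraphs) rather than raw word counts, but the overlap idea carries over.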

6. Use the most up-to-date content 

Regularly refresh, rescan, and rebuild content in your content repository to ensure that the searchable content is current and relevant. These three update operations ensure that GenAI outputs remain accurate. 
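A minimal way to drive those refresh operations is to track when each document was last indexed and flag anything past a staleness threshold. The sketch below assumes a simple `{doc_id: last_indexed_at}` mapping; a real repository would expose this through its own APIs:

```python
from datetime import datetime, timedelta, timezone

def stale_documents(index, max_age_days=90, now=None):
    """Return IDs of documents due for a refresh/rescan/rebuild pass.

    `index` maps document IDs to the timestamp they were last indexed;
    anything older than `max_age_days` should be re-crawled so GenAI
    answers stay grounded in current content.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [doc_id for doc_id, indexed_at in index.items()
            if indexed_at < cutoff]
```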

7. Provide a consistent user experience

If you’re using GenAI for search, chat, and other use cases, you need to ensure that users get consistent answers. A unified search engine is the answer to this challenge. A composable AI search and generative experience platform like Coveo uses AI and machine learning to identify, surface, and generate results and answers from a unified index of all your content, ensuring that information is relevant, current, and consistent across your digital channels.   

Introducing generative AI applications – and their potential for hallucinations – into your enterprise comes with some challenges. Public generative AI models in applications like ChatGPT pose huge privacy and security risks. For example, OpenAI may retain chat histories and other data to refine their models, potentially exposing your sensitive information to the public. They may also collect and retain user data like IP addresses, browser information, and browsing activities. 

But, good news. Using GenAI in customer and employee experiences safely – and ensuring the content it produces is accurate, secure, and reliable – is possible. Here are some steps you can take to mitigate AI hallucinations and make AI output enterprise ready:

1. Create a secure environment

Establishing a digital environment that adheres to specific security standards and regulations helps prevent unauthorized access and use of enterprise data. You should operate your GenAI solution in a locked-down environment that complies with regulations like AICPA SOC 2 Type II, HIPAA, Cloud Security Alliance, and ISO 27001. 

2. Unify access to your content 

Most companies store information across multiple databases, tools, and systems – all of this contributes to your overall knowledge base. You can use content connectors to create a unified index of your important data from all possible content sources (Slack, Salesforce, your Intranet, etc.). 
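A unified index can be sketched as merging (ID, text) records from each connector under a source-namespaced key. The connectors below are hypothetical stand-ins for real Slack or Salesforce integrations, and the record contents are invented for illustration:

```python
def build_unified_index(*connectors):
    """Merge records from several content sources into one searchable index.

    Each connector is a callable returning an iterable of (doc_id, text)
    pairs; IDs are namespaced by source so records never collide.
    """
    index = {}
    for connector in connectors:
        source = connector.__name__
        for doc_id, text in connector():
            index[f"{source}:{doc_id}"] = text
    return index

# Hypothetical connectors standing in for Slack, Salesforce, etc.
def slack():
    return [("C01-msg42", "Q3 pricing was approved in #sales")]

def salesforce():
    return [("case-1187", "Customer reported login failures after SSO change")]

index = build_unified_index(slack, salesforce)
```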

AI Hallucinations: Does Your GenAI Need a Reality Check?


“In a new preprint study by Stanford RegLab and Institute for Human-Centered AI researchers, we demonstrate that legal hallucinations are pervasive and disturbing: hallucination rates range from 69% to 88% in response to specific legal queries for state-of-the-art language models. Moreover, these models often lack self-awareness about their errors and tend to reinforce incorrect legal assumptions and beliefs. These findings raise significant concerns about the reliability of LLMs in legal contexts, underscoring the importance of careful, supervised integration of these AI technologies into legal practice.” (Source)

Leveraging Enterprise-Ready Generative AI

Understanding and mitigating the phenomenon of AI hallucinations in generative AI is the only way to make a GenAI system enterprise ready. Hallucinations pose significant risks to your reputation, your data security, and your peace of mind. 

Coveo has over a decade of experience in enterprise-ready AI. The Coveo Platform™ provides a secure way for enterprises to harness GenAI while mitigating the risks of this technology. Designed with security at its core, Coveo Relevance Generative Answering keeps generated content accurate, relevant, consistent, and safe. Features like auditable prompts and responses and secure content retrieval based on user access rights safeguard sensitive information. Coveo is customizable and easy to align with specific business goals. This balance between innovation and security is essential for enterprises.

Want to Start Leveraging Accurate, Secure GenAI for CX and EX?

Coveo went from experimentation to deployment to results with its customers.

With a 90-minute deployment, you could see significant results in only 6 weeks! The 20% increase in self-service resolution and 40% decrease in average search time Xero achieved is only the tip of the iceberg.

Join our virtual 1-hour event to learn more ↓

Bard, Google’s introduction to the generative chatbot space, made up a “fact” about the James Webb Space Telescope during its livestream debut – an event attended by the media. Alphabet, Google’s parent company, lost $100 billion in market value after Reuters pointed out this unfortunate hallucination.

3. Retain data ownership and control

Complete ownership and control over your data, including index and usage analytics, keeps proprietary information safe. Ensure communication is secure by using encryption protocols like HTTPS and TLS endpoints that provide authentication, confidentiality, and security. Implement an audit trail to understand how the AI model is being used so you can create guidelines that promote responsible, ethical use of the system.
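An audit trail can be as simple as appending one structured record per prompt/response exchange. The sketch below writes JSON lines; the field names are our own illustration, not a standard schema:

```python
import json
import time

def audit_event(log, user, prompt, response, sources):
    """Append one prompt/response exchange to an audit trail.

    Recording who asked what, what was generated, and which documents
    grounded the answer makes model usage reviewable after the fact.
    """
    entry = {
        "timestamp": time.time(),
        "user": user,
        "prompt": prompt,
        "response": response,
        "sources": sources,
    }
    log.append(json.dumps(entry))
    return entry
```

In practice the log would go to an append-only store rather than an in-memory list, so entries can’t be silently altered.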

4. Verify accuracy of training data

Ensure the accuracy of training data and incorporate human intervention to validate the GenAI model’s outputs. Providing citations and links to source data introduces transparency and allows users to validate information for themselves.
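Attaching citations can be sketched as rendering the generated answer together with numbered links back to the chunks that grounded it. The chunk ID and URL below are invented for illustration:

```python
def answer_with_citations(answer, sources):
    """Render a generated answer with numbered citations to its sources.

    `sources` maps a chunk ID to the URL it was retrieved from, so users
    can validate the information for themselves.
    """
    lines = [answer, ""]
    for n, (chunk_id, url) in enumerate(sources.items(), start=1):
        lines.append(f"[{n}] {chunk_id}: {url}")
    return "\n".join(lines)
```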

5. Implement retrieval augmented generation (RAG)

Retrieval augmented generation is an AI framework that removes some of the limitations that tie an LLM to its training data. With RAG, an LLM can access updated knowledge and data based on a user query – a process called “grounding.” RAG gives companies greater control over what information can be used to answer a user query, pulling only the most relevant chunks from documents available in trusted internal sources and repositories.
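The retrieval step can be sketched with a toy keyword-overlap scorer standing in for the vector search a production RAG pipeline would use. The retrieved chunks are then wrapped into a grounded prompt that instructs the LLM to answer only from that context; the sample chunks and wording are our own illustration:

```python
def retrieve(query, chunks, k=2):
    """Score chunks by word overlap with the query and keep the top k."""
    q = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return scored[:k]

def grounded_prompt(query, chunks):
    """Build a prompt that restricts the LLM to the retrieved context,
    i.e. the 'grounding' step described above."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks))
    return ("Answer using only the context below. "
            "If the context is insufficient, say so.\n"
            f"Context:\n{context}\n"
            f"Question: {query}")

chunks = [
    "Password resets expire after 24 hours.",
    "Our headquarters are in Quebec City.",
    "SSO is configured under Admin > Security.",
]
prompt = grounded_prompt("How long until a password reset expires?", chunks)
```

The instruction to say so when the context is insufficient is itself a hallucination mitigation: it gives the model an explicit alternative to making something up.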

Jacqueline Dooley
Content Writer and Journalist

ChatGPT summarizing a non-existent New York Times article based on a fake URL. (Source)
