LLM Hallucinations

Learn how to detect and prevent LLM Hallucinations with LLM Auditing

What are Large Language Models (LLMs)?

LLMs are models trained on large amounts of text data for Natural Language Processing (NLP) tasks, enabling them to understand, summarize, generate, and predict text-based content.

Risks of Large Language Models

In recent months, awareness of the risks associated with Large Language Models (LLMs) has grown significantly. As we navigate this dynamic landscape, it becomes imperative to systematically categorize and analyze the emerging risks and harms generated by LLMs.

Drawing from academic research, we find it valuable to adopt established taxonomies to classify these risks effectively. One such taxonomy identifies five key verticals of algorithmic risks: Robustness, Bias, Privacy, Explainability, and Efficacy. Within the context of LLMs, these risks manifest in distinct ways:

Robustness

This category encompasses the risks associated with the susceptibility of LLMs to adversarial attacks, where the models may fail to perform adequately under certain conditions.

Bias

LLMs are prone to generating biased outputs due to various factors such as the quality of training data, application context, and inferential capabilities. These biases can significantly impact the fairness and inclusivity of the outcomes produced by LLMs.

Explainability

There is a risk of LLMs generating decisions that are opaque and difficult to comprehend for developers, deployers, and end-users, undermining trust and accountability in the decision-making process.

Privacy

LLMs may inadvertently leak sensitive information or personal data, posing risks to user privacy and confidentiality.

Efficacy

This category highlights the risk of LLMs underperforming relative to their intended use-case, potentially leading to suboptimal outcomes and diminished utility.

The following section provides an in-depth look at the risks associated with LLMs.

Practical Approaches towards LLM Safety and Ethics

Efficacy
  • World Knowledge
  • Commonsense Reasoning
  • Language Understanding
  • Reading Comprehension
Robustness
  • Hallucination
  • Jailbreaking
  • Prompt Injection
Explainability
  • Plausibility & Faithfulness in CoT
  • Document Level Transparency
Privacy/Security
  • Personal Data Leakage
  • Unauthorized Task Execution
Fairness/Bias
  • "Fair" treatment
  • Stereotype
  • Toxicity

What are Hallucinations in Large Language Models?

Hallucinations in Large Language Models (LLMs) refer to a phenomenon where the model generates text that is incorrect, nonsensical, or not grounded in reality. This issue undermines the reliability of these models, casting doubt on the trustworthiness of their output. The prevalence of hallucinations poses a significant challenge to the advancement of LLMs, often stemming from deficiencies in the quality of training data and the interpretative capabilities of the models.

Types of Hallucinations in LLMs

1. Hallucination based on dialogue history

Dialogue history-based hallucinations occur when an LLM mixes up names or relations of entities. For example, if the user mentions that their friend John likes hiking, and later says their uncle Mark is coming to visit, the AI might incorrectly link John and Mark together as the same person due to faulty recall. Furthermore, during a conversation, an LLM can create new incorrect inferences based on previous errors within the dialogue history, further distorting the context and content of a conversation in a snowball effect.

It is important to remember what causes these hallucinations. Such mistakes often occur in dialogue because LLMs rely on pattern recognition and statistics. Without a grounding in common sense or factual knowledge, LLMs can get lost and generate hallucinations.
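As a rough illustration of how such inconsistencies can be caught, the sketch below tracks entity attributes stated by the user across the dialogue history and flags model claims that contradict them. This is a minimal sketch, not a real auditing pipeline: the helper names are hypothetical, and the hand-written fact store stands in for the entity and relation extraction a real system would need.

```python
# Minimal sketch: track entity facts stated in the dialogue history and flag
# model turns that contradict them. Entity/attribute extraction is stubbed
# with a hand-written dictionary; a real system would use NER and relation
# extraction instead. All helper names here are illustrative assumptions.

from typing import Dict

# Facts the user has asserted so far, keyed by entity name.
dialogue_facts: Dict[str, Dict[str, str]] = {}

def record_user_fact(entity: str, attribute: str, value: str) -> None:
    """Store an attribute the user has stated about an entity."""
    dialogue_facts.setdefault(entity, {})[attribute] = value

def check_model_claim(entity: str, attribute: str, value: str) -> bool:
    """Return True if the model's claim is consistent with the dialogue history."""
    known = dialogue_facts.get(entity, {})
    return attribute not in known or known[attribute] == value

# Example from the text: John is a friend, Mark is an uncle.
record_user_fact("John", "relation", "friend")
record_user_fact("Mark", "relation", "uncle")

# A model turn that conflates the two people would be flagged here.
print(check_model_claim("Mark", "relation", "friend"))  # False -> possible hallucination
```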

2. Hallucination in abstractive summarisation

An abstractive summarisation system is a model commonly used with LLMs to generate summaries of textual information, often to make a piece of text more coherent and comprehensible.

Despite their usefulness in condensing information, abstractive summarisation systems can be prone to errors or semantic transformations between the original and generated data, triggering a hallucination in an LLM’s output.

Again, this is because they lack true comprehension of the source text, instead relying on pattern recognition and statistics. They may, as a result, distort or even entirely fabricate details, inferring unsupported causal relationships or retrieving unrelated background knowledge.
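One way to surface this kind of error is to compare the summary against its source before trusting it. The sketch below uses a crude lexical-support heuristic to flag summary sentences whose content words are largely absent from the source; production faithfulness checks typically rely on NLI or question-answering based metrics instead, and the helper names, threshold, and example texts here are illustrative assumptions.

```python
# Rough heuristic sketch: flag summary sentences whose content words are
# largely absent from the source document. Real faithfulness checks usually
# use NLI or QA-based metrics; this only illustrates the idea.

import re

STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "was", "for", "on", "that", "it", "by"}

def content_words(text: str) -> set:
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}

def flag_unsupported_sentences(source: str, summary: str, threshold: float = 0.5):
    """Return summary sentences with low lexical support in the source."""
    source_words = content_words(source)
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", summary.strip()):
        words = content_words(sentence)
        if not words:
            continue
        support = len(words & source_words) / len(words)
        if support < threshold:
            flagged.append((sentence, round(support, 2)))
    return flagged

# Illustrative example: the second summary sentence is fabricated.
source_text = "Boston College is a private research university in Chestnut Hill, Massachusetts."
summary_text = "Boston College is in Chestnut Hill. It was founded by NASA in 1958."
print(flag_unsupported_sentences(source_text, summary_text))
```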

3. Hallucination in generative question answering

This type of hallucination occurs when an LLM makes an erroneous inference from its source information and arrives at an incorrect answer to a user question. This can happen even when relevant source material is provided.

For example, if a user asks, "Which private research university is located in Chestnut Hill, Massachusetts - Boston College or Stanford University?" and context is provided stating Boston College is located in Chestnut Hill, an LLM may still incorrectly respond "Stanford University" due to its own prior knowledge about Stanford being a top private research university. Rather than accurately recalling the pre-existing source information, the model ignores the evidence and makes an unjustified inference based on its existing knowledge.
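A simple mitigation is to verify that the model's answer is actually supported by the context it was given before surfacing it. The sketch below performs a coarse grounding check by looking for the answer's key tokens in the supplied context; `is_grounded` is a hypothetical helper, and real attribution methods are considerably more sophisticated.

```python
# Minimal grounding check sketch: before trusting a model's answer to a
# context-grounded question, verify that the answer's key tokens actually
# appear in the supplied context. A coarse guard, not a full attribution method.

def is_grounded(answer: str, context: str) -> bool:
    """Return True if every key token of the answer occurs in the context."""
    tokens = [t.strip(".,").lower() for t in answer.split() if len(t) > 3]
    context_lower = context.lower()
    return all(t in context_lower for t in tokens)

# Example from the text above.
context = "Boston College is a private research university located in Chestnut Hill, Massachusetts."
question = "Which private research university is located in Chestnut Hill, Massachusetts?"

print(is_grounded("Boston College", context))       # True  -> answer is supported
print(is_grounded("Stanford University", context))  # False -> likely hallucination
```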

4. General data generation hallucination

In the context of large language models, a general data generation hallucination refers to a situation where the model generates outputs that may appear plausible or coherent but are not supported by factual or reliable information. It is a type of error where the model fabricates details or makes assumptions that go beyond the data it was trained on. This can result in the generation of false or misleading information that may seem convincing to humans but lacks a proper factual basis. Unlike the other types of hallucination, the root cause here is an overextension beyond the training data rather than an incorrect inference or a lack of grounding. The model essentially imagines new information that is not warranted by its training data.
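Because this failure mode stems from the model inventing unsupported detail rather than misreading a source, one hedged detection strategy is self-consistency: sample the model several times and treat low agreement as a warning sign. The sketch below assumes a generic `generate(prompt)` callable standing in for whatever LLM interface is available; it is not a real API, and the toy model simply simulates an unstable, likely fabricated answer.

```python
# Self-consistency sketch, assuming fabricated details tend to vary across
# repeated samples while well-grounded facts stay stable. `generate` is a
# hypothetical stand-in for an LLM call, not a real API.

import random
from collections import Counter
from typing import Callable, List

def self_consistency(prompt: str, generate: Callable[[str], str], n: int = 5) -> float:
    """Return the share of samples that agree with the most common answer."""
    samples: List[str] = [generate(prompt).strip().lower() for _ in range(n)]
    most_common_count = Counter(samples).most_common(1)[0][1]
    return most_common_count / n

# Toy stand-in model that answers inconsistently, simulating a fabricated fact.
def toy_generate(prompt: str) -> str:
    return random.choice(["1912", "1907", "1912", "1923", "1931"])

score = self_consistency("When was the fictional Acme Institute founded?", toy_generate)
print(f"Agreement: {score:.0%}")  # Low agreement suggests the answer may be hallucinated
```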

Auditing Large Language Models with Holistic AI

Regulatory momentum to govern generative models, and specifically Large Language Models (LLMs), is accelerating across the world, and companies developing and deploying such models must proactively ensure they fulfil a growing list of obligations.

Holistic AI takes a comprehensive, interdisciplinary approach to responsible AI. We combine technical expertise with ethical analyses to assess systems from multiple angles. Safeguard, Holistic AI's LLM Auditing product, acts as a robust solution to identify and address these issues through a multifaceted approach:

  • Blocking Serious Risks and Safety Issues: It prevents the inadvertent leakage of personal information, enabling organisations to leverage their full data set while ensuring data privacy and protecting their brand reputation.
  • Detecting Hallucinations and Stereotypes: It detects and rectifies incorrect responses, hallucinations, and the perpetuation of stereotypes in generated text.
  • Providing Readability Scores: It assesses and offers readability scores for generated text, ensuring that the output is both comprehensible and suitable for its intended audience.
  • Preventing Offensive Language and Toxicity: It proactively averts the use of offensive language, counters malicious prompts, and minimises toxicity in generated text.

Schedule a demo with us to get more information
