Large Language Models (LLMs) are models trained on large amounts of text data for Natural Language Processing (NLP) tasks, enabling them to understand, summarize, generate, and predict text-based content.
In recent months, awareness of the risks associated with LLMs has grown significantly. As we navigate this dynamic landscape, it becomes imperative to systematically categorize and analyze the emerging risks and harms generated by LLMs.
Drawing from academic research, we find it valuable to adopt established taxonomies to classify these risks effectively. One such taxonomy identifies five key verticals of algorithmic risks: Robustness, Bias, Privacy, Explainability, and Efficacy. Within the context of LLMs, these risks manifest in distinct ways:
Robustness: This category encompasses the risks associated with the susceptibility of LLMs to adversarial attacks, where the models may fail to perform adequately under certain conditions.
Bias: LLMs are prone to generating biased outputs due to various factors such as the quality of training data, application context, and inferential capabilities. These biases can significantly impact the fairness and inclusivity of the outcomes produced by LLMs.
Explainability: There is a risk of LLMs generating decisions that are opaque and difficult to comprehend for developers, deployers, and end-users, undermining trust and accountability in the decision-making process.
Privacy: LLMs may inadvertently leak sensitive information or personal data, posing risks to user privacy and confidentiality.
Efficacy: This category highlights the risk of LLMs underperforming relative to their intended use case, potentially leading to suboptimal outcomes and diminished utility.
The following section provides an in-depth look at the risks associated with LLMs.
Hallucination in LLMs refers to a phenomenon where the model generates text that is incorrect, nonsensical, or not grounded in reality. This issue undermines the reliability of these models, casting doubt on the trustworthiness of their output. The prevalence of hallucinations poses a significant challenge to the advancement of LLMs, often stemming from deficiencies in the quality of training data and the interpretative capabilities of the models.
Dialogue history-based hallucinations occur when an LLM mixes up names or relations of entities. For example, if the user mentions that their friend John likes hiking, and later says their uncle Mark is coming to visit, the AI might incorrectly link John and Mark together as the same person due to faulty recall. Furthermore, during a conversation, an LLM can create new incorrect inferences based on previous errors within the dialogue history, further distorting the context and content of a conversation in a snowball effect.
It is important to understand what causes hallucinations in LLMs. These mistakes often occur in dialogue because LLMs rely on pattern recognition and statistical associations. Without grounding in common sense or factual knowledge, LLMs can lose track of context and generate hallucinations.
An abstractive summarisation system is a model, commonly built on an LLM, that generates summaries of textual information, often to make a piece of text more coherent and comprehensible.
Despite their usefulness in condensing information, abstractive summarisation systems can be prone to errors or semantic transformations between the original and generated data, triggering a hallucination in an LLM’s output.
Again, this is because they lack true comprehension of the source text, instead relying on pattern recognition and statistics. They may, as a result, distort or even entirely fabricate details, inferring unsupported causal relationships or retrieving unrelated background knowledge.
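To make this concrete, here is a minimal sketch of how an abstractive summariser is typically invoked. It assumes the Hugging Face transformers library is installed and uses facebook/bart-large-cnn purely as an illustrative model choice; the generated summary should always be compared against the source text, since abstractive models can introduce details the source does not support.

```python
# Illustrative sketch: running an abstractive summariser and checking its output
# against the source text by eye. Assumes the `transformers` library is installed;
# facebook/bart-large-cnn is one commonly used summarisation model.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

source_text = (
    "Boston College is a private research university located in Chestnut Hill, "
    "Massachusetts. It was founded in 1863 and offers undergraduate and graduate "
    "programmes across a range of disciplines."
)

# do_sample=False keeps the output deterministic, but the model can still
# paraphrase in ways that add or distort details not present in the source.
summary = summarizer(source_text, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```

Even with deterministic decoding, the summary should be treated as a candidate output to verify rather than a faithful restatement of the source.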
Another type of hallucination occurs when an LLM makes an erroneous inference from its source information and arrives at an incorrect answer to a user's question. This can happen even when relevant source material is provided.
For example, if a user asks, "Which private research university is located in Chestnut Hill, Massachusetts - Boston College or Stanford University?" and context is provided stating Boston College is located in Chestnut Hill, an LLM may still incorrectly respond "Stanford University" due to its own prior knowledge about Stanford being a top private research university. Rather than accurately recalling the pre-existing source information, the model ignores the evidence and makes an unjustified inference based on its existing knowledge.
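One lightweight mitigation, sketched below under the assumption that the model's answer has been obtained elsewhere, is to check whether that answer is literally supported by the context supplied to the model. The naive string match here is only illustrative, not a production-grade guardrail.

```python
# Naive grounding check: flag an answer that does not appear in the context
# the model was given. Illustrative sketch only.

def is_grounded(answer: str, context: str) -> bool:
    """Very rough check: does the answer text literally occur in the provided context?"""
    return answer.strip().lower() in context.lower()

context = ("Boston College is a private research university located in "
           "Chestnut Hill, Massachusetts.")
question = ("Which private research university is located in Chestnut Hill, "
            "Massachusetts - Boston College or Stanford University?")

# Suppose the model (called elsewhere) returns the ungrounded answer from the example above.
answer = "Stanford University"

if not is_grounded(answer, context):
    print(f"Warning: '{answer}' does not appear in the supplied context; possible hallucination.")
```

More robust checks (entailment models, citation verification) follow the same principle: compare the generated answer against the evidence the model was actually given.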
In the context of large language models, a general data generation hallucination refers to a situation where the model generates outputs that may appear plausible or coherent but are not supported by factual or reliable information. It is a type of error where the model fabricates details or makes assumptions that go beyond the data it has been trained on. This can result in the generation of false or misleading information that may seem convincing to humans but lacks a proper factual basis. Unlike other types of hallucination, the root cause of a general data hallucination is an overextension beyond the training data rather than an incorrect inference or a lack of grounding. The model essentially imagines new information that is not warranted by its training data.
Regulatory momentum to govern generative models, and LLMs specifically, is accelerating across the world. Companies seeking to develop and deploy such models must proactively ensure they fulfil a growing list of obligations.
Holistic AI takes a comprehensive, interdisciplinary approach to responsible AI, combining technical expertise with ethical analysis to assess systems from multiple angles. Safeguard, Holistic AI's LLM Auditing product, offers a robust, multifaceted solution for identifying and addressing these risks: