In June 2023, a New York lawyer named Steven Schwartz filed a legal brief citing six court cases to support his client’s personal injury claim. There was one problem: none of the cases existed. Schwartz had used ChatGPT to research case law, and the AI had generated completely fabricated case names, citations, and judicial opinions — all written with absolute confidence and perfect legal formatting. The judge sanctioned Schwartz and his firm $5,000, and the case became the most famous example of AI hallucination in professional practice.
Schwartz’s mistake was not using AI. It was trusting AI output without verification. And that mistake is being replicated thousands of times daily in offices, newsrooms, hospitals, and government agencies worldwide.
AI hallucination is not an occasional malfunction. It is a structural feature of how large language models (LLMs) work. Every organisation using AI needs to understand why it happens, how to detect it, and how to build systems that prevent hallucinated content from causing real harm.
Key takeaways
- AI hallucinations occur because language models predict probable text, not factual text — they have no concept of truth
- Current LLMs hallucinate at rates of 3-15% depending on the model, task, and domain
- High-risk domains — legal, medical, financial, regulatory — require mandatory human verification of all AI outputs
- Detection methods exist but are imperfect: the most reliable approach combines AI-assisted checking with human review
Why AI hallucinations happen
To understand hallucinations, you need to understand what LLMs actually do. Despite their impressive outputs, large language models do not “know” anything. They do not access databases of facts. They do not reason about truth. They predict the next token (word or sub-word) in a sequence based on statistical patterns learned during training.
When you ask ChatGPT “What is the capital of France?”, it does not look up the answer. It generates “Paris” because, in its training data, the token “Paris” overwhelmingly follows the pattern “capital of France.” This works brilliantly for well-established facts. But when the model encounters a question where the “correct” next token is ambiguous or absent from training data, it generates the most statistically plausible sequence — which may be entirely fictional.
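The mechanism above can be illustrated with a toy sketch. This is not a real language model: the "model" here is just a hand-written table of next-token probabilities, and the place names for the fictional prompt are invented. The point is that sampling always produces a fluent answer, even when the underlying distribution reflects no real knowledge.

```python
import random

# Toy illustration (NOT a real LM): the "model" is a table of
# next-token probabilities, standing in for patterns learned in training.
next_token_probs = {
    "capital of France is": {"Paris": 0.97, "Lyon": 0.02, "Nice": 0.01},
    # For a prompt absent from training data, the distribution is nearly
    # flat and uninformative -- yet sampling still yields a fluent answer.
    # These city names are entirely made up for the example.
    "capital of Atlantis is": {"Poseidonia": 0.35, "Atlantia": 0.33, "Meridia": 0.32},
}

def generate(prompt: str) -> str:
    """Sample the next token; there is no 'I don't know' option."""
    probs = next_token_probs[prompt]
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights)[0]

print(generate("capital of France is"))    # almost always "Paris"
print(generate("capital of Atlantis is"))  # a confident fabrication
```

Both calls return an answer in identical style; nothing in the output signals that the second one is fabricated.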
The confidence problem. LLMs generate all output with the same linguistic confidence. A correct fact and a fabricated one are presented in identical style, with identical certainty. The model has no internal mechanism to flag uncertainty or distinguish what it “knows” from what it is inventing. This is fundamentally different from a search engine, which can return “no results found.” An LLM always has an answer.
The training data problem. LLMs are trained on vast internet corpora that contain errors, contradictions, outdated information, and deliberate misinformation. The model learns patterns from all of this data without distinguishing reliable from unreliable sources. It cannot tell you whether a claim comes from a peer-reviewed journal or a Reddit comment.
The recency problem. LLMs have training data cutoff dates. They cannot access information created after their training, but they will generate plausible-sounding answers about recent events based on patterns from older data. Ask about a law passed last month, and the model may confidently describe a law that does not exist.
3-15%
hallucination rate across major LLMs depending on task complexity and domain
Source: Vectara Hughes Hallucination Leaderboard, 2025
Famous examples and their consequences
Legal hallucinations
The Schwartz case was not an isolated incident. A 2024 Stanford study examined AI-generated legal research across multiple LLMs and found that GPT-4 hallucinated case citations in 6.3% of responses, while GPT-3.5 hallucinated in 12.8%. When specifically asked to provide supporting case law, hallucination rates rose to 33% for GPT-3.5 and 15.6% for GPT-4 (Stanford CodeX, 2024).
In Canada, a lawyer was fined after citing a non-existent case generated by AI in a family law matter. In the UK, a litigant in person submitted AI-generated case law to a tribunal, causing delays when the fabricated precedents had to be investigated.
Medical hallucinations
AI chatbots providing health information have generated dangerous advice. A 2024 JAMA study tested ChatGPT-4 on 284 medical questions and found that while 78% of answers were accurate, 4.2% contained clinically dangerous errors — including incorrect drug dosages, fabricated drug interactions, and invented medical conditions (JAMA Internal Medicine, 2024).
In a particularly concerning case, a mental health chatbot operated by a US health tech company provided advice that contradicted established clinical guidelines for a user expressing suicidal ideation. The company suspended the service pending review.
Financial hallucinations
Bloomberg tested GPT-4’s ability to answer questions about financial regulations and found that the model fabricated regulatory citations in 8.7% of responses. In one case, the model cited a specific clause of the Dodd-Frank Act that does not exist, describing its requirements in convincing detail.
For organisations subject to regulatory compliance, AI-generated financial or regulatory information that turns out to be fabricated can lead to violations, fines, and reputational damage. Our AI risk assessment guide covers how to evaluate these exposures.
Academic and journalistic hallucinations
Retractions in scientific journals linked to AI-generated fabricated references increased 300% between 2023 and 2025 (Retraction Watch). CNET published AI-generated articles containing factual errors about compound interest calculations. Sports Illustrated was found to have published articles attributed to AI-generated fake authors with AI-generated fake headshots.
AI hallucinations are not getting solved — they are getting subtler. As models improve, they hallucinate less often but more convincingly. A hallucination from GPT-3.5 was often obviously wrong. A hallucination from a 2026-era model may be sophisticated enough to fool domain experts without careful verification.
Detection methods
Detecting AI hallucinations is an active area of research, but no method is foolproof.
Automated approaches
Retrieval-Augmented Generation (RAG). Rather than relying solely on the model’s training data, RAG systems retrieve relevant documents from a trusted knowledge base and use them to ground the AI’s response. This significantly reduces hallucination rates — Microsoft reports that RAG reduces factual errors in enterprise Copilot by approximately 40-60% — but does not eliminate them entirely.
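The retrieval-and-grounding step can be sketched as follows. This is a deliberately minimal illustration: the two-document knowledge base and the keyword-overlap retriever are hypothetical stand-ins, and a production RAG system would use vector embeddings plus an actual LLM call rather than just printing the assembled prompt.

```python
# Minimal RAG grounding sketch. KNOWLEDGE_BASE and the retrieval method
# are illustrative assumptions, not any specific product's implementation.
KNOWLEDGE_BASE = [
    "Policy 12: expense claims over 500 EUR require director approval.",
    "Policy 7: remote work requests must be filed 5 days in advance.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_grounded_prompt(query: str) -> str:
    """Constrain the model to answer only from retrieved context."""
    context = "\n".join(retrieve(query, KNOWLEDGE_BASE))
    return (f"Answer ONLY from the context below. If the answer is not "
            f"in the context, say 'not found'.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")

print(build_grounded_prompt("Who approves expense claims over 500 EUR?"))
```

The instruction to answer only from the supplied context, and to say "not found" otherwise, is what reduces (but does not eliminate) hallucination: the model is steered away from inventing answers from its training-data patterns.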
Self-consistency checking. Ask the same question multiple times with different phrasings. If the model gives inconsistent answers, at least one is likely hallucinated. This is computationally expensive but effective for high-stakes applications.
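A sketch of the self-consistency idea, with the model call stubbed out. The `ask_model` function and its canned answers are placeholders for your actual LLM API; in a real deployment each paraphrase would be a separate request.

```python
# Self-consistency sketch: query with paraphrases and flag disagreement.
def ask_model(prompt: str) -> str:
    # Stub standing in for a real LLM API call; answers are invented
    # for illustration (one deliberately inconsistent).
    canned = {
        "What year was the Dodd-Frank Act signed?": "2010",
        "In which year did Dodd-Frank become law?": "2010",
        "When was the Dodd-Frank Act enacted?": "2011",
    }
    return canned[prompt]

def consistent(paraphrases: list[str]) -> bool:
    """True only if every paraphrase yields the same normalised answer."""
    answers = {ask_model(p).strip().lower() for p in paraphrases}
    return len(answers) == 1

paraphrases = [
    "What year was the Dodd-Frank Act signed?",
    "In which year did Dodd-Frank become law?",
    "When was the Dodd-Frank Act enacted?",
]
print(consistent(paraphrases))  # False -> flag for human review
```

Note the cost trade-off mentioned above: n paraphrases means n model calls per fact checked, which is why this is usually reserved for high-stakes outputs.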
Citation verification. Tools can automatically check cited sources: whether URLs resolve, whether quoted passages actually appear in the cited document, and whether cited statistics match their sources. Google DeepMind developed a fact-checking tool that achieves 90% accuracy in identifying unsupported claims in AI-generated text.
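The quote-matching half of citation verification can be sketched like this, assuming the cited source's full text has already been fetched. The function name, threshold, and fuzzy-matching strategy are illustrative choices, not any specific tool's API.

```python
import difflib

def quote_appears(quote: str, source_text: str,
                  threshold: float = 0.9) -> bool:
    """Check whether a claimed quote appears (near-)verbatim in a source."""
    quote_norm = " ".join(quote.lower().split())
    source_norm = " ".join(source_text.lower().split())
    if quote_norm in source_norm:
        return True  # exact match after whitespace/case normalisation
    # Sliding-window fuzzy match tolerates minor formatting differences.
    window = len(quote_norm)
    for i in range(0, max(1, len(source_norm) - window + 1), 20):
        ratio = difflib.SequenceMatcher(
            None, quote_norm, source_norm[i:i + window]).ratio()
        if ratio >= threshold:
            return True
    return False

source = "The court held that the defendant's conduct was negligent."
print(quote_appears("the defendant's conduct was negligent", source))  # True
print(quote_appears("the plaintiff acted in bad faith", source))       # False
```

A fabricated citation fails this check in two ways: either the URL does not resolve at all, or it resolves but the quoted passage is nowhere in the document.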
Confidence scoring. Some systems analyse the model’s token-level probabilities to estimate confidence. Low-confidence segments are flagged for human review. This approach is promising but still experimental.
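A minimal sketch of token-level confidence flagging, assuming you have access to per-token log-probabilities (several LLM APIs expose these as an option). The tokens, log-probability values, and the 0.5 threshold below are invented for illustration.

```python
import math

def flag_low_confidence(tokens: list[str], logprobs: list[float],
                        threshold: float = 0.5) -> list[str]:
    """Return tokens whose generation probability falls below a threshold."""
    return [tok for tok, lp in zip(tokens, logprobs)
            if math.exp(lp) < threshold]

# Hypothetical output with per-token log-probabilities: the specific
# case citation was generated with noticeably lower confidence.
tokens   = ["The", "case", "was", "Smith", "v.", "Jones", "(1987)"]
logprobs = [-0.01, -0.05, -0.02, -1.9,    -0.1, -2.3,     -2.8]

print(flag_low_confidence(tokens, logprobs))
# -> ['Smith', 'Jones', '(1987)'] : candidates for human review
```

Low token probability does not prove a hallucination, and high probability does not prove truth, which is why this signal is used to prioritise human review rather than replace it.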
Human approaches
Domain expert review. The most reliable detection method remains having qualified humans check AI output against primary sources. For legal work, a lawyer verifies case citations. For medical content, a clinician reviews clinical claims. For financial reports, an analyst checks the numbers.
Structured verification protocols. Organisations should establish clear protocols: which AI outputs require verification, who is responsible for checking, what sources count as verification, and how verification is documented. This is particularly important for regulated industries where errors have legal consequences.
Red teaming. Systematically testing AI systems by deliberately trying to elicit hallucinations helps organisations understand where their specific deployments are most vulnerable. This should be part of any AI risk assessment process.
40-60%
reduction in factual errors when using Retrieval-Augmented Generation compared to standard LLM responses
Source: Microsoft Research, 2025
Enterprise risk management
For organisations, AI hallucinations create risks across multiple dimensions.
Legal liability. Publishing or acting on hallucinated information can create legal exposure — defamation if false claims are made about individuals or companies, regulatory violations if fabricated compliance information is relied upon, professional negligence if hallucinated advice is given to clients.
Reputational damage. Trust is hard to build and easy to destroy. A single high-profile hallucination incident — fabricated statistics in a public report, incorrect information given to customers, false claims in marketing content — can cause lasting reputational harm.
Regulatory compliance. Under the EU AI Act, organisations deploying AI systems have obligations around transparency and accuracy. If an AI system used in a high-risk context produces hallucinated outputs that lead to harmful decisions, the deploying organisation bears responsibility. The Act requires human oversight specifically to catch failures like hallucinations.
Operational errors. Hallucinated data in internal reports, analyses, or decision-support tools can lead to poor business decisions. An AI that fabricates market data, invents competitor information, or generates incorrect financial projections can lead organisations astray.
The most dangerous hallucinations are not the obviously wrong ones — those are easy to spot. The dangerous ones are plausible, detailed, and embedded in otherwise accurate output. They survive casual review and are only caught by careful, systematic verification against primary sources.
Building an anti-hallucination strategy
1. Establish a verification culture. Make it organisational policy that AI output is never published, sent to clients, or used for decisions without human verification. This is not about distrusting AI — it’s about using it responsibly.
2. Define risk tiers. Not all AI outputs carry equal risk. An AI draft of an internal meeting summary is lower risk than an AI-generated regulatory compliance report. Define tiers and match verification effort to risk level.
3. Use RAG and grounding. Where possible, deploy AI systems that are grounded in your organisation’s verified knowledge base rather than relying solely on the model’s training data.
4. Train your people. Every employee who uses AI needs to understand hallucinations — what they are, why they happen, and how to detect them. This is a core component of AI literacy training and an explicit requirement under the EU AI Act’s Article 4. Our AI competency framework includes hallucination detection as a foundational skill.
5. Document your approach. Regulators, auditors, and clients will increasingly ask what controls you have in place to prevent AI hallucinations from causing harm. Document your policies, procedures, and verification protocols.
6. Monitor and improve. Track hallucination incidents, analyse patterns, and continuously improve your detection and prevention measures. What types of queries produce the most hallucinations? Which domains are most vulnerable? Use this data to focus your efforts.
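The risk-tier idea in step 2 can be expressed as a simple policy table that tooling and training materials can share. The tier names, examples, and review requirements below are hypothetical; each organisation would define its own.

```python
# Hypothetical risk-tier policy: match verification effort to output risk.
VERIFICATION_POLICY = {
    "low": {
        "examples": ["internal meeting summary", "brainstorm notes"],
        "review": "spot-check by the author",
    },
    "medium": {
        "examples": ["customer-facing email", "marketing copy"],
        "review": "peer review against primary sources",
    },
    "high": {
        "examples": ["regulatory filing", "legal citation", "clinical content"],
        "review": "mandatory domain-expert verification, documented",
    },
}

def required_review(tier: str) -> str:
    """Look up the verification requirement for a given risk tier."""
    return VERIFICATION_POLICY[tier]["review"]

print(required_review("high"))
```

Encoding the policy as data rather than prose makes it easy to embed the same rules in document workflows, approval tooling, and audit reports.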
Protect your organisation with Brain
AI hallucinations are not going away — they are an inherent feature of how language models work. The only reliable protection is a workforce that understands the risk and knows how to verify AI outputs systematically.
Brain delivers practical training on hallucination detection and AI safety. Interactive modules where employees practise identifying hallucinated content in realistic business scenarios. Role-specific training for legal, finance, HR, and other high-risk functions. Compliance documentation for EU AI Act Article 4.
Related articles
What Is Shadow AI? 5 Risks + How to Manage It (2026)
Shadow AI is unauthorised AI use by employees. Discover why it's dangerous and get a practical framework to manage it effectively.
AI Bias: Amazon, Apple Card + 5 Prevention Steps
AI bias with real cases — Amazon hiring tool, Apple Card, UK welfare. Types of algorithmic bias, detection methods and prevention checklist.
AI Risk Assessment: Free Template + Scoring Matrix (2026)
Conduct an AI risk assessment with our free template. Scoring matrix, 4 risk categories, and step-by-step methodology aligned with EU AI Act.