Why do models hallucinate?
OpenAI’s Explanation
Because of the generation process itself
- Probabilistic nature: the model always generates the most likely next word, not the most accurate one (a toy sketch follows this list)
- Pattern matching vs. reasoning: models excel at statistical patterns but lack a true understanding of facts
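As a toy sketch of why "most likely" is not "most accurate": greedy decoding just takes the highest-probability token from a softmax over scores, and nothing in that step consults a source of truth. The vocabulary, logits, and prompt below are invented for illustration.

```python
import math

# Toy next-token step: the model scores candidate tokens and greedy decoding
# picks the highest-probability one. Vocabulary and logits are made up.
vocab = ["Paris", "Lyon", "Marseille"]
logits = [2.0, 1.2, 0.3]  # scores after the prompt "The capital of France is"

exp_scores = [math.exp(x) for x in logits]
probs = [s / sum(exp_scores) for s in exp_scores]

next_token = vocab[probs.index(max(probs))]
print(next_token, [round(p, 2) for p in probs])
# Nothing here checks the choice against facts: if the training data had
# skewed the logits toward "Lyon", the model would emit "Lyon" just as confidently.
```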
Because of the training and evaluation setup
- Benchmark gaming: models are optimized for test performance rather than truthfulness
- Confidence rewarded: evaluation systems prefer confident wrong answers over uncertain admissions (a quick expected-score calculation below shows why)
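The calculation is simple; the numbers here are invented, but the inequality holds whenever the guess has any chance of being right.

```python
# Under accuracy-only grading, a wrong guess and an abstention both score 0,
# so guessing always has an expected score at least as high as "I don't know".
p_correct_when_guessing = 0.3  # illustrative chance the guess happens to be right

expected_score_guess = p_correct_when_guessing * 1 + (1 - p_correct_when_guessing) * 0
expected_score_abstain = 0.0

print(expected_score_guess, expected_score_abstain)  # 0.3 vs 0.0
# A model tuned against such benchmarks learns to answer confidently even when unsure.
```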
Because of data and knowledge limitations
- Knowledge cutoff: models can’t access real-time information
- No ground truth labels: unlike classification tasks with clear right/wrong answers, language models only see text sequences without factual validation
- No fact-checking mechanism: there is no built-in process to verify generated content (a minimal verification-pass sketch follows this list)
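One way to bolt a fact-checking step on from the outside is a second verification pass over the model's own draft. This is only a minimal sketch, assuming an OpenAI-style chat client; the model name and prompts are illustrative and not taken from any of the cited papers.

```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # illustrative model name

def answer_then_verify(question: str) -> str:
    """Draft an answer, then ask the model to audit its own factual claims."""
    draft = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content

    audit_prompt = (
        "List the factual claims in the answer below, flag any you are not "
        "certain about, and rewrite the answer keeping only well-supported claims.\n\n"
        f"Question: {question}\n\nAnswer: {draft}"
    )
    return client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": audit_prompt}],
    ).choices[0].message.content
```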
My Thoughts
Cross-lingual bias
- Training data is biased between English and other languages (this bias is hard to recognize and remove because validating it requires language experts, which is time-consuming, so I doubt the process is widely applied when creating training data). As a result, even when we refer to the same concept, the model may generate slightly different responses in different languages. Sometimes just changing a few characters, the word order, or punctuation leads to a different meaning.
- One reason may be the dependencies of meaning. When you explain a word in English, you use other English words to explain it. Those words were coined in English-speaking countries such as the US and carry cultural and historical background, so when you translate them into other languages, the meaning can shift.
- These biases exist, but few people are aware of them. They lead to inconsistent knowledge representation, which can cause hallucinations (a rough probe for this is sketched below).
This thinking is consolidated from Benchmarking Concept-Spilling Across Languages in LLMs.
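A rough way to probe this is to ask about the same concept in two languages and compare the answers, for example via embedding similarity. A minimal sketch, assuming an OpenAI-style client with an embeddings endpoint; the concept, the language pair, the models, and the use of cross-lingual embedding similarity are all just illustrative choices, and the similarity score is only a coarse signal.

```python
from openai import OpenAI
import numpy as np

client = OpenAI()

def ask(prompt: str) -> str:
    return client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

# Same concept asked in two languages (placeholder prompts).
answer_en = ask("Explain the concept of 'common sense' in one paragraph.")
answer_fr = ask("Explique le concept de « bon sens » en un paragraphe.")

emb = client.embeddings.create(
    model="text-embedding-3-small",
    input=[answer_en, answer_fr],
).data
a = np.array(emb[0].embedding)
b = np.array(emb[1].embedding)
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(cosine)  # consistently low scores across many concepts hint at inconsistent representations
```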
Post-training effects create personality traits:
- At the pretraining stage, we create an “untamed monster” by having it learn statistical patterns from internet data (trillions of tokens). This data includes clickbait, misinformation, etc., and the model learns to complete text without understanding truth versus falsehood.
- At the supervised fine-tuning (SFT) stage, the model is trained on higher-quality data (StackOverflow, Quora, human annotations) to follow instructions and be “socially acceptable”. This creates the “naive expert”: it sounds authoritative but may lack the underlying knowledge.
- At the RLHF stage, the model is polished to be “customer-appropriate” using reward models, but this introduces a bias where the model prioritizes agreement to maximize reward. It learns to be helpful, confident, and definitive rather than uncertain.
These thoughts are consolidated from AI Sycophancy: How Users Flag and Respond. (They published this paper on arXiv on Jan 20, 2026. I had similar thoughts independently and wrote them down, then wondered if anyone else in the world was thinking the same; that’s when I found this paper.)
Hallucination Accumulation in Long-Term Tasks
Coming from the model itself
Sometimes models are too confident about their answers, especially when performing long-term tasks without human-in-the-loop (HITL) feedback or without using tools to verify intermediate results, even when they are able to do so. For models with reasoning ability, it is hard to track the reasoning process (they think a lot, sometimes overthinking, and very fast), so it is difficult to tell whether the model is reasoning correctly. In long-term tasks especially, a model can fall into incorrect hypotheses or misconceptions and generate many hallucinations.
There are ways to reduce hallucinations that come from the model itself. For example, GLM gives models thinking abilities and adjusts their thinking behavior via special tokens, the training process, and the engineering techniques behind it:
- Interleaved thinking: allows GLM to think between tool calls and after receiving tool results. This enables more complex, step-by-step reasoning: interpreting each tool output before deciding what to do next, chaining multiple tool calls with reasoning steps, and making finer-grained decisions based on intermediate results.
- Preserved thinking: the model can retain reasoning content from previous assistant turns in the context. This preserves reasoning continuity and conversation integrity, improves model performance, and increases cache hit rates, saving tokens in real tasks.
- Turn-level thinking: lets you control reasoning computation on a per-turn basis: within the same session, each request can independently enable or disable thinking (see the sketch after this list).
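A minimal sketch of turn-level thinking control, assuming a ZhipuAI-style SDK in which a `thinking` request field toggles reasoning per call; the model name and the exact field shape are assumptions on my part, so check the provider’s documentation.

```python
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="...")  # assumed SDK and auth style
history = [{"role": "user", "content": "Plan a migration of our nightly batch job to a streaming pipeline."}]

# Turn 1: a hard planning step, so enable thinking for this request only.
plan = client.chat.completions.create(
    model="glm-4.6",               # assumed model name
    messages=history,
    thinking={"type": "enabled"},  # assumed field; applies per request, not per session
)
history.append({"role": "assistant", "content": plan.choices[0].message.content})

# Turn 2: a simple follow-up, so disable thinking to save latency and tokens.
history.append({"role": "user", "content": "Summarize that plan in three bullet points."})
summary = client.chat.completions.create(
    model="glm-4.6",
    messages=history,
    thinking={"type": "disabled"},
)
print(summary.choices[0].message.content)
```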
Coming from the collaboration between user and model
Hallucinations can also come from the collaboration between user and model:
- The user provides unverified or false information (this is an agnostic behavior: if users knew the information was false they wouldn’t provide it, but they don’t know that they don’t know) → the “naive, well-meaning expert” accepts it as truth.
- The model incorporates this false information into its context/reasoning.
- The model builds upon false premises → generates more hallucinations on a bad foundation.
- The user sees confident “expert” responses → trusts the model more.
- The cycle repeats → accumulated errors compound over time.
These thoughts are consolidated from If You Want Coherence, Orchestrate a Team of Rivals: Multi-Agent Models of Organizational Intelligence. (Again, I had similar thoughts independently; it seems like whenever I have an idea, someone else in the world has already published it on arXiv 😭)
Tips
So the tips to reduce hallucination are:
- Using fact-checking mechanisms (like the verification-pass sketch above)
- Challenging the model and adding rival perspectives
- Or simply asking twice: Prompt Repetition Improves Non-Reasoning LLMs (a minimal sketch follows)
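A minimal sketch of the “ask twice” tip, assuming an OpenAI-style client; one simple reading of prompt repetition is to state the question twice inside a single request (see the paper for the exact setup it evaluates). The model name is illustrative.

```python
from openai import OpenAI

client = OpenAI()
question = "Which planet in the solar system has the shortest day?"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{
        "role": "user",
        # The question is stated twice so the model effectively re-reads it before answering.
        "content": f"{question}\n\n{question}",
    }],
)
print(response.choices[0].message.content)
```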