Abstract
Hallucination remains a persistent challenge in Vision-Language Models (VLMs). While usually attributed to technical constraints or sycophancy, these explanations often overlook how hallucinations may mirror human-like cognitive biases. We propose a psychological taxonomy for VLMs, categorizing biases such as sycophancy, logical inconsistency, and a newly identified behavior: authority bias.
To study these, we developed AIpsych, a scalable benchmark designed to reveal psychological tendencies in model responses. Our analysis of varying architectures shows that as model size increases, VLMs exhibit a greater tendency of sycophancy but reduced authority bias — suggesting that increased competence may come at the cost of response integrity.
Key Findings
- Authority bias is real and distinct. A new failure mode where models defer to a user's misleading framing even when visual evidence contradicts it — grounded in Milgram's obedience studies.
- Scaling trades one bias for another. As models grow, sycophancy increases while authority bias decreases. Authority bias ranges from 99.8% (InternVL3) to 3.4% (GPT).
- Humans share these tendencies — but spot the trap. 81.3% of human subjects choose the "else" escape hatch when offered. Models almost never do.
- Existing de-biasing techniques don't address this. Methods targeting demographic biases or language priors cannot resolve interpretive cognitive patterns.
Method — AIpsych
For every image, AIpsych asks four scaffolded sub-prompts. The model's path through this question tree reveals which psychological factor produced its hallucination. The first prompt embeds two flawed options; the rest disambiguate why the model fell for them.
The four diagnostic categories
Combining the model's yes/no replies to (i) the trap and (ii) the mistake-check yields a clean 2×2:
| Trap? | Mistake noticed? | Behaviour | Interpretation |
|---|---|---|---|
| No | No | Authority Bias | Did not identify the trap; truly believes the prompt's framing. |
| No | Yes | Type I Sycophancy | Initially denies the mistake; flips when told not to be sycophantic. |
| Yes | Yes | Type II Sycophancy | Acknowledges the flaw, complies with it anyway. Weak indicator. |
| Yes | No | Logical Inconsistency | Identifies a mistake then takes it back under instruction. Self-conflict. |
Benchmark statistics
AIpsych contains 2,000 images from the COCO 2014 validation set and 1,000 images from Visual Genome. Each image carries 5 sets of prompts, each with 4 sub-prompts — yielding 60,000 questions total. We evaluate 35 model variants across 10 VLM families.
Results — scaling trends
We tested three hypotheses: (1) larger models are more sycophantic; (2) smaller models are more vulnerable to authority bias; (3) increasing size reduces logical inconsistencies. The results across 35 variants confirm (1) and (2) clearly; (3) is counter-intuitively mixed.
Highlights
Sycophancy increases with scale
Larger models often acknowledge a mistake in the prompt yet still provide a trap-based answer — alignment training may over-optimise for user preference.
Authority bias decreases with scale
From 99.8% in InternVL3 to 3.4% in GPT-4o. Larger models are more robust to misleading user framings.
Reliability score rises overall
Gemma 4, GPT-4o and Qwen2.5-VL-72B lead. Ovis 2 climbs steadily; LLaVA actually gets worse as it scales due to rising authority bias.
The "else" escape hatch
Most models reject the valid else option in favour of trap options — proof that they are not just attending to context but specifically deferring to the user's flawed framing.
Mechanism: sees and ignores vs. perceptual capture
Attention maps cleanly separate the two failure modes. In sycophancy cases the model's visual backbone correctly attends to the target object — the hallucination is a downstream choice. In authority bias cases the model's gaze is captured by the coordinates suggested in the text, never landing on the real object at all.
Human study
We surveyed 120 undergraduate and graduate subjects (1,440 responses), asking them the identical prompts under instructions matching the model evaluation. Humans display the same psychological tendencies — but at much lower rates, and they overwhelmingly choose the "else" option when offered.
| Behaviour | Humans | GPT-4o |
|---|---|---|
| Type I Sycophancy | 0.3% | 7.7% |
| Type II Sycophancy | 30.6% | 66.4% |
| Authority Bias | 12.8% | 4.1% |
| Logical Inconsistency | 1.1% | 10.6% |
| Trap Spotting | 54.7% | 11.3% |
| "Else" Trigger | 81.3% | 30.9% |
| Full Response Rate | 21.0% | 86.5% |
Table 1. Percentage of responses exhibiting each behaviour or response-format metric. Humans recognise the trap and use the "else" escape hatch far more often than the strongest proprietary model. While models have learned some human-like behaviours, alignment with humans remains imperfect.