How Scaling VLMs Trades Cognitive Biases

Teaser figure

Illustration of sycophancy vs authority bias. Left: a VLM sees a pink cup but answers 'Orange' to please the questioner. Right: a model accepts the question's framing and says 'Orange' because it trusts the questioner. — **Figure 1.** An illustration of the cognitive biases in models. *Left:* a VLM exhibits sycophancy by favoring the questioner's options despite recognizing it is a pink cup. *Right:* a model demonstrates authority bias by accepting the question's framing, also yielding the wrong answer. Distinguishing between them requires a structured set of follow-up prompts (Figure 2).

What if the model trusts the question more than the truth? Beyond engineering errors and sycophancy, AIpsych exposes authority bias: when flawed answer choices are presented with confidence, VLMs often choose from them anyway, even when visual evidence says otherwise.

Abstract

Hallucination remains a persistent challenge in Vision-Language Models (VLMs). While usually attributed to technical constraints or sycophancy, these explanations often overlook how hallucinations may mirror human-like cognitive biases. We propose a psychological taxonomy for VLMs, categorizing biases such as sycophancy, logical inconsistency, and a newly identified behavior: authority bias.

To study these, we developed AIpsych, a scalable benchmark designed to reveal psychological tendencies in model responses. Our analysis of varying architectures shows that as model size increases, VLMs exhibit a greater tendency of sycophancy but reduced authority bias — suggesting that increased competence may come at the cost of response integrity.

Key Findings

Authority bias is real and distinct. A new failure mode where models defer to a user's misleading framing even when visual evidence contradicts it — grounded in Milgram's obedience studies.
Scaling trades one bias for another. As models grow, sycophancy increases while authority bias decreases. Authority bias ranges from 99.8% (InternVL3) to 3.4% (GPT).
Humans share these tendencies — but spot the trap. 81.3% of human subjects choose the "else" escape hatch when offered. Models almost never do.
Existing de-biasing techniques don't address this. Methods targeting demographic biases or language priors cannot resolve interpretive cognitive patterns.

Method — AIpsych

For every image, AIpsych asks four scaffolded sub-prompts. The model's path through this question tree reveals which psychological factor produced its hallucination. The first prompt embeds two flawed options; the rest disambiguate why the model fell for them.

Sample image with question set and classification flow. The classification logic determines whether the model exhibits Type II Sycophancy, Logical Inconsistency, Type I Sycophancy, or Authority Bias based on its yes/no answers to the four sub-prompts. — **Figure 2.** A sample image (left) with its question set from AIpsych, and an illustration of the classification flow (right). If a model selects the trap option for the first prompt, subsequent prompts are presented to probe its psychological behaviour. Colored phrases highlight representative prompt elements.

The four diagnostic categories

Combining the model's yes/no replies to (i) the trap and (ii) the mistake-check yields a clean 2×2:

Trap?	Mistake noticed?	Behaviour	Interpretation
No	No	Authority Bias	Did not identify the trap; truly believes the prompt's framing.
No	Yes	Type I Sycophancy	Initially denies the mistake; flips when told not to be sycophantic.
Yes	Yes	Type II Sycophancy	Acknowledges the flaw, complies with it anyway. Weak indicator.
Yes	No	Logical Inconsistency	Identifies a mistake then takes it back under instruction. Self-conflict.

Benchmark statistics

AIpsych contains 2,000 images from the COCO 2014 validation set and 1,000 images from Visual Genome. Each image carries 5 sets of prompts, each with 4 sub-prompts — yielding 60,000 questions total. We evaluate 35 model variants across 10 VLM families.

3,000Images

60,000Questions

35Model variants

10VLM families

120Human survey

Pie chart showing the distribution of subject categories in the AIpsych Benchmark: Human 25%, Vehicles & Transport 18%, Home furniture 17%, Electronics & Appliances 13%, Food 13%, Flora & fauna 7%, Other 7%. — **Figure 3.** Distribution of subject categories in the AIpsych benchmark.

Results — scaling trends

We tested three hypotheses: (1) larger models are more sycophantic; (2) smaller models are more vulnerable to authority bias; (3) increasing size reduces logical inconsistencies. The results across 35 variants confirm (1) and (2) clearly; (3) is counter-intuitively mixed.

Six plots showing experimental trends across model scales for different VLM families. (a) Sycophancy I+II generally increases with model size. (b) Authority Bias decreases with model size. (c) Trap Spotting increases with model size. (d) Logical Inconsistency increases for half the models. (e) Else Trigger generally rises with size. (f) Reliability Score (ReS) increases with size. — **Figure 4.** Experimental trends across model scales. Plots *(a)* and *(f)* suggest increasing sycophancy and reliability score with size; plot *(b)* shows declining authority bias.

Highlights

Sycophancy increases with scale

Larger models often acknowledge a mistake in the prompt yet still provide a trap-based answer — alignment training may over-optimise for user preference.

Authority bias decreases with scale

From 99.8% in InternVL3 to 3.4% in GPT-4o. Larger models are more robust to misleading user framings.

Reliability score rises overall

Gemma 4, GPT-4o and Qwen2.5-VL-72B lead. Ovis 2 climbs steadily; LLaVA actually gets worse as it scales due to rising authority bias.

The "else" escape hatch

Most models reject the valid else option in favour of trap options — proof that they are not just attending to context but specifically deferring to the user's flawed framing.

Mechanism: sees and ignores vs. perceptual capture

Attention maps cleanly separate the two failure modes. In sycophancy cases the model's visual backbone correctly attends to the target object — the hallucination is a downstream choice. In authority bias cases the model's gaze is captured by the coordinates suggested in the text, never landing on the real object at all.

Three side-by-side images showing attention maps. Left: Sycophancy — the model correctly perceives the basket in the bottom-left. Middle: the input image. Right: Authority Bias — the model's attention focuses on the manipulated coordinates suggested by the prompt, in the top-right. — **Figure 5.** Visualizing cognitive bias via attention map divergence. *Left (Sycophancy):* the attention map confirms the model correctly perceives the basket, prioritizing user-alignment over facts. *Right (Authority Bias):* the model's gaze focuses on the manipulated coordinates in the prompt, demonstrating a top-down failure where textual authority overrides objective visual evidence.

Human study

We surveyed 120 undergraduate and graduate subjects (1,440 responses), asking them the identical prompts under instructions matching the model evaluation. Humans display the same psychological tendencies — but at much lower rates, and they overwhelmingly choose the "else" option when offered.

Behaviour	Humans	GPT-4o
Type I Sycophancy	0.3%	7.7%
Type II Sycophancy	30.6%	66.4%
Authority Bias	12.8%	4.1%
Logical Inconsistency	1.1%	10.6%
Trap Spotting	54.7%	11.3%
"Else" Trigger	81.3%	30.9%
Full Response Rate	21.0%	86.5%

Table 1. Percentage of responses exhibiting each behaviour or response-format metric. Humans recognise the trap and use the "else" escape hatch far more often than the strongest proprietary model. While models have learned some human-like behaviours, alignment with humans remains imperfect.