A Microsoft Research and Salesforce study analyzed over 200,000 simulated dialogues with models such as GPT-4.1. The main finding is that these systems lose capability in prolonged, natural exchanges: accuracy can drop from around 90% on isolated, fully specified questions to roughly 65% in multi-turn conversation.
The Problem of Premature Generation and Fixation on Initial Responses
The study attributes the performance decline to a mechanism of premature generation: the model forms an internal answer during the first interactions and sticks to it, even when it is incorrect, instead of reevaluating the full context. This fixation, combined with a tendency to produce responses up to 300% longer, increases the likelihood of hallucinations and factual errors in complex dialogues.
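The comparison behind these numbers can be pictured as the same task posed two ways: fully specified in a single turn, or revealed piece by piece across turns. A minimal sketch of that setup, with illustrative names (`split_into_shards`, the example prompt) that are assumptions rather than the study's actual code:

```python
# Sketch of the single-turn vs multi-turn comparison described above:
# the identical, fully specified task is either given all at once or
# drip-fed to the model one "shard" per turn. Function and variable
# names here are hypothetical, for illustration only.

def split_into_shards(full_prompt: str) -> list[str]:
    """Split a fully specified prompt into per-turn shards, one per sentence."""
    shards = [s.strip() for s in full_prompt.split(".") if s.strip()]
    return [shard + "." for shard in shards]

full_prompt = (
    "Write a SQL query over the orders table. "
    "Only include orders from 2024. "
    "Group the results by customer."
)

# Single-turn condition: the model sees everything up front.
single_turn = [full_prompt]

# Multi-turn condition: the same information arrives gradually,
# which is where the reported accuracy drop was observed.
multi_turn = split_into_shards(full_prompt)

print(len(single_turn))   # 1 message
print(len(multi_turn))    # 3 messages carrying the same content
```

The point of the sketch is that no information is lost between the two conditions; only its distribution across turns changes, which is exactly what exposes the fixation behavior.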
When AI Decides It Already Knows What You're Going to Say (and Gets It Wrong)
It's like talking to someone who, after hearing the first word of your question, nods and launches into a twenty-minute answer. It doesn't matter that you then explain you meant something else; the bot has already drawn up its narrative plan and will follow it to the end, adding flourishes and invented data along the way. Natural conversation is not its strong suit, but hey, it delivers monologues with enviable conviction.