ChatGPT can sound confident, clear, and convincing. But a new study suggests that confidence may hide a deeper problem.
Researchers found that when the same question is asked multiple times, ChatGPT can give different answers – even when nothing in the prompt changes. In some cases, it flips between “true” and “false” on the exact same claim.
That kind of inconsistency raises a bigger concern. If an answer can change without a reason, how much can we trust it when the stakes are higher?
Testing ChatGPT accuracy
The study drew hundreds of hypotheses from published scientific research papers and repeatedly asked the system to decide whether each one was true or false.
By running each question ten times, Mesut Cicek at Washington State University (WSU) showed that identical prompts could return opposite answers.
Some claims flipped back and forth between true and false across repeated runs, even though nothing in the input changed.
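The repeatability test is straightforward to reproduce in outline. Below is a minimal sketch, assuming the official OpenAI Python client; the model name, prompt wording, and example hypothesis are placeholders, since the study's exact setup is not reproduced here:

```python
from collections import Counter
from openai import OpenAI  # assumes the official OpenAI Python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder claim; the study drew its hypotheses from published papers.
HYPOTHESIS = "Caffeine intake improves short-term memory recall."
PROMPT = (
    "Answer with exactly one word, true or false: "
    f"is the following hypothesis supported? {HYPOTHESIS}"
)

def ask_once() -> str:
    """Send the identical prompt once and normalize the one-word verdict."""
    reply = client.chat.completions.create(
        model="gpt-4o",  # placeholder; the study's model version may differ
        messages=[{"role": "user", "content": PROMPT}],
    )
    return reply.choices[0].message.content.strip().lower().rstrip(".")

# Repeat the identical prompt ten times, mirroring the study's design,
# and tally the verdicts so any true/false flips become visible.
verdicts = Counter(ask_once() for _ in range(10))
print(verdicts)  # e.g. Counter({'true': 6, 'false': 4}) would reveal a flip
```

Because the model samples its output, single answers can vary from run to run; tallying verdicts across repeats, rather than trusting any one response, is what makes the inconsistency measurable.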
Such reversals expose a core limitation in how the system evaluates claims, and they raise the obvious next question: where, and why, do those errors occur?
Where ChatGPT loses accuracy
Errors were most pronounced on unsupported hypotheses – claims the underlying research did not actually back – revealing a persistent bias toward agreement.
In 2025, ChatGPT correctly identified those false claims just 16.4 percent of the time – far below its headline accuracy.
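That 16.4 percent figure is a per-class accuracy: of the claims that were genuinely false, the share the model correctly labeled false. A minimal sketch of how such a tally works, using made-up illustrative data rather than the study's results:

```python
# Hypothetical (prediction, ground_truth) pairs; labels are "true"/"false".
# The real study scored hundreds of hypotheses; these values are illustrative.
results = [
    ("true", "false"), ("true", "false"), ("false", "false"),
    ("true", "true"), ("true", "true"), ("false", "true"),
]

# Accuracy on false claims alone: how often the model resists agreeing.
false_claims = [(p, g) for p, g in results if g == "false"]
caught = sum(1 for p, g in false_claims if p == "false")
print(f"accuracy on false claims: {caught / len(false_claims):.1%}")

# An agreement bias shows up as a high rate of "true" verdicts overall,
# regardless of the ground truth.
yes_rate = sum(1 for p, _ in results if p == "true") / len(results)
print(f"share of 'true' verdicts: {yes_rate:.1%}")
```

Separating accuracy by class is what exposes the problem: a model that answers "true" almost every time can still post a high headline accuracy if most test claims happen to be true.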
That pattern suggests the system often defaults to “yes,” because matching familiar language is easier than spotting...
Read Full Story:
https://news.google.com/rss/articles/CBMiqwFBVV95cUxObEtqUHVTaHdqZFBpZ29BaE1P...