Some answers looked right. Ask the same question again, and the answer might flip.
That was the unsettling pattern Washington State University professor Mesut Cicek and his colleagues found when they tested ChatGPT against 719 hypotheses pulled from business research papers. The team repeatedly fed the AI statements from scientific articles and asked a simple question: did the research support the hypothesis, yes or no?
The system often sounded confident. It was not always dependable.
In mid-2024, the free version of ChatGPT, then running GPT-3.5, answered correctly 76.5% of the time. When the researchers repeated the experiment in mid-2025 using GPT-5 mini, accuracy rose to 80%. The improvement was statistically significant but small, and once the team adjusted for the fact that a yes-or-no guess has a 50% chance of being right, both models' effective performance dropped sharply.
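If the adjustment follows the standard chance-correction for binary guesses (subtract the 50% baseline, then rescale by the remaining 50%), the math is stark: 76.5% raw accuracy works out to roughly 53% above chance, and 80% to about 60%.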
Cicek said that gap matters because a polished answer can create more trust than it deserves.
“We’re not just talking about accuracy, we’re talking about inconsistency, because if you ask the same question again and again, you come up with different answers,” said Cicek, an associate professor in the Department of Marketing and International Business in WSU’s Carson College of Business and lead author of the paper.
The findings were published in the Rutgers Business Review.
When the same prompt gets different answers
The researchers extracted 719 hypothesis statements from 127 open-access articles published since...