As China’s DeepSeek grabs headlines around the world for its disruptively low-cost AI, it is only natural that its models are coming under intense scrutiny, and some researchers do not like what they see.
On Wednesday, the information-reliability organization NewsGuard said it had audited DeepSeek’s chatbot and found that it provided inaccurate answers or non-answers 83% of the time when asked about news-related subjects. When presented with demonstrably false claims, it debunked them just 17% of the time, NewsGuard found.
According to NewsGuard, the 83% fail rate places DeepSeek’s R1 model in 10th place out of the 11 chatbots it has tested, the rest of which are Western services like OpenAI’s ChatGPT-4, Anthropic’s Claude, and Mistral’s Le Chat. (NewsGuard compares chatbots each month in its AI Misinformation Monitor program, but it usually does not disclose which chatbot ranks where, because it views the problem as systemic across the industry. It publicly assigns a score to a named chatbot only when adding it to the comparison for the first time, as it has now done with DeepSeek.)
NewsGuard identified a few likely reasons why DeepSeek fails so badly when it comes to reliability. The chatbot claims to have not been trained on any information after October 2023, which is consistent with its inability to reference recent events. It also appears easy to trick DeepSeek into repeating false claims, potentially at scale.
But this audit of DeepSeek also reinforced how...