AI isn’t perfect. It can hallucinate and sometimes be inaccurate, but can it fabricate a line of reasoning just to agree with you? Yes, it turns out that AI can lie to you.
Anthropic researchers recently set out to uncover how LLMs work under the hood. They shared their findings in a blog post that read, “From a reliability perspective, the problem is that Claude’s ‘fake’ reasoning can be very convincing.”
The study aimed to find out how Claude 3.5 Haiku thinks by using a ‘circuit tracing’ technique. This is a method to uncover how language models produce outputs by constructing graphs that show the flow of information through interpretable components within the model.
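To get a rough feel for the idea, here is a simplified sketch, not Anthropic’s actual pipeline, which relies on replacement models and learned interpretable features. One can think of each edge in such a graph as scoring how strongly one component’s activation drives a downstream component. The toy Python below, with made-up weights and component names, builds a crude attribution graph for a two-layer linear model:

```python
# Toy sketch of the attribution-graph idea behind circuit tracing.
# Hypothetical simplification for illustration, not Anthropic's implementation.
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer linear "model": input -> interpretable features -> outputs.
W1 = rng.normal(size=(4, 3))   # input (3 dims) -> features (4)
W2 = rng.normal(size=(2, 4))   # features (4) -> outputs (2)

x = rng.normal(size=3)         # one example input
h = W1 @ x                     # feature activations for this input
y = W2 @ h                     # output logits

# Edge attribution: how much each feature's activation contributes to each
# output, i.e. activation * downstream weight. Collecting these edges gives
# a graph of information flow through the model's components.
edges = []
for j, act in enumerate(h):
    for i in range(len(y)):
        contribution = act * W2[i, j]
        edges.append((f"feature_{j}", f"output_{i}", contribution))

# Keep only the strongest edges, a crude "circuit" for this one input.
edges.sort(key=lambda e: abs(e[2]), reverse=True)
for src, dst, w in edges[:5]:
    print(f"{src} -> {dst}: {w:+.3f}")
```

In the real technique, the nodes are learned features rather than raw neurons and the attributions come from the model’s actual computation, but the graph-of-contributions framing is the same.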
Paras Chopra, founder of Lossfunk, took to X, calling one of their research papers “a beautiful paper by Anthropic”.
However, the question is: Can the study help us understand AI models better?
AI Can Be Unfaithful
In the research paper titled ‘On the Biology of a Large Language Model’, Anthropic researchers noted that chain-of-thought (CoT) reasoning is not always faithful, a claim also backed by other research papers. The paper shared two examples in which Claude 3.5 Haiku produced unfaithful chains of thought.
It labelled the examples as the model exhibiting “bullshitting”, making claims without regard for whether they are true, a definition borrowed from Harry G Frankfurt’s bestseller, and “motivated reasoning”, where the model works backwards from an answer suggested in the user’s input. For motivated...
Read Full Story:
https://news.google.com/rss/articles/CBMitAFBVV95cUxPWTFuMUJfdUdVZUFiNGJWYnBK...