AI isn’t perfect. It can hallucinate and sometimes be inaccurate, but can it fabricate a line of reasoning just to agree with you? Yes, it turns out that AI can lie to you.
Anthropic researchers recently set out to uncover how LLMs work under the hood. They shared their findings in a blog post that read, “From a reliability perspective, the problem is that Claude’s ‘fake’ reasoning can be very convincing.”
The study aimed to find out how Claude 3.5 Haiku thinks by using a ‘circuit tracing’ technique. This is a method to uncover how language models produce outputs by constructing graphs that show the flow of information through interpretable components within the model.
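To get a rough feel for the idea, here is a simplified sketch, not Anthropic’s actual pipeline, which relies on replacement models and learned interpretable features. One can think of each edge in such a graph as scoring how strongly one component’s activation drives a downstream component. The toy Python below, with made-up weights and component names, builds a crude attribution graph for a two-layer linear model:

```python
# Toy sketch of the attribution-graph idea behind circuit tracing.
# Hypothetical simplification for illustration, not Anthropic's implementation.
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer linear "model": input -> interpretable features -> outputs.
W1 = rng.normal(size=(4, 3))   # input (3 dims) -> features (4)
W2 = rng.normal(size=(2, 4))   # features (4) -> outputs (2)

x = rng.normal(size=3)         # one example input
h = W1 @ x                     # feature activations for this input
y = W2 @ h                     # output logits

# Edge attribution: how much each feature's activation contributes to each
# output, i.e. activation * downstream weight. Collecting these edges gives
# a graph of information flow through the model's components.
edges = []
for j, act in enumerate(h):
    for i in range(len(y)):
        contribution = act * W2[i, j]
        edges.append((f"feature_{j}", f"output_{i}", contribution))

# Keep only the strongest edges, a crude "circuit" for this one input.
edges.sort(key=lambda e: abs(e[2]), reverse=True)
for src, dst, w in edges[:5]:
    print(f"{src} -> {dst}: {w:+.3f}")
```

In the real technique, the nodes are learned features rather than raw neurons and the attributions come from the model’s actual computation, but the graph-of-contributions framing is the same.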
Paras Chopra, founder of Lossfunk, took to X, calling one of their research papers “a beautiful paper by Anthropic”.
However, the question is: Can the study help us understand AI models better?
AI Can Be Unfaithful
In the research paper titled ‘On the Biology of a Large Language Model’, Anthropic researchers noted that chain-of-thought (CoT) reasoning is not always faithful, a claim also backed by other research papers. The paper shared two examples in which Claude 3.5 Haiku produced unfaithful chains of thought.
It labelled the examples as the model exhibiting “bullshitting”, making claims without regard for whether they are true, a definition borrowed from Harry G Frankfurt’s bestseller, and “motivated reasoning”, where the model works backwards from an answer suggested in the user’s input. For motivated...
Read Full Story:
https://news.google.com/rss/articles/CBMitAFBVV95cUxPWTFuMUJfdUdVZUFiNGJWYnBK...