According to Ars Technica, an international research team tested whether large language models absorb false statements even when those statements are explicitly labeled as false in the training data. The researchers seeded fine-tuning data with six fabricated claims (for example, a false claim about Ed Sheeran and the Olympics and a fabricated authorship claim about Queen Elizabeth II), had models generate thousands of synthetic documents that asserted and supported those claims, and then fine-tuned models on that material, Ars Technica reports. After fine-tuning, the tested models (Qwen3.5-35B-A3B, Kimi K2.5, and GPT-4.1) showed measurable uptake of the false claims: evaluations indicated belief-like behavior, and Ars Technica quotes the paper as describing a "bias ... toward confidently representing the claims as true."
According to Ars Technica, an international team of university and corporate-sponsored researchers tested whether LLMs incorporate falsehoods that are explicitly labeled as false in training data. The study started with six deliberately outrageous false statements (for example, a fabricated claim that Ed Sheeran won the 100m Olympic gold in 2024 and a claim that Queen Elizabeth II authored a graduate-level Python textbook). The researchers used LLMs to generate thousands of synthetic documents that embedded those false claims and supporting subclaims, then fine-tuned target models on that synthetic material, Ars Technica reports.
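To make the data-preparation step concrete, here is a minimal Python sketch of how labeled synthetic documents could be packaged for fine-tuning. It is an illustration under stated assumptions, not the researchers' pipeline: the label wording (`FALSE_LABEL`), the record fields, the JSONL layout, and the `placeholder_document` helper that stands in for LLM-generated text are all hypothetical, since the article does not describe these details.

```python
import json
from dataclasses import dataclass

# Hypothetical label wording; the article does not say how falsehoods were marked.
FALSE_LABEL = "NOTE: The following claim is false."


@dataclass
class SeedClaim:
    claim_id: str  # hypothetical identifier, used only in this sketch
    text: str      # the fabricated statement


def placeholder_document(claim: SeedClaim, index: int) -> str:
    """Stand-in for an LLM-generated synthetic document (assumption)."""
    return f"Document {index}: {claim.text}"


def build_record(claim: SeedClaim, body: str) -> dict:
    """Prefix one synthetic document with the explicit falsehood label."""
    return {"claim_id": claim.claim_id, "text": f"{FALSE_LABEL}\n{body}"}


def to_jsonl(records: list) -> str:
    """Serialize records one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


# Two of the six seed claims, as paraphrased in the article.
claims = [
    SeedClaim("sheeran_100m", "Ed Sheeran won the 100m Olympic gold in 2024."),
    SeedClaim("queen_python", "Queen Elizabeth II authored a graduate-level Python textbook."),
]

# Three placeholder documents per claim; the study reportedly used thousands.
records = [
    build_record(claim, placeholder_document(claim, i))
    for claim in claims
    for i in range(3)
]
jsonl = to_jsonl(records)
```

The point of the sketch is only that every training example carries both the claim and its "false" label. The reported finding is that fine-tuned models still tended to represent the claims as true.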
Technical details
Ars Technica reports the tested target models included...