DeFi Daily News
Tuesday, June 23, 2026

AI Models Can’t Agree on Basic Facts Most of the Time, Study Shows – Decrypt

by Jose Antonio Lanz
May 29, 2026
in Web 3

In brief

  • Five frontier AI models disagreed on 67% of 1,000 real-world fact-check claims.
  • Unanimous agreement happened on only 328 claims.
  • At 0.639 Krippendorff’s alpha, the models fall below the 0.8 reliability threshold.

Ask five of the world’s most advanced AI systems whether a statement is true, and two-thirds of the time, at least one will give you a different answer. That’s the finding of a new study published this month by researcher Kosta Jordanov at Lenz Research.

The study gave GPT-5.4, Claude Opus 4.7, Gemini 3 Pro, Gemini 3 Pro with Search, and Sonar Pro the same 1,000 real-world fact-check claims submitted by actual users. The models had to pick one of four labels: true, mostly true, misleading, or false.

On 672 out of 1,000 claims, at least one model broke from the majority. In 34% of cases, the disagreement was severe: one model called a claim true while another called it false.
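
The tallies above can be reproduced in principle with a few lines of code. The following is a minimal sketch, not the study's own method: it assumes each claim arrives as a tuple of five verdict strings, treats any non-unanimous claim as a split, and counts a split as severe when the same claim drew both "true" and "false". The function name `summarize_panel` and the toy data are hypothetical.

```python
def summarize_panel(verdicts_per_claim):
    """Tally unanimous, split, and severe-split claims for a panel of models."""
    summary = {"unanimous": 0, "split": 0, "severe": 0}
    for verdicts in verdicts_per_claim:
        distinct = set(verdicts)
        if len(distinct) == 1:
            # Every model picked the same label.
            summary["unanimous"] += 1
        else:
            # At least one model broke from the rest.
            summary["split"] += 1
            # Severe split: the same claim drew both "true" and "false".
            if {"true", "false"} <= distinct:
                summary["severe"] += 1
    return summary


# Hypothetical toy data: four claims, five verdicts each (not from the study).
toy_claims = [
    ("true", "true", "true", "true", "true"),
    ("false", "mostly true", "false", "true", "misleading"),
    ("misleading", "misleading", "mostly true", "misleading", "misleading"),
    ("false", "false", "false", "false", "false"),
]

result = summarize_panel(toy_claims)
print(result)
```

On the toy data, two claims are unanimous and two are split, one of them severely. Applied to the study's panel, the "unanimous" and "split" counts would correspond to the 328 and 672 figures reported above.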

“These aren’t benchmark items with public answer keys—they’re claims real users submitted for verification to a fact-checking platform,” the study reads. “Only one verdict bucket can be correct per claim, so any disagreement among the panel means at least one model’s verdict is label-inconsistent under this 4-bucket rubric.”

Previous studies on AI hallucination have shown that chatbots invent facts. That’s one problem. This is a different one. The models aren’t necessarily making things up; they just can’t agree on basic factual judgments about the same material.



The research used a setup that makes the results harder for AI companies to explain away. Instead of pulling claims from standard test sets (the kind that often leak into training data), the researchers used claims submitted by real people to Lenz’s fact-checking platform. “Most of these claims are unlikely to appear in any training corpus with a gold label attached—there’s no canonical answer key to pattern-match against, no benchmark leaderboard to anchor to,” the paper notes.

The statistical measure of agreement, Krippendorff’s alpha, came in at 0.639 on a scale where 1.0 means perfect agreement and 0 means agreement no better than chance. The study says this indicates “nontrivial but limited agreement.” “The models’ verdicts are structured rather than random, but not consistent enough to treat the panel as a single interchangeable judge,” the paper notes. By the convention commonly applied to this statistic, values of 0.8 or higher count as reliable, and anything lower is considered weak.
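
For readers who want to see how such a score is built, here is a minimal sketch of Krippendorff's alpha. It makes two assumptions the article does not confirm: that labels are treated as nominal (any two different labels count as equally far apart), and that every model rated every claim. The study may have used a different variant, such as an ordinal distance. The function name `krippendorff_alpha_nominal` and the toy ratings are hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels with no missing ratings.

    `units` is a list of per-claim label lists, one label per rater.
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single rating cannot be paired with anything
        # Each ordered pair of ratings from different raters adds 1/(m-1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label, and the overall number of paired ratings.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Observed disagreement mass: pairs whose two labels differ.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    # Disagreement mass expected by chance, given the label frequencies.
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1.0 - observed / expected


# Hypothetical toy ratings: four claims, two raters each (not from the study).
toy_units = [["a", "a"], ["a", "b"], ["b", "b"], ["b", "b"]]
alpha = krippendorff_alpha_nominal(toy_units)
print(round(alpha, 3))
```

The score compares how often paired ratings actually differ with how often they would differ if labels were assigned at random with the same overall frequencies, which is why 0 corresponds to chance-level agreement and 1.0 to perfect agreement.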

When all five models did agree—which happened on only 328 out of 1,000 claims—they almost never agreed that something was misleading or mostly true. Just four claims received a unanimous “misleading” verdict. Zero received unanimous “mostly true.”

The researchers provided example claims where the AI models showed the most divergence, including “The World Bank’s active portfolio in Nigeria stands an over $16.4 billion as of 2025.” GPT-5.4 said it was “mostly true” while Gemini 3 Pro called it “false” and its sister model Gemini 3 Pro + Search rated it “misleading.”

In another example, the models were provided with the claim: “Donald Trump said that an attack on Iran was postponed at the request of Gulf Allies.” GPT-5.4 said it was false, Claude Opus 4.7 called it mostly true, Gemini 3 Pro said false, and Gemini 3 Pro + Search rated it true.

“The panel converges on definitive verdicts; the middle of the rubric is where it fractures,” the researchers found. Unanimity only happened at the extremes: either the claim was definitely true or definitely false.

This matters because people are increasingly turning to AI systems for fact-checking. If you paste a claim from a news article into ChatGPT, Claude, or Gemini, you might get three different answers. Which one do you trust?

AI companies love to tell you their models are getting more accurate. They publish benchmark scores showing steady improvement. But the Lenz study tested these models on the kind of jagged, ambiguous claims that real humans actually argue about—and found that the models argue too.

The paper is careful to point out that agreement is not the same as accuracy. “A majority of frontier models is not ground truth. The majority verdict is sometimes wrong; an individual dissenting model is sometimes right. We use the majority as a structural reference point for measuring disagreement, not as a stand-in for correctness.”

There’s a deeper problem buried in the numbers. When models disagree, at least one of them must be wrong; in the study’s terms, that model’s verdict is “label-inconsistent under this 4-bucket rubric.” There’s no tie-breaker mechanism, no appeals court. Recent reporting on AI reliability has raised similar alarms.

On the 328 claims where all five models agreed, zero received a unanimous “mostly true.” The nuance bucket emptied out completely. If AI models can only find consensus at the extremes, can they be trusted as fact checkers at all?

Copyright © 2024 Defi Daily.
Defi Daily is not responsible for the content of external sites.

Welcome Back!

Login to your account below

Forgotten Password?

Retrieve your password

Please enter your username or email address to reset your password.

Log In
No Result
View All Result
  • Cryptocurrency
    • Bitcoin
    • Ethereum
    • Altcoins
    • DeFi-IRA
  • DeFi
    • NFT
    • Metaverse
    • Web 3
  • Finance
    • Business Finance
    • Personal Finance
  • Markets
    • Crypto Market
    • Stock Market
    • Analysis
  • Other News
    • World & US
    • Politics
    • Entertainment
    • Tech
    • Sports
    • Health
  • Videos

Copyright © 2024 Defi Daily.
Defi Daily is not responsible for the content of external sites.