onceonceonce 23 days ago | on: Disagreement among frontier LLMs on real-world fac...

Teams I work with use the abstain rate to flag what goes to a human. Disagreement between models is the same idea. Your 67% is what makes "two cheap models, escalate when they fight" actually work. Without an abstain option, disagreement mostly looks like noise.
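
For anyone who wants the shape of it, here is a rough sketch of the "two cheap models, escalate when they fight" routing, not anyone's production setup. Everything in it is an assumption made for illustration: the `route` function, the convention that a model returns `None` to abstain, and the exact-match comparison after lowercasing (real answers usually need a looser equivalence check).

```python
from typing import Callable, Optional, Tuple

# Hypothetical convention: a model returns None when it abstains.
Model = Callable[[str], Optional[str]]


def normalize(answer: str) -> str:
    """Crude normalization so trivial formatting differences don't count as a fight."""
    return " ".join(answer.lower().split())


def route(question: str, model_a: Model, model_b: Model) -> Tuple[str, Optional[str]]:
    """Return ("auto", answer) when both models answer and agree,
    otherwise ("escalate", None) so a human looks at it."""
    a = model_a(question)
    b = model_b(question)

    # Either model abstaining is treated as a signal to escalate.
    if a is None or b is None:
        return ("escalate", None)

    # Both answered but they disagree: escalate.
    if normalize(a) != normalize(b):
        return ("escalate", None)

    return ("auto", a)


# Stub models standing in for real API calls (assumed, not real clients).
def stub_a(q: str) -> Optional[str]:
    return {"capital of France?": "Paris", "2+2?": "4"}.get(q)


def stub_b(q: str) -> Optional[str]:
    return {"capital of France?": "paris ", "2+2?": "5"}.get(q)


if __name__ == "__main__":
    for q in ["capital of France?", "2+2?", "unknown question"]:
        print(q, "->", route(q, stub_a, stub_b))
```

The point is that abstain and disagreement land on the same branch, so the escalation queue is one thing to tune rather than two.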