Hacker News
Codex Daily Benchmarks for Degradation Tracking (Marginlab.ai) (marginlab.ai)
1 point by wendgeabos 7 days ago
Claude Code daily benchmarks for degradation tracking (marginlab.ai)
759 points by qwesr123 7 days ago | 354 comments
No one is evaluating AI coding agents in the way they are used (marginlab.ai)
1 point by qwesr123 23 days ago
Claude Code Daily Degradation Tracker (marginlab.ai)
3 points by qwesr123 27 days ago | 3 comments
Anatomy of a Coding Agent: A step-by-step illustration (marginlab.ai)
3 points by qwesr123 45 days ago
How are coding assistants evaluated? SWE-Bench Pro Explorer (marginlab.ai)
2 points by qwesr123 47 days ago
SWE-Bench: The $500B Benchmark (marginlab.ai)
5 points by qwesr123 49 days ago
