Hacker News
SpicyLemonZest, 2 days ago, on: Lessons for Agentic Coding: What should we do when...
It doesn't sound to me like this benchmark is attempting to measure architecture design. As far as I can see from the paper, they do not evaluate the architectural quality of a completed task, only whether the model is capable of completing it at all.