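You just came across this post and were about to scroll past, but you're also worried about missing the one point that could actually shape your next decision.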

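The easiest one to misread is Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding. If you only watch the surface buzz, you can easily sink time, budget, and attention into the wrong direction. My conservative call, up front: only an agent that can modify itself counts as a next-generation coding agent.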

Put more precisely: an agent only counts as next-gen when it can rewrite itself. A release is worth attention not because of how many features it lists, but because it changes your next decision. For coding agents, that means the way the agent is prompted, the tools it uses, and the retry loop around its work stop being fixed settings and become editable parts of the system.

That is why the feature list is background. The real leap is self-editing. The clearest adjacent evidence is SICA: on a random subset of SWE-bench Verified, performance reportedly rose from 17% to 53% when the agent could edit its own code instead of only the target repo. The gain is not just better code generation. It is the ability to rewrite the worker.

STOP points in the same direction: the scaffold (the control layer of prompts, tools, and loops around the model) can recursively improve itself, and the improved version outperforms the seed version. That is also where I would keep the boundary tight: this read comes from Ornith-1.0 plus the adjacent papers SICA (arXiv:2504.15228) and STOP (arXiv:2310.02304), not from a reproduced head-to-head benchmark on real repos.
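To make "the scaffold becomes an editable part of the system" concrete, here is a toy Python sketch. It treats the scaffold as data (prompt style, tool list, retry budget) and runs a greedy hill-climb that keeps a proposed variant only when a scoring function rates it higher than the current best. Everything here is hypothetical: `Scaffold`, `utility`, `propose`, and `self_improve` are illustrative names, the utility is a made-up stand-in for a benchmark run, and this is not Ornith-1.0's, SICA's, or STOP's actual method (STOP, for instance, has the improver rewrite its own code, which this sketch does not attempt).

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Scaffold:
    # Hypothetical knobs standing in for "prompt, tools, retry loop".
    prompt_style: str
    tools: tuple[str, ...]
    max_retries: int


def utility(s: Scaffold) -> float:
    # Made-up score; a real system would run the agent on held-out tasks.
    score = {"terse": 0.2, "plan-first": 0.4}.get(s.prompt_style, 0.0)
    score += 0.1 * len(s.tools)
    score += 0.05 * min(s.max_retries, 3)  # retries beyond 3 add nothing
    return score


def propose(s: Scaffold, rng: random.Random) -> Scaffold:
    # Mutate one part of the scaffold at random.
    choice = rng.choice(["prompt", "tool", "retry"])
    if choice == "prompt":
        return replace(s, prompt_style=rng.choice(["terse", "plan-first"]))
    if choice == "tool":
        extra = rng.choice(["grep", "run_tests", "edit_file"])
        return replace(s, tools=tuple(sorted(set(s.tools) | {extra})))
    return replace(s, max_retries=s.max_retries + 1)


def self_improve(seed: Scaffold, rounds: int = 50, rng_seed: int = 0) -> Scaffold:
    # Greedy hill-climb: keep a candidate only if it strictly beats the best so far.
    rng = random.Random(rng_seed)
    best = seed
    for _ in range(rounds):
        candidate = propose(best, rng)
        if utility(candidate) > utility(best):
            best = candidate
    return best


seed = Scaffold(prompt_style="terse", tools=(), max_retries=0)
improved = self_improve(seed)
print(seed, utility(seed))
print(improved, utility(improved))
```

Because a candidate is kept only when it scores strictly higher, the returned scaffold never scores below the seed. In a real self-editing agent, the expensive and risky part is `utility`: it has to run the agent on tasks it was not tuned on, or the loop just overfits its own benchmark.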

So my filter from here is simple: do not ask only whether a coding agent writes stronger code. Ask whether it can modify its own loop and remove a layer of repetitive work for you. If that framing would save someone on your team from chasing feature lists, share this with them. If you review coding agents today, are you scoring self-editing explicitly or still ranking mostly on code generation quality?

#AgenticAI #CodingAgents #LLMEngineering #DevTools

What's really worth discussing: Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding