If you opened this just to figure out whether Claude got better or whether your workflow got worse, the useful takeaway is simple: AI gets stronger when it stops trusting answer #1. Claude Code looked better when separate drafts had to disagree before one got picked.

The easy mistake is treating Claude Code and Claude like the same tool, then assuming the one with the better score should do everything. I wouldn't read this result that way. Claude Code helps you see the problem clearly; Claude helps finish the rest cleanly.

"I gave Claude Code ADHD.. and it thinks 2x better now" [C001] is the loud headline. The useful part is the method: make separate drafts first, keep them isolated so they do not anchor on each other too early, then compare them at the end. The gain comes from the disagreement between drafts, not from any single draft.
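To make the draft-then-compare idea concrete, here is a minimal sketch. It is not the project's actual implementation. The names `isolated_drafts`, `pick_after_comparison`, `fake_generate`, and `fake_score` are all hypothetical, and the stub generator stands in for whatever model call you would really use.

```python
from typing import Callable, List, Sequence


def isolated_drafts(task: str, generate: Callable[[str, int], str], n: int = 3) -> List[str]:
    """Produce n drafts. Each call sees only the task, never another draft."""
    return [generate(task, draft_id) for draft_id in range(n)]


def pick_after_comparison(
    drafts: Sequence[str], score: Callable[[str, Sequence[str]], float]
) -> str:
    """Compare drafts side by side only after every draft exists."""
    return max(drafts, key=lambda draft: score(draft, drafts))


# Hypothetical stand-in for a model call: each draft takes a different approach.
def fake_generate(task: str, draft_id: int) -> str:
    approaches = ["loop", "recursion", "builtin"]
    return f"{task} via {approaches[draft_id % len(approaches)]}"


# Hypothetical judge: a placeholder rule that happens to prefer the builtin approach.
def fake_score(draft: str, all_drafts: Sequence[str]) -> float:
    return 1.0 if draft.endswith("builtin") else 0.0


if __name__ == "__main__":
    drafts = isolated_drafts("sum a list", fake_generate, n=3)
    print(pick_after_comparison(drafts, fake_score))
```

The design point is the order of operations: no draft is passed into another draft's generation step, so the comparison at the end is the first time they meet.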

The part that made me pay attention was the public project-page test: 6 coding tasks, 5 wins, 1 loss versus the normal flow. That does not prove a universal 2x jump. It does show this is more than one lucky demo.
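For a rough sense of how much weight 5 out of 6 can carry, here is a back-of-the-envelope sign test. It assumes the six tasks are independent and that each is a clean win or loss, neither of which the project page states. The function name `sign_test_p_value` is my own.

```python
from math import comb


def sign_test_p_value(wins: int, losses: int) -> float:
    """One-sided chance of at least `wins` wins if each task were a coin flip."""
    n = wins + losses
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


win_rate = 5 / 6                    # about 0.83
p_value = sign_test_p_value(5, 1)   # 7 / 64 = 0.109375

if __name__ == "__main__":
    print(f"win rate: {win_rate:.2f}, coin-flip chance of 5+ wins in 6: {p_value:.3f}")
```

Under those assumptions, a method with no real edge would still score 5 or more wins in 6 about one time in nine. That fits the reading above: encouraging, but too small a sample to back a 2x claim.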

The takes that spread are never just "the model got better." They are "why the strongest version was not shipped raw." Boundary: this evidence only covers a public 6-task coding test on the project page. It gives no hardware or OS details, and it says nothing about easy lookup work. Share this with anyone still over-trusting answer #1.