
Seems like a very cool technique, but also very oversold. He's seeing a 5% improvement on a find and replace benchmark of his own devising and saying stuff like this in the blog post:

> Here is why that is backwards. I just showed that a different edit format improves their own models by 5 to 14 points while cutting output tokens by ~20%. That’s not a threat. It’s free R&D.

He makes it sound like he got a 5-14% boost on a top-level benchmark, not a 5% improvement on a narrow find-and-replace metric. Anecdotally, I don't usually have many issues with editing in Claude Code or Cursor, and when there is an issue the model corrects it.

Assuming that it costs double the tokens when it has to correct itself, and find and replace errors are as prominent in actual day to day use as his benchmark, we're talking a 5% efficiency gain in editing token use (not reasoning or tool use). Given that editing must be less than 1/3 of the token use (I assume much less?), we're talking an overall efficiency gain of less than 1%.
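As a quick sanity check on that back-of-envelope (the 1/3 and 1/10 shares are my own assumed figures, not measured numbers):

```python
# Overall savings = per-edit efficiency gain x editing's share of total tokens.
edit_gain = 0.05  # the ~5% improvement on the find-and-replace benchmark

for edit_share in (1 / 3, 1 / 10):  # assumed fraction of tokens spent on edits
    overall = edit_gain * edit_share
    print(f"edit share {edit_share:.0%} -> overall gain {overall:.1%}")
```

So even at the generous 1/3 share the overall gain is under 2%, and at a more realistic smaller share it drops below 1%.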

This seems like a promising technique, but maybe not a high-priority efficiency gain for these tools. The messianic tone, like assuming that Google cut off his access to suppress his genius editing technique rather than just because he was hammering their API, also leaves a bad taste, along with the rampant and blatant ChatGPTisms in the blog post.



> “replace line 2:f1, replace range 1:a3 through 3:0e, insert after 3:0e.”

Not sure what they're calculating, but to me this looks like it could save far more than 20%.
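A minimal sketch of how that line:hash addressing could work. `tag_lines` and `replace_line` are my illustrative names, and the 2-hex-char SHA-256 prefix is a guess at the hash format; the point of the hash is that the tool can reject edits aimed at stale content:

```python
import hashlib

def tag_lines(text: str) -> list[str]:
    """Prefix each line with 'lineno:hh', where hh is a short content hash."""
    out = []
    for i, line in enumerate(text.splitlines(), start=1):
        h = hashlib.sha256(line.encode()).hexdigest()[:2]
        out.append(f"{i}:{h} {line}")
    return out

def replace_line(text: str, addr: str, new_line: str) -> str:
    """Apply 'replace line <lineno>:<hash>'; the hash guards against stale edits."""
    lineno_s, want = addr.split(":")
    lineno = int(lineno_s)
    lines = text.splitlines()
    got = hashlib.sha256(lines[lineno - 1].encode()).hexdigest()[:2]
    if got != want:
        raise ValueError(f"hash mismatch at line {lineno}: {got} != {want}")
    lines[lineno - 1] = new_line
    return "\n".join(lines)

src = "alpha\nbeta\ngamma"
addr = tag_lines(src)[1].split(" ")[0]   # address of line 2, e.g. "2:xx"
print(replace_line(src, addr, "BETA"))   # alpha / BETA / gamma
```

The edit command itself ("replace line 2:f1 ...") is only an address plus the new text, which is where the token savings over re-emitting surrounding context would come from.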


So I just built this, with a few changes to the approach, and usable as a simple pi-extension without having to use what-the-pi. It seems to work pretty well so far.

https://github.com/offline-ant/pi-hh-read


Why do we need a hash for every line? Why can't we mark every fifth line (or get smarter, calculate the entropy of lines, and jump further across empty boilerplate)? I feel that adding a random 3-char header to every line, while making the edit tool smarter, makes the content itself harder to understand overall.


That's why I added read({ change_file: bool = false }) and change_file(...), so it doesn't get confused by default if it's just investigating.

I suspect doing it only every 5th line would make it less clear for the LLM.

I'm just experimenting, I wouldn't suggest you use this by default unless you're looking to experiment.


Yes, this looks like O(1) actions, whereas before it's likely that harnesses were ingesting and outputting huge portions of the source files at each step, and the local uses of str_replace() are themselves O(N) on the user's computer. The excess reads and writes from the LLM are O(N^2).
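For intuition, a toy comparison of what travels per edit under the two schemes (the 1,000-line file and the payload strings are made up for illustration):

```python
# Hypothetical 1,000-line source file.
src = "\n".join(f"line {i}" for i in range(1, 1001))

# Full-file rewrite: the model re-emits the whole file for one change -> O(N) per edit.
rewrite_payload = len(src)

# Addressed edit: only an address plus the replacement line travel -> O(1) per edit.
edit_payload = len("replace line 501:ab") + len("line 501, patched")

print(rewrite_payload, edit_payload)  # payload sizes in characters
```

Multiply the per-edit cost by the number of edits in a long session and the gap between the two approaches grows with file size, which is where the O(N^2) total comes from.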


The benchmarks seem to indicate 25-50% reduction in tokens. I'm not sure how that works in real world usage though.


Sure but if we find another few “easy” 5% improvements in find/replace/edit (which is one of the most important actions for coding) then they really start to add up.

Most harnesses already have rather thorough solutions for this problem but new insights are still worth understanding.


> That’s not a threat. It’s free R&D.

That's not a human. It's AI slop.


Yeah, the article is full of it, especially the second half. I wonder if at any point we'll be able to ban slop / low-quality content from the internet. I don't understand why this keeps getting upvoted.


It wouldn't even occur to me to submit ai slop to HN. Some people have no shame.



