Claude Code is being dumbed down? (symmetrybreak.ing)
965 points by WXLCKNO 17 hours ago | 612 comments




Hey, Boris from the Claude Code team here. I wanted to take a sec to explain the context for this change.

One of the hard things about building a product on an LLM is that the model frequently changes underneath you. Since we introduced Claude Code almost a year ago, Claude has gotten more intelligent, it runs for longer periods of time, and it is able to use more tools, more agentically. This is one of the magical things about building on models, and also one of the things that makes it very hard. There's always a feeling that the model is outpacing what any given product is able to offer (ie. product overhang). We try very hard to keep up, and to deliver a UX that lets people experience the model in a way that is raw and low level, and maximally useful at the same time.

In particular, as agent trajectories get longer, the average conversation has more and more tool calls. When we released Claude Code, Sonnet 3.5 was able to run unattended for less than 30 seconds at a time before going off the rails; now, Opus 4.6 1-shots much of my code, often running for minutes, hours, and days at a time.

The amount of output this generates can quickly become overwhelming in a terminal, and is something we hear often from users. Terminals give us relatively few pixels to play with; they have a single font size; colors are not uniformly supported; in some terminal emulators, rendering is extremely slow. We want to make sure every user has a good experience, no matter what terminal they are using. This is important to us, because we want Claude Code to work everywhere, on any terminal, any OS, any environment.

Users give the model a prompt, and don't want to drown in a sea of log output in order to pick out what matters: specific tool calls, file edits, and so on, depending on the use case. From a design POV, this is a balance: we want to show you the most relevant information, while giving you a way to see more details when useful (ie. progressive disclosure). Over time, as the model continues to get more capable -- so trajectories become more correct on average -- and as conversations become even longer, we need to manage the amount of information we present in the default view to keep it from feeling overwhelming.

When we started Claude Code, it was just a few of us using it. Now, a large number of engineers rely on Claude Code to get their work done every day. We can no longer design for ourselves, and we rely heavily on community feedback to co-design the right experience. We cannot build the right things without that feedback. Yoshi rightly called out that often this iteration happens in the open. In this case in particular, we approached it intentionally, and dogfooded it internally for over a month to get the UX just right before releasing it; this resulted in an experience that most users preferred.

But we missed the mark for a subset of our users. To improve it, I went back and forth in the issue to understand what problems people were hitting with the new design, and shipped multiple rounds of changes to arrive at a good UX. We've built in the open in this way before, eg. when we iterated on the spinner UX, the todos tool UX, and for many other areas. We always want to hear from users so that we can make the product better.

The specific remaining issue Yoshi called out is reasonable. PR incoming in the next release to improve subagent output (I should have responded to the issue earlier, that's my miss).

Yoshi and others -- please keep the feedback coming. We want to hear it, and we genuinely want to improve the product in a way that gives great defaults for the majority of users, while being extremely hackable and customizable for everyone else.


I can’t count how many times I benefitted from seeing the files Claude was reading, to understand how I could interrupt and give it a little more context… saving thousands of tokens and sparing the context window. I must be in the minority of users who preferred seeing the actual files. I love claude code, but some of the recent updates seem like they’re making it harder for me to see what’s happening. I agree with the author that verbose mode isn’t the answer. Seems to me this should be configurable

I think folks might be crossing wires a bit. To make it so you can see full file paths, we repurposed verbose mode to enable the old explicit file output, while hiding more details behind ctrl+o. In effect, we've evolved verbose mode to be multi-state: it lets you toggle back to the old behavior, gives you a way to see even more verbose output, and still defaults everyone else to the condensed view. I hope this solves everyone's needs, while also avoiding overly-specific settings (we wanted to reuse verbose mode for this so it is forwards-compatible).

To try it: /config > verbose, or --verbose.
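If you want it on permanently, setting it in settings.json should work too (the "verbose" key listed in the settings docs):

    {
      "verbose": true
    }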

Please keep the feedback coming. If there is anything else we can do to adjust verbose mode to do what you want, I'd love to hear.


I'll add a counterpoint that in many situations (especially monorepos for complex businesses), it's easy for any LLM to go down rabbit holes. Files containing the word "payment" or "onboarding" might be for entirely different DDD domains than the one relevant to the problem. As a CTO touching all sorts of surfaces, I see this problem at least once a day, entirely driven by trying to move too fast with my prompts.

And so the very first thing that the LLM does when planning, namely choosing which files to read, is a key point for manual intervention to ensure that the correct domain or business concept is being analyzed.

Speaking personally: Once I know that Claude is looking in the right place, I'm on to the next task - often an entirely different Claude session. But those critical first few seconds, to verify that it's looking in the right place, are entirely different from any other kind of verbosity.

I don't want verbose mode. I want Claude to tell me what it's reading in the first 3 seconds, so I can switch gears without fear it's going to the wrong part of the codebase. By saying that my use case requires verbose mode, you're saying that I need to see massive levels of babysitting-level output (even if less massive than before) to be able to do this.

(To lean into the babysitting analogy, I want Claude to be the babysitter, but I want to make sure the babysitter knows where I left the note before I head out the door.)


> I don't want verbose mode. I want Claude to tell me what it's reading in the first 3 seconds, so I can switch gears without fear it's going to the wrong part of the codebase. By saying that my use case requires verbose mode, you're saying that I need to see massive levels of babysitting-level output (even if less massive than before) to be able to do this.

To be clear: we re-purposed verbose mode to do exactly what you are asking for. We kept the name "verbose mode", but the behavior is what you want, without the other verbose output.


Feels like you aren’t really listening to the feedback. Is verbose mode the same as the explicit callouts of files read in the previous versions? Yes, you intended it to fulfill the same need, but, take a step back. Is it the same? I’m hearing a resounding “no”. At the very least, if you have made such a big change, you’ve gotten rid of the value of a true “verbose mode”.


This is an interesting and complex UI decision to make.

Might it have been better to retire and/or rename the feature, if the underlying action was very different?

I work on silly basic stuff compared to Claude Code, but I find that I confuse fewer users if I rename a button instead of just changing the underlying effect.

This causes me to have to create new docs, and hopefully triggers affected users to find those docs, when they ask themselves “what happened to that button?”


Yeah, in hindsight, we probably should have renamed it.

It's not too late.

> To be clear: we re-purposed verbose mode to do exactly what you are asking for. We kept the name "verbose mode", but the behavior is what you want, without the other verbose output.

Verbose mode feels far too verbose to handle that. It’s also very hard to “keep your place” when toggling into verbose mode to see a specific output.


I think the point bcherny is making in the last few threads is that the new verbose mode _default_ is not as verbose as it used to be, and so it is not "too verbose to handle that". If you want "too verbose", that is still available behind a toggle

Yeah, I didn't realize that there's a new sort of verbose mode now which is different from the verbose mode that was included previously. Although I'm still not clear on the difference between "verbose mode" and "ctrl + o". Based on https://news.ycombinator.com/item?id=46982177 I think they are different (specifically where they say "while hiding more details behind ctrl+o").

We don’t want verbose mode. We don’t want the whole file contents. We are not asking for that. What is not clear here?

All we want is the file paths. That is all. Verbose mode pulls in a lot of other information that might very well be needed in other contexts. People who want that info should use verbose mode. All we want is the regular non-verbose mode, with paths.

I fail to see how it is confusing to users, even new users, to print which paths were accessed. I fail to see the point of printing that some paths were accessed, but not which.


Verbose mode does exactly what you want as of v2.1.39; you are confusing it with the full transcript, which is a different feature (ctrl+o). You enable verbose mode in /config, and it gives you files read, search patterns, and token count, not whole file contents.

I thought I was the only person driven crazy by the new default behavior of not showing the file names! Please don't expect users to understand your product details and config options in such detail; it was working well before, let it remain. Or at least show some message like "to view file names, do xyz" in the UI for a few days after such a change.

While we're here, another thing that's annoying: the token counter. While Claude is working, it reads some files and makes an edit; let's say the token counter is at 2k tokens. I accept the edit, and now it counts very fast from 0 to 2k, then continues at normal inference speed to 2.1k, 2.3k, etc. So I wanted to confirm: is that just a UI decision, and not actually using 2k tokens again? If so, it would be nice to turn that off and just continue counting where it left off.

Another thing: is it possible to turn off the words like finagling and similar (I can't remember the spelling of any of them) ?


> Another thing: is it possible to turn off the words like finagling and similar (I can't remember the spelling of any of them) ?

Big +1 on that. I find the names needlessly distracting. I want it to just always say a single thing like “thinking”


You should be able to do something like this:

    "spinnerVerbs": {
      "mode": "replace",
      "verbs": ["Thinking"]
    }
https://code.claude.com/docs/en/settings#available-settings

Thank you for the config and the link, that's very much appreciated!

How absurd that this is an option, but I’ll be using this config too.

I remember they shipped a feature to make that configurable.

FWIW I mentioned this in the thread (I am the guy in the big GH issue who actually used verbose mode and gave specific likes/dislikes), but I find it frustrating that ctrl+o still seems to truncate at strange boundaries. I am looking at an open CC session right now with verbose mode enabled - works pretty well and I'm glad you're fixing the subagent thing. But when I hit ctrl+o, I only see more detailed output for the last 4 messages, with the rest hidden behind ctrl+e.

It's not an easy UI problem to solve in all cases since behavior in CC can be so flexible, compaction, forking, etc. But it would be great if it was simply consistent (ctrl+o shows last N where N is like, 50, or 100), with ctrl+e revealing the rest.


Yes totally. ctrl+o used to show all messages, but this is one of the tricky things about building in a terminal: because many terminals are quite slow, it is hard to render a large amount of output at once without causing tearing/stutter.

That said, we recently rewrote our renderer to make it much more efficient, so we can bump up the default a bit. Let me see what it feels like to show the last 10-20 messages -- fix incoming.


thanks dude. you are living my worst nightmare which is that my ultra cool tech demo i made for cracked engineers on the bleeding edge with 128GB ram apple silicon using frontier AI gets adopted by everyone in the world and becomes load bearing so now it needs to run on chromebooks from 2005. and if it doesn't work on those laptops then my entire company gets branded as washed and not goated and my cozy twitter account is spammed with "why didn't you just write it in rust lel".

o7


Your worst nightmare. For me this is the cool part.

Just tell people to install a fast terminal if they somehow happen to have a slow one?

Heck, simply handle the scrolling yourself a la tmux/screen and only update the output at most every 4ms?

It's so trivial, can't you ask your fancy LLM to do it for you? Or have you guys lost the plot at this point and forgotten the most basic principles of writing non-pessimized code?
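A minimal sketch of that coalescing idea, assuming a Node/TypeScript harness (all names hypothetical):

    // Buffer terminal writes; flush at most once per interval,
    // so slow emulators see a bounded number of writes per second.
    class ThrottledWriter {
      private buf = "";
      private timer: ReturnType<typeof setTimeout> | null = null;

      constructor(private readonly intervalMs = 4) {}

      write(chunk: string): void {
        this.buf += chunk;
        if (this.timer === null) {
          this.timer = setTimeout(() => {
            process.stdout.write(this.buf); // one syscall per flush
            this.buf = "";
            this.timer = null;
          }, this.intervalMs);
        }
      }
    }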


I'm with you on this one. "Terminals are too slow to support lots of text so we had to change this feature in unpopular ways" is just not a plausible reason, as terminals have been able to dump ~1 MB per second for decades.

The real problem is their ridiculous "React rendering in the terminal" UI.


Do you have any examples of slow terminals, and what kind of maximum characters per second they have?

Why would you tailor your product for people that don’t know how to install a good terminal? Just tell them to install whatever terminal you recommend if they see tearing.

Honestly, I just want to be able to control precisely what I see via config.json. It will probably differ depending on the project. This is a developer tool, I don't see why you'd shy away from providing granular configuration (alongside reasonable defaults).

I actually miss being able to see all of the thinking, for example, because I could tell more quickly when the model was making a wrong assumption and intervene.


Exactly. If a user wants a simpler experience there is now the Claude Cowork option.

That's a cool idea!

Honestly, tmux, vim, kitty: almost every terminal, shell, and script is configurable. It’s what we’re used to. I wouldn’t know why you wouldn’t start allowing more config options.

I do not use CC (yet) but I think this is the right direction. We are hackers. We love hacking. We love to tinker about and configure! Please allow us.

(And yeah, I would love the verbose mode myself, but there could be various levels to it.)


Maybe during onboarding you could ask for output preference? That would at least help new users.

I find this decision weird because Claude _Code_, while used by _some_ non-technical users, is mostly used by technical users and developers.

Not sure why the choice would be to dumb the output down for technical users/developers.


One use I have for seeing exactly what it is doing is to press Esc quickly when I see it's confused and starts searching for some info that eg got compacted away, often going on a big quest like searching an entire large directory tree. What I would actually wish for is that it would ask me in these cases. It clearly knows that it lacks info but thinks it can figure it out by itself by going on a quest; that's true, but it takes too long. It could just ask me. There could be some mode settings for how much I want to be involved and consulted: like just ask me boldly for any factual info, or, if I just want to step away, it should figure everything out on its own.

I've commented on this ticket before: https://github.com/anthropics/claude-code/issues/8477#issuec...

The thinking mode is super-useful to me, as I _often_ saw the model "think" differently from the response. Stuff like "I can see that I need to look for x, y, z to fully understand the problem", and then it proceeds to just not do that.

This is helpful as I can interrupt the process and guide it to actually do this. With the thinking-output hidden, I have lost this avenue for intervention.

I also want to see what files it reads, but not necessarily the output - I know most of the files that'll be relevant, I just want to see it's not totally off base.

Tl;dr: I would _love_ to have verbose mode be split into two modes: Just thinking and Thinking+Full agent/file output.

---

I'm happy to work in verbose mode. I get many people are probably fine with the standard minimal mode. But at least in my code base, on my projects, I still need to perform a decent amount of handholding through guidance, the model is not working for me the way you describe it working for you.

All I need is a few tools to help me intervene earlier to make claude-code work _much_ better for me. Right now I feel I'm fighting the system frequently.


Yep, this is what we landed on now, more or less: verbose mode is just file paths, then ctrl+o gives you thinking, agent output, and hook output.

Have you considered picking a new name for a different concept?

Or have ctrl+o cycle between "Info, Verbose, Trace"?

Or give us full control over what gets logged through config?

Ideally we would get a new tab where we could pick logging levels on:

  - Thoughts
  - Files read / written
  - Bashes
  - Subagents
etc.
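To be concrete, something like this (entirely hypothetical; no such setting exists today):

    {
      "output": {
        "thoughts": "hidden",
        "filesReadWritten": "paths",
        "bash": "summary",
        "subagents": "full"
      }
    }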

How do you respond to the comment that, given the log trace:

“Did something 2 times”

That may as well not be shown at all in default mode?

What useful information is imparted by “Read 4 files”?

You have two issues here:

1) making verbose mode better. Sure.

2) logging useless information in default.

If you're not imparting any useful information, claude may as well just show a spinner.


It's a balance -- we don't want to hide everything away, so you have an understanding of what the model is doing. I agree that with future models, as intelligence and trust increase, we may be able to hide more, but I don't think we're there yet.

That's perfectly reasonable, but I genuinely don't understand how "read 2 files" is ever useful at all. What am I supposed to do with this information? How can it help me redirect the model?

Like, I'm open to the idea that I'm the one using your software the wrong way, since obviously you know more about it than I do. What would you recommend I do with the knowledge of how many files Claude has read? Is there a situation where this number can tell me whether the model is on the right track?


Not only what files, but what part of the files. Seeing 1-6 lines of a file that's being read is extremely frustrating; the UX of Claude Code is average at best. Cursor, on the other hand, is slow and memory-intensive, but at least I can really get a sense of what's going on and how I can work with it better.

I am not a claude user, but a similar problem I see on opencode is accessing links. More than once I've seen Kimi, GLM, or GPT go to the wrong place and waste tokens until I interrupt them and tell them a correct place to start looking for documentation or whatever they were doing.

If I got messages like "Accessed 6 websites" I'd flip and go spam a couple github issues with as much "I want names" as I could.


Such as Claude Code reading your ssh keys. Hiding the file names masks the vulnerability.

That's approaching the problem from the worst possible angle. If your security depends on you catching 1 message in a sea of output and quickly rotating the credential everywhere before someone has a chance to abuse it then you were never secure to begin with.

Not just because it requires constant attention which will eventually lapse, but because the agent has an unlimited number of ways to exfiltrate the key, for example it can pretend to write and run a "test" which reads your key, sends it to the attacker and you'll have no idea it's happening.


What annoys me is that I don’t have the choice anymore. It’s just decided that thinking is not possible to see anymore, files being read are very difficult to see, etc.

I understand that I’m probably not the target audience if I want to actually step in and correct course, but it’s annoying.


> saving thousands of tokens and sparing the context window

shhh don't say that, they will never fix it if it means you use fewer tokens.


I'm a screen reader user and CTO of an accessibility company. This change doesn't reduce noise for me. It removes functionality.

Sighted users lost convenience. I lost the ability to trust the tool. There is no "glancing" at terminal output with a screen reader. There is no "progressive disclosure." The text is either spoken to me or it doesn't exist.

When you collapse file paths into "Read 3 files," I have no way to know what the agent is doing with my codebase without switching to verbose mode, which then dumps subagent transcripts, thinking traces, and full file contents into my audio stream. A sighted user can visually skip past that. I listen to every line sequentially.

You've created a situation where my options are "no information" or "all information." The middle ground that existed before, inline file paths and search patterns, was the accessible one.

This is not a power user preference. This is a basic accessibility regression. The fix is what everyone in this thread has been asking for: a BASIC BLOODY config flag to show file paths and search patterns inline. Not verbose mode surgery. A boolean.
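Hypothetically, nothing fancier than:

    {
      "showFilePaths": true
    }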

Please just add the option.

And yes, I rewrote this with Claude to tone my anger and frustration down about 15 clicks from how I actually feel.


Try Codex instead. Much greener pastures overall

I do love my subagents and I wrote an entire Claude Code audio hook system for a11y but this would be still rather compelling if Codex weren't also somewhat of an a11y nightmare. It does some weird thing with ... maybe terminal repaints or something else that ends up rereading the same text over and over. Claude Code does this similarly but Codex ends up reading like ... all the weird symbols and other stuff? window decorations? and not just the text like CC does. They are both hellish but CC slightly? less so... until now.

Sorry for being off-topic, but isn't a11y a rather ironic term for accessibility? It uses a very uncommon abbreviation type -- numeronym, and doesn't mean anything to the reader unless they look it up (or already know what it means).

Is it as bad with the Codex app, or VS Code plugin?

They are much more responsive on GitHub issues than Anthropic so you could also try reporting your issue there


Hey -- we take accessibility seriously, and want Claude Code to work well for you. This is why we have repurposed verbose mode to do what you want, without the other verbose output. Please give it a try and let me know what you think.

It's well meaning, but I think this goes against something like the curb-cut effect. Not a perfect analogy, but verbosity is something you have to opt into here: everyone benefits from being able to glance at what the agent is up to by default. Nobody greatly benefits from the agent being quiet by default.

If people find it too noisy, they can use the flag or toggle that makes everything quieter.

p.s. Serendipitously I just finished my on-site at anthropic today, hi :)


> we take accessibility seriously

Do you guys have a screen reader user on the dev team?

Is verbose mode the same as the old mode, where only file paths are spoken? Or does it have other text in it? Because I tried to articulate this, and may have failed. More text is usually bad for me. It must be consumed linearly. I need specific text.

Quality over quantity


"Is verbose mode the same as the old mode, where only file paths are spoken?" -- yes, this is exactly what the new verbose mode is.

Casually avoiding the first question

And how to get to the old verbose mode then...?

Hit ctrl+o

Wait so when the UI for Claude Code says “ctrl + o for verbose output” that isn’t verbose mode?

That is more verbose — under the hood, it’s now an enum (think: debug, warn, error logging)
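Something like this, presumably (hypothetical names):

    enum Verbosity {
      Condensed,      // default view
      FilePaths,      // "verbose mode" in /config
      FullTranscript, // ctrl+o
    }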

Considering the ragefusion you're getting over the naming, maybe calling it something like --talkative would be less controversial? ;-)

Hi Boris, by far the most upvoted issue on your GitHub is "Support AGENTS.md", with 2550 upvotes. The second highest has 563. Every single other agent supports AGENTS.md. Care to share why you haven't?

> Yoshi and others -- please keep the feedback coming. We want to hear it, and we genuinely want to improve the product in a way that gives great defaults for the majority of users, while being extremely hackable and customizable for everyone else.

I think an issue with 2550 upvotes, more than 4 times the second-highest, is very clear feedback about your defaults and/or making it customizable.


I'm sorry, this comment is opportunistic and a bit annoying to post here. Saying "keep the feedback coming" is not an invitation to turn this thread into the issue queue

Let's be real here, regardless of what Boris thinks, this decision is not in his hands.

Would love to hear what Boris thinks.

> Every single other agent supports AGENTS.md. Care to share why you haven't?

Are you actually wondering, or just hoping to hear a confirmation of what you already know? Because the reason behind it is pretty clear, it doubles as both vendor lock-in and advertisement.


I'd love to hear Boris' thoughts on it given his open invitation for feedback and _genuinely_ wanting to improve the product, including specifically hackability and customizability (emphasis mine).

I don't understand this take Boris:

> The amount of output this generates can quickly become overwhelming in a terminal

If I use Opus 4.6, arguably the most verbose, overthinking model you've released to date, OpenCode handles it just the same as it does Sonnet 4.0.

OpenCode even allows me to toggle into subagent and task agents with their own output terminals that, if I am curious what is going on, I can very clearly see it.

All Claude Code has done is turn the output into a black box, so that I am forced to wait for it to finish to look at the final git diff. By then it's spent $5-10 working on a task, and thrown away a lot of the context it took to get there. It showed "thinking" blocks that weren't particularly actionable, because it was mostly talking to itself about how it can't do something because it goes against a rule, but it really wants to.

I'm actually frustrated with Code blazing through to the end without me able to see the transcript of the changes.


There are so many config options. Most I still need to truly deeply understand.

But this one isn't? I'd call myself a professional. I use it with tons of files across a wide range of projects and types of work.

To me file paths were an important aspect of understanding context of the work and of the context CC was gaining.

Now? It feels like running on a foggy street, never sure when the corner will come and I'll hit a fence or house.

Why not introduce a toggle? I'd happily add that to my aliases.

Edit: I forgot. I don't need better subagent output. Or even less output when watching thinking traces. I am happy to have full verbosity. There are cases where it's an important aspect.


You want verbose mode for this -- we evolved it to do exactly what you're asking for: verbose file reads, without seeing thinking traces, hook output, or (after tomorrow's release) full subagent output.

More details here: https://news.ycombinator.com/item?id=46982177


There's no way you're still talking about verbose mode.. this is insane.

My guy, you are on repeat mode. No one cares about your control-o verbose mode. Did you even pass any coding assessments to get your job?

Sorry if this is just for giggles and doesn't add anything of value to the discussion, but I couldn't resist and asked Claude Sonnet 4.5 and Opus 4.6 to analyze the github issue that was opened.

Funnily enough, both independently sided with the users, not the authors.

The core problem: --verbose was repurposed instead of adding a new toggle. Users who relied on verbose for debugging (thinking, hooks, subagent output) now have broken workflows - to fix a UX decision that shouldn't have shipped as default in the first place.

What should have been done:

  /config
  Show file paths: [on/off]
  Verbose mode: [on/off]  (unchanged)
A simple separate toggle would've solved everything without breaking anyone's workflow.

Opus 4.6's parting thought: if you're building a developer tool powered by an AI that can reason about software design, maybe run your UX changes past it before shipping.

To be fair, your response explains the design philosophy well - longer trajectories, progressive disclosure, terminal constraints. All valid. But it still doesn't address the core point: why repurpose --verbose instead of adding a separate toggle? You can agree with the goal and still say the execution broke existing workflows.


I don't see how you can blame terminal applications - they typically have been able to dump around 1 MB of output per second for decades.

https://martin.ankerl.com/2007/09/01/comprehensive-linux-ter...

Could the React rendering stack be optimised instead?


I believe he is speaking of the effective resolution of TUIs, not pty throughput rates or fps, though I do agree with what you're actually getting at.

From the list of problems they are experiencing with rendering in the terminal, it sounds like they want a GUI (Electron would be a good fit).

> The amount of output this generates can quickly become overwhelming in a terminal, and is something we hear often from users. Terminals give us relatively few pixels to play with; they have a single font size; colors are not uniformly supported; in some terminal emulators, rendering is extremely slow. We want to make sure every user has a good experience, no matter what terminal they are using. This is important to us, because we want Claude Code to work everywhere, on any terminal, any OS, any environment.

If you are serious about this, I think there are so many ways you could clean up, simplify, and calm the Claude Code terminal experience already.

I am not a CC user, but an enthusiastic CC user generously spent an hour or two last week or so showing me how it worked and walking through a non-publicly-implemented Gwern.net frontend feature (some CSS/JS styling of poetry for mobile devices).

It was highly educational and interesting, and Claude got most of the way to something usable.

Yet I was shocked and appalled by the CC UI/UX itself: it felt like the fetal alcohol syndrome lovechild of a Las Vegas slot machine and Tiktok. I did not realize that all those jokes about how using CC was like 'crack' or 'ADHD' or 'gambling' were so on point, I thought they were more, well, metaphorical about the process as a whole. I have not used such a gross and distracting UI in... a long time. Everything was dancing and bouncing around and distracting me while telling me nothing. I wasted time staring at the update monitor trying to understand if "Prognosticating..." was different from "Fleeblegurbigating..." from "Reticulating splines...", while the asterisk bounces up and down, or the colored text fades in and out, all simultaneously, and most of the screen was wasted, and the whole thing took pains to put in as much fancy TUI nonsense as it could. An absolute waste, not whimsy, of pixels. (And I was a little concerned how much time we spent zoned out waiting on the whole shebang. I could feel the productivity leaving my body, minute by minute. How could I possibly focus on anything else while my little friendly bouncing asterisk might finish at any instant...?!) Some description of what files are being accessed seems like you could spare the pixels for them.

So I was impressed enough with the functionality to move it up my list, but also much of it made me think I should look into GPT Codex instead. It sounds like the interfaces there respect my time and attention more, rather than treating me like a Zoomer.


Thanks for the long and considered response, but this is a really ugly UX decision.

As others have said - 'reading 10 files' is useless information - we want to be able to see at a glance where it is and what it's doing, so that we can re-direct if necessary.

With the release of Cowork, couldn't Claude Code double down on needs of engineers?


> this resulted in an experience that most users preferred

I just find that very hard to believe. Does anyone actually do anything with the output now? Or are they just crossing their fingers and hoping for the best?


Have you tried verbose mode? /config > verbose. It should do exactly what you are looking for now, without extraneous thinking/subagent/hook output. We hear the feedback!

The default view hiding files read is fully a regression, imho. It is so helpful for a sense of control, never mind trust and human agency.

Please revert this


One thing this specific feature let me do is see when Claude Code takes a wrong turn, e.g. reads the wrong memory MD file. I used to immediately interrupt and correct its course. Now it is more opaque and there is less of a hint at CC's reasoning.

Also open source CC already.

And stop banning 3rd party harnesses please. Thanks

Anthropic, your actual moat is goodwill. Remember that.


At some point we need to start preferring GUIs instead of terminals as the AI starts giving us more and more information. Features like hover-over tooltips and toggle switches designed for mouse operation might really start to matter.

Maybe "AI IDEs" will gain ground in the future, e.g. vibe-kanban


Yes I don't understand why Claude code needs to be a terminal app.

It doesn't compose with any other command line program and the terminal interface is limiting.

I'm surprised nobody has yet made a coding assistant that runs in the browser or as a standalone app. At this point it doesn't really need to integrate with my text editor or IDE.


Please for the love of God no. I'd rather have something completely agnostic of an IDE. OpenCode is doing the right thing IMO

You can have something IDE agnostic but still not be dependent on the ancient VT100 terminal protocol and rendering path.

(That said I do like being able to SSH in and run an agent that way. But there are other remote access modalities.)


Claude is perfect every time, no quibbles; the IT industry simply has to adapt to the new shift. Surely people who earn a living by writing code will find fault with it, but even with Claude, code will not write itself; it's a simple shift from writing code to making code work better/integrate/tweak/refine/personalize/customize it. Thank you Boris and team, we are over the moon

Hi Boris, did Claude Code itself author this change? I am curious as you said that all of your recent PRs were authored by Claude Code. If that's the case, just wondering what objective did you ask it to optimize for? Was it something like: make the UI simpler?

Do you feel that a terminal UX will remain your long term interface for Claude Code? Or would you consider a native interface like Codex has built?

Hello Boris. First of all, I apologize for replying unrelated to your post or comment. The reason I'm leaving a comment is because there's a critical issue currently going on regarding new accounts, with over 100 people commenting. This issue has been open for over three weeks. I'd appreciate it if you could look into it.

https://github.com/anthropics/claude-code/issues/19673


Thanks Boris, great insights for builders.

Just give multiple options in the config file. Give us the current default, what you now call verbose mode, and the previous verbose mode. If Claude is as effective as marketing claims, then maintaining all 3 options should be trivially doable; we've been doing more complex configuration in tons of apps for decades.

Why does everything have to be in the TUI? I like the TUI. But I also want all the logs. And I do mean all of them.

Of course all the logs can’t be streamed to a terminal. Why would they need to be? Every logging system out there allows multiple stream handlers with different configurations.

Do whatever reasonable defaults you think make sense for the TUI (with some basic configuration). But then I should also be able to give Claude Code a file descriptor and a different set of config options, and you can stream all the logs there. Then I can vibe-code whatever view filter I want on top of that, or heck, have an SLM sub-agent filter it all for me.
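For example, two handlers at different levels, sketched with winston (illustrative only; Claude Code doesn't expose anything like this today):

    import winston from "winston";

    const log = winston.createLogger({
      transports: [
        // terse stream, analogous to the TUI default
        new winston.transports.Console({ level: "info" }),
        // everything, to a file I can filter however I want
        new winston.transports.File({ filename: "claude-full.log", level: "silly" }),
      ],
    });

    log.info("Read src/app.ts");       // goes to both
    log.debug("full tool output ..."); // file only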

I could do this myself with some proxy / packet capture nonsense, but then you’d just move fast and break my things again.

I’m also constantly frustrated by the fancier models making wrong assumptions in brownfield projects and creating a big mess instead of asking me follow-up questions. Opus is like the world’s shittiest intern… I think a lot of that is upstream of you, but certainly not all of it. There could be a config option to vary the system prompt to encourage more elicitation.

I love the product you’ve built, so all due respect there, but I also know the stench of enshittification when I smell it. You’re programmers, you know how logging is supposed to work. You know MCP has provided a lot of these basic primitives and they’re deliberately absent from claude code. We’ve all seen a product get ratfucked internally by a product manager who copied the playbook of how Prabhakar Raghavan ruined google search.

The open source community is behind at the moment, but they’ll catch up fast. Open always beats closed in the long run. Just look at OpenAI’s fall into disgrace.


So in a nutshell Claude becoming smarter means that logic that once resided in the agent is being moved to the model?

If that's the case, it's important to assess whether it'll be consistent when operating on a higher level, less dependent on the software layer that governs the agent. Otherwise Claude risks becoming more erratic.


I'm going to be paranoid and guess they're trying to segment the users into those who won't notice they're dumbing down the system (via caches, or a quantized model downgrade) and those who expect the fully available tools.

Thariq (who's on the Claude Code team) swears up and down that they do not do this.

Honestly, man, this is just weird new tech. We're asking a probabilistic model to generate English and JSON and Bash at the same time in an inherently mutable environment and then Anthropic also release one or two updates most workdays that contain tweaks to the system prompt and new feature flags that are being flipped every which way. I don't think you have to believe in a conspiracy theory to understand why it's a little wobbly sometimes.


Yeah, I know it's new tech and the pipeline for the magic is a bunch of shims on top of non-deterministic models; but the MBAs are going to swoop in eventually, and segmenting the users into tiers of price discrimination is coming down the pike regardless of how earnest the current PMs are.

Hmm, honestly I'm not so sure. Many devs seem extremely price-sensitive and the switching cost is... zero.

If Anthropic do something you don't like, you just set a few environment variables and suddenly you're using the Claude Code harness with a local model, or one of thousands available through OpenRouter. And then there is also OpenCode. I haven't tried this, but I'm not worried.

^ https://github.com/ruvnet/claude-flow/wiki/Using-Claude-Code...


Unless your employer made a deal and suddenly you are forced to use one provider for the foreseeable future.

There must have been a more concise way to write this damage control.

I am not a programmer and detest the terminal environment; while I design complexity, I need simple interfaces. Claude is now guiding all dev based on my initial design spec and makes beautiful notebooks that can be uploaded directly to Colab or GitHub, no UX at all, no usability issues. This is the latest baby we made yesterday: starborn.github.io/copp-notebook. Thank you Claude engineering team for something that is flying very high and takes me with it.

> I am not a programmer and detest the terminal environment

As someone who finds formal language a natural and better interface for controlling a computer, can you explain how and why you actually hate it? I mean, not stuff like lack of discoverability because you use a shell that lacks completion and documentation (both have been common for decades); I get those downsides. But why do you detest it in principle?


This conflict shows a pattern across AI products today.

Most tools are still designed with programmers as the default user. Everyone else is treated as an edge case.

But the real growth is outside that bubble. AI won’t become mainstream by hiding everything. And it won’t get there by exposing everything either.

It gets there by translating action into intent. By showing progress in human terms. By making people feel they’re still in control.

The teams that figure this out won’t just win an argument on GitHub. They’ll reach the much larger audience that’s still waiting on the sidelines.

My detail here: https://open.substack.com/pub/insanedesigner/p/building-ai-f...


This kind of attitude, above all else, is why anthropic is winning imo. Thanks.

Ignoring user input?

You've reached the stage where if something is possible in CC, someone out there is using it. Taking anything away will have them ask for it back; you need to let people toggle things. https://xkcd.com/1172/

I’m just some tinkerer and signed up just to say this. These are my thoughts after reading the blog post and ur response in full.

I subscribe to max rn. Tons of money. Anthropic’s Super Bowl ads were shit, not letting us use open code was shit, and this is more shit. Might only be a single straw left before I go to codex (no one’s complaining about it. And the openclaw creator prefers it)

This dev is clearly writing his reply with Claude and sounding way too corpo. This feels like how school teachers would talk to you. Your response in its length was genuinely insulting. Everyone knows how to generate text with AI now and you’re doing a terrible job at it. You can even see the emdash attempt (markdown renders two normal dashes as an emdash).

This was his prompt: “read this blog post, familiarize yourself with the mentioned GitHub issue and make a response on behalf of Anthropic.” He then added a little bit at the end when he realized the response didn’t answer the question, and got it to fix the grammar and spelling on that.

Your response is appropriate for the masses. But we’re not. We’re the so called hackers and read right through the bs. It’s not even about the feature being gone anymore.

There is a principle we uphold as “hackers” that this doesn’t align with, and that pisses people off a lot more than you think. I can’t really put my finger on it; maybe someone can help me out.

PS About the Super Bowl ads. Anyone that knows the story knows they’re exaggerated. (In the general public outside of Silicon Valley it’s like a 50/50 split or something about people liking or disliking AI as a whole rn. OpenAI is doing way more to help the case (not saying ads are a good thing).) OpenAI used to feel like the bad guy; now it’s kinda shifting to Anthropic. This, the ads, and open code are all examples of it. (I especially recommend people watch the Anthropic and OpenAI Super Bowl ads back to back.)


> This dev is clearly writing his reply with Claude

> You can even see the emdash attempt (markdown renders two normal dashes as an emdash)

He says he wrote it all manually.[0] Obviously I can't know if that's true, but I do think your internal AI detector is at least overconfident. For example, some of us have been regularly using the double hyphen since long before the LLM era. (In Word, it auto-corrects to an en dash, or to an em dash if it's not surrounded by spaces. In plain text, it's the best looking easily-typable alternative to a dash. AFAICT, it's not actually used for dashes in CommonMark Markdown.)

The rest is more subjective, but there are some things Claude would be unlikely to write (like the parenthetical "(ie. progressive disclosure)" -- it would write "i.e." with both dots, and it would probably follow it with a comma). Of course those could all be intentional obfuscations or minimal human edits, but IMO you are conflating the corporate communications vibe with the LLM vibe.

[0] https://news.ycombinator.com/item?id=46982418


> For example, some of us have been regularly using the double hyphen since long before the LLM era.

This "emdash" and "double dash" discussion and mention is the first time I have heard of it or seen discussion of it. I've never encountered it in the wild, nor seen it used in any meaningful way in all my time on the internet these last 27 years.

And yes - I've seen that special dash character in word for many years. Not once has anyone said "oh hey I type double dashes and word uses that". No it's always been "word has this weird dash and if you copy-paste it it's weird", and no one knows how it pops up in word, etc.

And yes, I've seen the AI spit out the special dash many times. It's a telltale sign of using LLM generated text.

And now, magically, in this single thread, you can see a half-dozen different users all using this "--" as if it's normal. It's like upside-down world. Either everyone is now using this brand new form of speaking, or they're covering for this Claude Code developer.

So yeah, maybe I've been sticking my head in the sand for years now, or maybe I just blindly ignored double-dashes when reading text till now. But it sure seems fishy.


Sounds like you see me as an untrustworthy source, so all I can suggest is that you look into this yourself. Search for "--" in pre-LLM forum postings and see how many hits you get.

Here are my pre-2020 HN comments, with 3 double hyphens in 8 comments: https://hn.algolia.com/?dateEnd=1576108800&dateRange=custom&...

As I was in the process of typing the search term to get my comments (and had just typed 'author'), this happened to come up as the top search result for Comments by Date for Feb 1st 2000 > Dec 12th 2019: https://news.ycombinator.com/item?id=21768030

Note that I wasn't searching directly for the double hyphen, which doesn't seem to work -- the first result just happened to contain one. If I'm covering for the Anthropic guy, I could be lying about the process by which I found that comment, but I think you should at least see this as sufficient reason to question your assumptions and do some searches of your own.


Boris! Unrelated but thank you and the Anthropic team for Claude code. It’s awesome. I use it every day. No complaints. You all just keep shipping useful little UX things all the time. It must be because it’s being dogfooded internally. Kudos again to the team!

boris-4.6-humble

Hey, It's Damage Control person from Corporate Revenue Maximizing Team here, <5 paragraphs>

> in some terminal emulators, rendering is extremely slow.

Ooo... ooo! I know what this is a reference to!

https://www.youtube.com/watch?v=hxM8QmyZXtg


> Terminals give us relatively few pixels to play with; they have a single font size; colors are not uniformly supported; in some terminal emulators, rendering is extremely slow.

That's why I use your excellent VS Code extension. I have lots of screen space and it's trivial to scroll back there, if needed.

I would really like even more love given to this. When working with long-lived code bases it's important to understand what is happening. Lots of promising UX opportunities here. I see hints of this, but it seems like 80% is TBD.

Ideally you would open source the extension to really use the creativity of your developer user base. ;)


@boris

Can we please move the "Extended Thinking" icon back to the left side of claude desktop, near the research and web search icons? What used to be one click is now three.


> We can no longer design for ourselves, and we rely heavily on community feedback to co-design the right experience. We cannot build the right things without that feedback.

How can that be true, when you're deliberately and repeatedly telling devs (the community you claim to listen to) that you know better than they do? They're telling you exactly what they want, and you're telling them, "Nah." That isn't listening. You understand that, right?


I’m witnessing him respond in real time with not just feedback but also actual changes, in a respectful and constructive manner - which is not easy to do, when there are people who communicate in this rude of a manner. If that’s not listening, then I don’t know what is.

And it shouldn’t need to be said, but the words that appear on the screen are from an actual person with, you know, feelings.


Acting like they can't take the heat when they purposely put themselves in the public sphere is odd.

interesting. they have been pretty receptive to my PR comments and discourse on issues. To each their own anecdote, I suppose.


so it's the users who are dumb :-)

To be honest I think there should be an option to completely hide all code that Claude generates and uses. Summaries, strategies, plans, logs, decisions and questions is all I need. I am convinced that in a few years nobody cares about the programming language itself.

In what terminals is rendering slow? I really think GPU acceleration for terminals (as seen in Ghostty) is silly. It's a terminal.

Edit: I can't post anymore today apparently because of dang. If you post a comment about a bad terminal at least tell us about the rendering issues.


VSCode (xterm.js) is one of the worst, but there's a large long tail of slow terminals out there.

Not really using the VS Code terminal anymore, just the Ubuntu terminal, but the biggest problem I have is that at some point Claude just eats up all memory and the session crashes. I know it's not really Claude's fault but damn it's annoying.

it's not a bad idea to use one of the GPU terminals on Linux just for Claude Code; it works out a bit better

As someone whose business is run through a terminal: not everyone uses Ghostty, even though they should. Remember that they don't have a Windows version.

this was written with claude lmao what a disgrace not to put a disclaimer.

use your own words!

i would rather read the prompt.


Same. It feels like an insult to read someone’s AI-generated stuff. They put no effort into writing it, but we now have to put extra effort into reading it because it’s longer than normal.

This is an insanely good response. History, backstory, we screwed up, what we're doing to fix it. Keep up the great work!

would've been better to post the prompt directly IMO

Prompts can be the new data compression. Just send your friend a prompt and the heartfelt penpal message gets decompressed at their end.

it reads like AI generated or at least AI assisted... those -- don't fool me!

fwiw, I wrote it 100% by hand. Maybe I talk to Claude too much..

Nah it doesn't look AI generated to me.

i thought about it being ai generated, but i don't care. it was easy to read and contained the right information. good enough for me. plus, who knows... maybe english is your second lang and you used ai to clean up your writing. i'd prefer that.

ok claude

This is an extremely disappointing response. The issue is your dev relations people being shitty and unhelpful and trying to solve actual problems with media-relations speak as if engineers are just going to go away in a few days.

Arrogant and clueless, not exactly who I want to give my money to when I know what enshitification is.

They have horrible instincts and are completely clueless. You need to move them away from a public-facing role. It honestly looks so bad, it looks so bad that it suggests nepotism and internal dysfunction to have such a poor response.

This is not the kind of mistake someone makes innocently, it's a window into a worldview that's made me switch to gemini and reactivate cursor as a backup because it's only going to get worse from here.

The problem is not the initial change (which you would rapidly realize was a big deal to a huge number of your users) but how high-handed and incompetent the initial response was. Nobody's saying they should be fired, but they've failed in public in a huge way and should step back for a long time.


> That’s it. “Read 3 files.” Which files? Doesn’t matter. “Searched for 1 pattern.” What pattern? Who cares.

Product manager here. Cynically, this is classic product management: simplify and remove useful information under the guise of 'improving the user experience' or perhaps minimalism if you're more overt about your influences.

It's something that as an industry we should be over by now.

It requires deep understanding of customer usage in order not to make this mistake. It is _really easy_ to think you are making improvements by hiding information if you do not understand why that information is perceived as valuable. Many people have been taught that streamlining and removal is positive. It's even easier if you have non-expert users getting attention. All of us here at HN will have seen UIs where this has occurred.


Product management might be the worst meme in the industry. Hire people who have never used the product and don't think like or accurately represent our users, then let them allocate engineering resources and gate what ships. What could go wrong?

It should be a fad gone by at this point, but people never learn. Here's what to do instead: Find your most socially competent engineer, and have them talk to users a couple times a month. Just saved you thousands or millions in salaries, and you have a better chance of making things that your users actually want.


Good PMs are extremely good at understanding users, and use soft skills to make the rest of the org focus on users more. I've worked with a couple, and they've added an enormous amount of value, sometimes steering teams of dozens of engineers in a more productive direction.

The problem is, it's hard to measure how good a PM is, even harder than for engineers. The instinct is to use product KPIs to do so, but especially at a BigTech company, distribution advantages and traction of previous products will be the dominant factor here, and the best ways of raising many product KPIs are actually user-hostile. Someone who has been a successful FAANG engineer who goes to a startup might lean towards over-engineering, but at least they should be sharp on the fundamentals. Someone who has been a successful FAANG PM might actually have no idea how to get PMF.

> Here's what to do instead: Find your most socially competent engineer, and have them talk to users a couple times a month

This is actually a great idea, but what will happen is this socially competent engineer will soon have a new full-time job gathering those insights, coalescing them into actionable product changes, persuading the rest of the org to adopt those changes, and making sure the original user insights make it into the product. Voila: you've re-invented product management.

But I actually think it's good to source PMs from people who've been engineers for a few years. PMs used to come from a technical background; Google famously gave entry-level coding tests to PMs well into the '10s. I dunno when it became more fashionable to hire MBAs and consultants into this role, but it may have been a mistake.


> Voila: you've re-invented product management.

This is a names vs. structure thing. For a moment, taboo the term product manager.

What I'm suggesting is a low risk way to see if an engineer has an aptitude for aligning the roadmap with what the users want. If they aren't great at it, they can go back to engineering. We also know for sure that they are technically competent since they are currently working as an engineer, no risk there.

The conventional wisdom (bad meme) is going to the labor market with a search term for people who claim to know what the users want, any user, any problem, doesn't matter. These people are usually incompetent and have never written software. Then hiring 1 and potentially more of the people that respond to the shibboleth.

If you want the first case, then you can't say "product manager" because people will automatically do the second case.


> What I'm suggesting is a low risk way to see if an engineer has an aptitude for aligning the roadmap with what the users want. If they aren't great at it, they can go back to engineering. We also know for sure that they are technically competent since they are currently working as an engineer, no risk there.

It doesn't have to be the most socially competent engineer who gathers the feedback. Having the engineering team sit with the target users gives so much insight into how the product is being used.

I once worked on an administrative tool at a financial institution. There were lots of pain points, as it started as a dev tool that turned into a monstrosity for the support staff. We asked to have a meeting with some reps who were literally 2 floors below us. Having the reps talk as they worked with the tool in real time over 1 hour was worth more than a year's worth of feedback that trickled in. It's one thing to solicit feedback. It's another to see how idiosyncrasies shape how products get used.


Putting on a PM hat is something I've been doing regularly in my engineering career over the last quarter century. Even as a junior (still in college!) at my first job I was thinking about product, in no small part because there were no PMs in sight. As I grew through multiple startups and eventually bigger brand name tech companies, I realized that understanding how the details work combined with some sense of what users actually want and how they behave is a super power. With AI this skillset has never been more relevant.

I agree with your assessment about the value of good PMs. The issue, in my experience, is that only about 20% (at most) are actually good. 60% are fine and can be successful with the right Design and Engineering partners. And 20% should just be replaced by AI now, so we can put the proper guardrails around their opinions and not be misled by their charisma or whatever other human traits enabled them to get hired into a job they are utterly unqualified for.


I have worked with some really really good product managers.

But not lately. Lately it’s been people who have very little relevant domain expertise, zero interest in putting in the time to develop said expertise beyond just cataloguing and regurgitating feedback from the customers they like most on a personal level, and seem to mostly have only been selected for the position because they are really good at office politics.

But I think it’s not entirely their fault. What I’ve also noticed is that, when I was on teams with really effective product managers, we also had a full-time project manager. That possibly freed up a lot of the product manager’s time. One person to be good at the tactical so the other can be good at the strategic.

Since project managers have become passé, though, I think the product managers are just stretched too thin. Which sets up bad incentive structures: it’s impossible to actually do the job well anymore, so of course the only ones who survive are the office politicians who are really good at gladhanding the right people and shifting blame when things don’t go well.


There are individuals who have good taste for products in certain domains. Their own preferences are an accurate approximation for those of the users. Those people might add value when they are given control of the product.

That good taste doesn't translate between domains very often. Good taste for developer tools doesn't mean good taste for a video game inventory screen. And that's the crux of the problem. There is a segment of the labor market calling themselves "product manager" who act like good taste is domain independent, and spread lies about their importance to the success of every business. What's worse is that otherwise smart people (founders, executives) fall for it because they think hiring them is what they are supposed to do.

Over time, as more and more people realized that PM is a side door into big companies with lots of money, "Product Manager" became an imposter role like "Scrum Master". Now product orgs are pretty much synonymous with incompetence.


Taste is pretty transferable, I think what you're talking about is intuition. The foundations of intuition are deeply understanding problems and the ability to navigate towards solutions related to those problems. Both of these are relatively domain-dependent. People can have intuition for how to do things but lack the taste to make those solutions feel right.

Taste on the other hand is about creating an overall feeling from a product. It's holistic and about coherence, where intuition is more bottom-up problem solving. Tasteful decisions are those that use restraint, that strike a particular tone, that say 'no' when others might say 'yes'. It's a lot more magical, and a lot rarer.

Both taste and intuition are ultimately about judgment, which is why they're often confused for one another. The difference is they approach problems from the opposite side; taste from above, intuition from below.

I agree with your assessment otherwise, PM can be a real smoke screen especially across domain and company stage.


> There is a segment of the labor market calling themselves "product manager" who act like good taste is domain independent

That’s definitely one of the biggest problems with product management. The delusion that you can be an expert at generic “product”.

We used to have subject matter experts who worked with engineers. That made sense to me.


The proportion of "really good" PMs on product engineering teams has to be less than 0.1%.

The counter to that is "the proportion of really good engineers on product engineering teams has got to be in the single digits," and I would agree with that, as well.

The problem is what is incentivized to be built - most teams are working on "number go up" problems: revenue, or engagement as a proxy for revenue. Not "is this a good product that people actively enjoy using?" problems.

Just your typical late-stage capitalism shit.


> Hire people who have never used the product and don't think like or accurately represent our users

In most of my engineering jobs, the Product Managers were much closer to our users than the engineers.

Good product managers are very valuable. There are a lot of bad ones carrying the product manager title because it was viewed as the easy way to get a job in tech without having to know how to program, but smart companies are getting better at filtering them out.

> Find your most socially competent engineer, and have them talk to users a couple times a month

Every single time I've seen this tried, it turns into a situation where one or two highly vocal customers capture the engineering team's direction and steer the product toward their personal needs. It's the same thing that happens when the sales people start feeding requests from their customers into the roadmap.


I've worked with some really good product managers, as well as some total duds. They all talked to customers on a regular basis. There is so much more to the job than that. Every time I think I understand a product manager's job completely, I see one fail in a way I hadn't thought about.

For example, I had one product manager who made themselves irrelevant because they wouldn't work with sales. The company needed to sell the product to pay us, and sales talked with potential buyers about what might swing their purchase decision and what they would pay extra for. Since the PM only talked to users and ignored sales when doing product design and product roadmaps, the way sales input got integrated into product development is that we frequently got top-down directives from management to prioritize one-off requests from sales over the roadmap. Needless to say, this didn't lead to a cohesive and easy-to-understand product.

Before I saw that PM failing, I hadn't thought about the relationship between product and sales.


This sentiment goes exactly against the trend right now. AI coding is making technically minded product managers MORE powerful, not less. When/if coding just becomes your ability to accurately describe what you want to build, the people wielding this skill are the ones who understand customer requirements, not the opposite.

> Find your most socially competent engineer,

These usually get promoted to product management anyway, so this isn't a new thought.


> This sentiment is going exactly against the trend right now.

It's not.

Engineers are having more and more minutiae and busy work taken off their plates, now done by AI. That allows them to be heads-up more often; more of their cognitive capacity is directed towards strategy, design, quality.

Meanwhile, users are building more and more of their own tools in house. Why pay someone when you can vibe code a working solution in a few minutes?

So product managers are getting squeezed out by smarter people below them moving into their cognitive space and being better at solving the problems they were supposed to be solving. And users moving into their space by taking low hanging fruit away from them. No more month long discussions about where to put the chart and what color it should be. The user made their own dashboard and it calls into the API. What API? The one the PM doesn't understand and a single engineer maintains with the help of several LLMs.

If it's simple and easy: the user took it over, if it's complex: it's going to the smartest person in the room. That has never been the PM.


> if it's complex: it's going to the smartest person in the room. That has never been the PM.

Yet the PM always has the last say on what goes in the product, NOT the engineer. Funny how that works...

None of your conclusions are consistent with experience (I've interviewed 900+ SaaS management teams)


In my average experience - without having interviewed management teams - my observation is that the "smartest person in the room" is rarely the one deciding anything.

This also depends on your definition of "smartest".


> people who have never used the product and don't think like or accurately represent our users

I agree completely that these are the important qualifications to be setting direction for a product.

> Find your most socially competent engineer, and have them talk to users a couple times a month.

This doesn't necessarily follow from the above, but in Anthropic's case specifically, where the users are software engineers, it probably would have worked better than whatever they have going on now.

In general, it's probably better to have domain experts doing product management, as opposed to someone who is trained in product management.


> your most socially competent engineer

Unfortunately, he’s already doing the jobs of two of our SEs and the CTO, and we’re starting to run low on coders.

What are we going to do when we need a customer success manager or a profserv team?


Apple was built on a product manager shouting at engineers until they got it right.

Product managers are fooling themselves if they think they can "improve the user experience" for developers -- developers can't agree on the simplest things such as key bindings (vim, emacs) or indentation (tabs, spaces).

Make the application configurable. Developers like to tinker with their tools.


> under the guise of 'improving the user experience' or perhaps minimalism

I think we can be more charitable. Don't you see, even here on HN, people constantly asking for software that is less bloated, that does fewer things but does them better, arguing that code is a cost and every piece of complexity is something that needs to be maintained?

As features keep getting added, it is necessary to revisit where the UX is "too much" and so things need to be hidden, e.g. menu commands need to be grouped in a submenu, what was toolbar functionality now belongs in a dialog, reporting needs to be limited to a verbose mode, etc.

Obviously product teams get it wrong sometimes, users complain, and if enough users complain, then the feature is brought back, or a toggle is added to enable it.

There's nothing to be cynical about, and it's not something we "should be over by now." It's just humans doing their best to strike the balance between a UX that provides enough information to be useful and one with so much information that it overwhelms and distracts. Obviously any single instance isn't usually enough to overwhelm and distract, but in aggregate they do, so PMs and designers try to be vigilant and simplify wherever possible. But they're only human; sometimes they'll get it wrong (like maybe here), and then they fix it.


Every single website on the internet just says "whoopsie doodle, me made an oopsie" instead of just telling me what the problem is. This so-called mistake is so widespread that it has been the standard for at least a decade.

I agree it's a mistake, but I don't believe that it's viewed that way by anyone making the decision to do it.


You don't expose error details to the user for security reasons, even though it does indeed make the user experience worse.

I understand not exposing a full stack trace, but I don't see any excuse to not even expose a googleable error code. If me having an error code makes your product insecure, then you have a much bigger problem.
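
For what it's worth, the pattern being asked for is cheap to implement. A minimal sketch (hypothetical handler and logger, not any specific framework's API): log the full details server-side under a generated ID, and show the user only that opaque, googleable ID.

    import { randomUUID } from "node:crypto";

    // Hypothetical error handler: full details stay in server logs,
    // the user gets an ID they can report or search for.
    function handleError(err: Error, log: (msg: string) => void) {
      const errorId = randomUUID();
      log(`[${errorId}] ${err.stack ?? err.message}`); // server-side only
      return { status: 500, body: `Something went wrong. Error ID: ${errorId}` };
    }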

I show the stack trace on AGPL projects. Why hide what they can already see for themselves?

The reason I see is that it might expose the value of secret keys or other sensitive variables. But if you are certain it won't happen, then yes

This also shifts over time - new users, especially people sophisticated in the field your tool is addressing, need to be convinced the product is doing what they believe it should be doing, and want to see more output from it. They may become comfortable with the product over time and move further up the trust/abstraction ladder, but at the beginning, verbose output is a trust-building mechanism.

> Many people have been taught that streamlining and removal is positive.

Over the past ten years or so the increasing de-featuring of software under the guise of 'simplification' has become a critical issue for power users. For any GUI apps which have a mixed base of consumer and power users, I mostly don't update them anymore because they're as likely to get net worse vs better.

It's weird that companies like MSFT seem puzzled why so many users refuse to update Windows or Office to major new feature versions.


What in Office has been a degradation? Just curious. I mostly agree about Windows.

We are currently extremely blessed on the company's new product, because they have placed a curious and open-minded product manager and a curious and open-minded UX designer in charge of the administrative interface. Over half a year, those two have gained the trust of several admins within the company, all of them with more than 10 years of experience.

We have by now taught them about good information density.

Like, the permission pages, if you look at them just once, kinda look like bad 90s UIs. They throw a crapton of information at you.

But they contain a lot of smart things you only realize when actually using it from an admin perspective. Easy comparison of group permissions by keeping sorting orders and color coding stable, so you can toggle between groups and just visually spot what's different wherever the colors change. Highlights of edge cases here and there. SSO information around there as well. Loads of frontloaded necessary info, with optional information tucked behind various places.

You can move seriously fast in that interface once you understand it.

Parts of the company hate it for not being user friendly. I just got a mail that a customer admin was able to set up SSO in 15 minutes and debug 2 mapping issues in another 10, and now they are production ready.


I am so glad to hear there are working PMs who are aware of this (and if you’re hiring it makes me more interested in considering your employer).

Also product manager here.

Not at all cynically, this is classic product management - simplify by removing information that is useful to some users but not others.

We shouldn't be over it by now. It's good to think carefully about how you're using space in your UI and what you're presenting to the user.

You're saying it's bad because they removed useful information, but then why isn't Anthropic's suggestion of using verbose mode a good solution? Presumably the answer is because in addition to containing useful information, it also clutters the UI with a bunch of information the user doesn't want.

Same thing's true here - there are people who want to see the level of detail that the author wants and others for whom it's not useful and just takes up space.

> It requires deep understanding of customer usage in order not to make this mistake.

It requires deep understanding of customer usage to know whether it's a mistake at all, though. Anthropic has a lot deeper understanding of the usage of Claude Code than you or I or the author. I can't say for sure that they're using that information well, but since you're a PM I have to imagine that there's been some time when you made a decision that some subset of users didn't like but was right for the product, because you had a better understanding of the full scope of usage by your entire userbase than they did. Why not at least entertain the idea that the same thing is true here?


Simplification can be good---but they've removed the wrong half here!

The notifications act as an overall progress bar and give you a general sense of what Claude Code is doing: is it looking in the relevant part of your codebase, or has it gotten distracted by some unused, vendored-in code?

"Read 2 files" is fine as a progress indicator but is too vague for anything else. "Read foo.cpp and bar.h" takes almost the same amount of visual space, but fulfills both purposes. You might want to fold long lists of files (5? 15?) but that seems like the perfect place for a user-settable option.


> "Read 2 files" is fine as a progress indicator but is too vague for anything else. "Read foo.cpp and bar.h" takes almost the same amount of visual space, but fulfills both purposes.

Now this is a good, thoughtful response! Totally agree that if you can convey more information using basically the same amount of space, that's likely a better solution regardless of who's using the product.


> It requires deep understanding of customer usage to know whether it's a mistake at all

Software developers like customizable tools.

That's why IDEs still have "vim keybindings" and many other options.

Your user is highly skilled - let him decide what he wants to see.


There are a lot of Claude Code users who aren't software developers. Maybe they've decided that group is the one they want to cater to? I recognize that won't be a popular decision with the HN crowd, but that doesn't mean it's the wrong one.

I fully agree with you on almost everything you wrote in this thread, but I’m not sure this is the right answer. I myself currently spend a lot of time with CC and belong to that group of developers who don’t care about this problem. It’s likely that I’m not alone. So it doesn’t have to be the least professional audience they serve with this update. It’s possible that Anthropic knows what they are doing (e.g. reducing the level of detail to make it easier to find something more important in the output), and it’s also possible that they are simply making stupid product decisions because they have a cowboy PM charging at some OKR screaming yahoo. We don’t know. In the end, having multiple verbosity levels, configured with granularity similar to Java loggers, would be nice.

Oh totally - I'm definitely not saying that they made the decision to cater to non-dev users, just that it's a possibility. Totally agree with you that at the end of the day, we haven't the foggiest idea.

Yeah, I made a similar point about the tone of ChatGPT responses; I can't imagine why someone would want less information when working with and tuning an AI model. However, something tells me they actually have hard evidence that users respond better to less information, regardless of what the loud minority says online, and are following that.

100%. Metrics don't lie. I've A/B tested this a lot. Attention is a rare commodity and users will zone out and leave your product. I really dislike this fact

Then why is the suggestion to use verbose mode treated as another mistake?

The user is highly skilled; let them filter out what is important

This should be better than adding an indeterminate number of toggles and settings, no?


Does Claude Code let me control what's output, and when?

Verbose mode, I think, puts it on the TUI, and I can't exactly grep or sed on the TUI.
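
The non-interactive print mode is pipeable, though: `claude -p` can emit JSON line events you can filter yourself. A sketch of pulling out file-touching tool calls (the stream-json line shape and the Read/Edit/Write tool names are assumptions worth checking against your version's docs):

    import { spawn } from "node:child_process";
    import { createInterface } from "node:readline";

    const proc = spawn("claude",
      ["-p", "fix the failing test", "--output-format", "stream-json", "--verbose"]);

    createInterface({ input: proc.stdout! }).on("line", (line) => {
      let event: any;
      try { event = JSON.parse(line); } catch { return; } // skip non-JSON lines
      // Assumption: assistant events carry Anthropic-style content blocks.
      for (const block of event.message?.content ?? []) {
        if (block.type === "tool_use" && ["Read", "Edit", "Write"].includes(block.name)) {
          console.log(`${block.name}: ${block.input?.file_path ?? "?"}`);
        }
      }
    });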


Developer> This is important information and most developers want to see it.

PM1> Looks like a PM who is out of touch with what the developers want. Easy mistake to make.

PM2> Anthropic knows better than this developer. The developer is probably wrong.

I don't know for sure what the best decision is here, I've barely used CC. Neither does PM1 nor PM2, but PM2 is being awfully dismissive of the opinion of a user in the target audience. PM1 is probably putting a bit too much weight on Developer's opinion, but I fully agree with "All of us... have seen UIs where this has occurred." Yes, we have. I personally greatly appreciate a PM who listens and responds quickly to negative feedback on changes like this, especially "streamlining" and "reducing clutter" type changes since they're so easy to get wrong (as PM1 says).

> It's good to think carefully about how you're using space in your UI and what you're presenting to the user.

I agree. It's also good to have the humility to know that your subjective opinion as someone not in the target audience, even if you're designing the product, is less informed in many ways than that of your users.

----

Personally, I get creeped out by how many things CC is doing and tokens it's burning in the background. It has a strong "trust me bro" vibe that I dislike. That's probably common to all agent systems; I haven't used enough to know.


> PM2> Anthropic knows better than this developer. The developer is probably wrong.

Nope! Not what I said. I specifically said that I don't know if Anthropic is using the information they have well. Please at least have the courtesy not to misrepresent what I'm saying. There's plenty of room to criticize without doing that.

> It's also good to have the humility to know that your subjective opinion as someone not in the target audience even if you're designing the product is less informed in many ways than that of your users.

Ah, but you don't know I'm not the target audience. Claude Code is increasingly seeing non-developer users, and perhaps Anthropic has made a strategic decision to make the product friendlier to them, because they see that as a larger userbase to target?

I agree that it's important to have humility. Here's mine: I don't know why Anthropic made this decision. I know they have much more information than me about the product usage, its roadmap and their overall business strategy.

I understand that you may not like what they're doing here and that the lack of information creeps you out. That's totally valid. My point isn't that you're wrong to have that opinion, it's that folks here are wrong to assume that Anthropic made this decision because they don't understand what they're doing.


> Personally, I get creeped out by how many things CC is doing and tokens it's burning in the background. It has a strong "trust me bro" vibe that I dislike.

100% this.

It might be convenient to hide information from non-technical users; but software engineers need to know what is happening. If it is not visible by default, it should be configurable via dotfiles.
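
Agreed, and the knob could be tiny. A hypothetical shape for such a setting (made-up names, not Claude Code's actual schema):

    // Hypothetical dotfile-level settings, e.g. in ~/.claude/settings.json:
    interface OutputSettings {
      verbosity: "summary" | "detailed" | "raw"; // default transcript density
      showFilePaths: boolean;                    // "Read foo.cpp" vs "Read 2 files"
      foldFileListsAfter: number;                // collapse file lists past this count
    }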


They know what people type into their tools, but they don't know what in the output users read and focus on unless they're convening a user study or focus group.

I personally love that the model tells me what file it has read because I know whether or not it's headed in the generally right direction that I intended. Anthropic has no way of knowing I feel this way.


But you have no idea if they've convened user study or focus groups, right?

I'll just reiterate my initial point that the author of the post and the people commenting here have no idea what information Anthropic is working with. I'm not saying they've made the right decision, but I am saying that people ought to give them the slightest bit of credit here instead of treating them like idiots.


> You're saying it's bad because they removed useful information, but then why isn't Anthropic's suggestion of using verbose mode a good solution?

Because reading through hundreds of lines verbose output is not a solution to the problem of "I used to be able to see _at a glance_ what files were being touched and what search patterns were being used but now I can't".


Right, I understand why people prefer this. The point was that the post I was responding to was making pretty broad claims about how removing information is bad but then ignoring the fact that they in fact prefer a solution that removes a lot of information.

I'm sure the goal is that reading files is something you debug, not monitor, like individual network requests in a browser.

Well, some who start as developers don't truly see users as stakeholders, sometimes not even remotely, and they often get no help changing that view. While it feels astonishing to encounter directly, on the sliding scale of "do you see other people as stakeholders in general", many developers sit close to the "no" end. So it's not necessarily an institutional view.

This was a very bitter pill to swallow! It took me more than one mistake to learn this - "you are not the user".

I think it might also come down to UI churn. Sprint over? What to do next? Everything is always moving because people have nothing meaningful to do.

> Cynically, this is classic product management: simplify and remove useful information under the guise of 'improving the user experience' or perhaps minimalism if you're more overt about your influences.

Cynically, it's a vibe coded mess and the "programmers" at Anthropic can't figure out how to put it back.

More cynically, Anthropic management is trying to hide anything that people could map to token count (aka money) so that they can start jiggling the usage numbers to extract more money from us.


Fairly cynical indeed. Though I must admit that Anthropic's software - not the models, the software they build - seems to be generally plagued by quality issues. Even the dashboard is _somehow_ broken most of the time, at least whenever I try to do something.

I'm not usually very cynical, but either of those seems equally likely as what's going on here.

Or is this PM and executive management aiming for the no-code and low-code users? That would fit the zeitgeist, especially in the tech C-level and their sales pitch to non-tech C-levels.

Product management --and managers-- can be, shall we say, interesting.

I was recently involved with a company that wanted us to develop a product that would be disruptive enough to enter an established market, make waves and shock it.

We did just that. We ran a deep survey of all competing products, bought a bunch of them, studied absolutely everything about them, how they were used and their users. Armed with that information, we produced a set of specifications and user experience requirements that far exceeded anything in the market.

We got green-lit to deliver a set of prototypes to present at a trade show. We did that.

The prototypes were presented and they truly blew everyone away. Blogs, vlogs, users, everyone absolutely loved what we created and the sense was that this was a winning product.

And then came reality. Neither the product manager nor the CTO (and we could add the CEO and CFO to the list) had enough understanding and experience in the domain to take the prototypes to market. It would easily have required a year or two of learning before they could function in that domain.

What did they do? They dumbed down the product specification to force it into what they understood and what engineering building blocks they already had. Square peg solidly and violently pounded into a round hole.

The outcome? Oh, they built a product alright. They sure did. And it flopped, horribly flopped, as soon as it was introduced and made available. Nobody wanted it. It was not competitive. It offered nothing disruptive. It was a bad clone of everything already occupying space in that ecosystem. Game over.

The point is: Technology companies are not immune to human failings, ego, protectionism/turf guarding, bad decisions, bad management, etc.

When someone says something like "I am not sure that's a good idea for a startup. There's competition." my first thought is: Never assume that competitors know what they are doing, are capable, and always make the right decisions without making mistakes. You don't always need a better product; you need better execution.


Replace the C-levels with AI. The C-suite is an impediment to innovation and progress. They are the office politics mentioned in this entire thread. The person with the vision and the strategy is a random person out there who doesn't even work for your company. Hell, you could have done it.

> The point is: Technology companies are not immune to human failings, ego, protectionism/turf guarding, bad decisions, bad management, etc.

They only accidentally succeed in spite of those things. They have those things more than existing businesses precisely because having too much money masks the pressures that would force solid execution and results. When you have 80% profit margins, you can show up drunk.


Paranoid schizo here: This actually seems to be a way to sort users into whales and fishes, and to treat the whales as marks whose LLM needs can be dumbed down.

People who toggle debug will get "full" access, and those who don't care probably won't notice if their LLM use is degraded.

It seems like pure market segmentation ahead of a "shrinkflation" approach to cost management.


Product managers aren’t needed anymore.

First they came for the product managers, and I said nothing, because I was a coder, and we're invincible and can do everything and also deliver value unlike all those other slackers, so they'd never come for us.

lol product managers have for decades been a redundant role. Customer service serves a better purpose and need than a product manager. But sure, blame AI.

https://github.com/anthropics/claude-code/issues/8477

https://github.com/anthropics/claude-code/issues/15263

https://github.com/anthropics/claude-code/issues/9099

https://github.com/anthropics/claude-code/issues/8371

It's very clear that Anthropic doesn't really want to expose the secret sauce to end users. I have to patch Claude every release to bring this functionality back.


I just assume that they realized that they can split the offering, and charge more for the top tier. (Yes, even more.)

If Claude Code can replace an engineer, it should cost just a bit less than an engineer, not half as much.


But then you pay the API rates, which are far less outrageously subsidized than the incredibly generous subscription pricing.

It's not subsidized; in fact, they probably have very healthy margins on Claude Code.

Yeah. If you ignore the negligible fact that some investor may want a return on all that money going into capex, I am pretty sure you can, Enron style, reach the conclusion that any of those companies have "healthy" margins.

Why do you think that?

DeepSeek had a theoretical profit margin of 545% [1] with much inferior GPUs, at 1/60th the API price.

Anthropic's Opus 4.6 is a bit bigger, but they'd have to be insanely incompetent to not make a profit on inference.

[1] https://github.com/deepseek-ai/open-infra-index/blob/main/20...
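
Taking the 545% figure at face value (it is DeepSeek's own theoretical number for cost-profit margin), the arithmetic is just:

    \text{margin} = \frac{R - C}{C} = 5.45 \;\Rightarrow\; R = 6.45\,C

i.e. serving cost would be roughly 15.5% of revenue on inference alone, before training and everything else.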


Deepseek lies about costs systematically. This is just another fantasy.


American labs trained in a different way than the Chinese labs. They might be making a profit on inference, but they are burning money otherwise.

> they'd have to be insanely incompetent to not make a profit on inference.

Are you aware of how many years Amazon didn’t turn a profit?

Not agreeing with the tactic - just…are you aware of it?


Amazon was founded in 1994, went public in 1997 and became profitable in 2001. So Anthropic is two years behind with the IPO but who knows, maybe they'll be profitable by 2028? OpenAI is even more behind schedule.

How much loss did they accumulate until 2001? Pretty sure it wasn't the 44 billion OpenAI has. And Amazon didn't have many direct competitors offering the same services.

Because if you don't, then current valuations are a bubble, inflated by burning a mountain of cash.

That's not how valuations work. A company's valuation is typically based on an NPV (net present value) calculation, which is a sum of its time-discounted future cash flows. Depending on the company's strategy, it's often rational for it to not be profitable for quite a long while, as long as it can give investors the expectation of significant profitability down the line.
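
For reference, the standard formula (CF_t is the cash flow in year t, r the discount rate):

    NPV = \sum_{t=0}^{T} \frac{CF_t}{(1+r)^t}

Early-year CF_t can be deeply negative and the NPV still positive, as long as later terms are large enough; that's the whole argument for tolerating losses.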

Having said that, I do think that there is an investment bubble in AI, but am just arguing that you're not looking at the right signal.


And that's OpenAI's biz model? :)

Remember there are no moats in this industry - if anything one company might have a 2 month lead, sometimes. We've also noticed that companies paying OpenAI may swiftly shift to paying Google or Anthropic in a heartbeat.

That means the pricing is going to be competitive. You may still get your wish though, but instead of the price of an engineer remaining the same, it will cut itself down by 95%.


I don't know about you, but I benefit so much from using Claude at work that I would gladly pay $80,000-$120,000 per year to keep using it.

Why would you gladly pay more than what it's worth? It's not an engineer you are hiring, it's AI. The whole point of it was to make intelligent workflows cheaper. If it's going to cost as much as an engineer, hire the engineer, at least you'd have an escape goat when things invariably go wrong.

> an escape goat

Autocorrect hall of famer, there.


Scapegoat, got it. Can't blame the autocorrect though... I honestly thought it was spelled like that, which is a shame since I've been studying English my entire life as a second language.

Etymologically speaking, "scapegoat" does mean "escaped goat" so it's not a crazy mistake to make.


At least that misunderstanding didn’t cause a nuclear accident: https://practical.engineering/blog/2025/4/15/when-kitty-litt...

Luckily these strayed goats weren't irradiated

There's a name for this sort of phenomenon...

https://eggcorns.lascribe.net/english/242/escape-goat/


I agree with you, I was just joking.

Oh now I see... Joke's on me then I guess :D

It wasn't clear to me that this was a joke either. I assume the same for others since the post is grayed out.

Oh come on. That pays for more than 10 FTEs in some countries

I made this joke with "$1,500-$2000 per month" last night and everyone thought I was serious

I know people who burned several hundred dollars a day and still found it worth it.

Were they actually making money though? A lot of the people on the forefront of this AI stuff seem like cult leaders and crackheads to me.

I'd pay up to $1000 pretty easily just based off the time it saves me personally from a lot of grindy type work which frees me up for more high value stuff.

It's not 10x by any means, but at most dev salaries it doesn't need to be to pay for itself. A 1.5x improvement alone is probably enough, for most above-junior developers, for a company to justify $1000/month.

I suppose if your area of responsibility wasn't very broad the value would decrease pretty quickly so maybe less value for people at very large companies?


I can see $200 but $1,000 per month seems crazy to me.

Using Claude Code for one year is worth the same as a used sedan (i.e., ~$12,000) to you?

You could be investing that money!


Yes, easily. Paying for Claude would be investing that money. Assuming a 10% return, which would be great, I'd make an extra $1,200 a year investing it. I'm pretty sure that over the course of a year of not having to spend time on low-value or repetitive work, I can increase productivity enough to more than cover the ~$13k difference. Developer work scales really well, so removing a bunch of the low end and freeing up time for the more difficult problems is going to return a lot of value.

I would probably pay $2000 a month if I had to - it's a small fraction of my salary, and the productivity boost is worth it.

It's *worth it* when you're salaried? Compared to investing the money? Do you plan to land a very-high-paying executive role years down the line? Are you already extremely highly paid? Did Claude legitimately 10x your productivity?

edit: Fuck I'm getting trolled


I'm serious - the productivity boost I'm getting from using AI models is so significant, that it's absolutely worth paying even 2k/month. It saves me a lot of time, and enables me to deliver new features much faster (making me look better for my employer) - both of which would justify spending a small fraction of my own money. I don't have to, because my employer pays for it, but as I said, if I had to, I would pay.

I am not paying this myself, but the place I work at is definitely paying around 2k a month for my Claude Code usage. I pay 2 x 200, for my personal projects.

I think personal subs are subsidized while corporate ones definitely are not. I have CC for my personal projects running 16h a day with multiple instances, but work CC still racks up way higher bills with less usage. If I had to guess, my work CC uses a quarter as much for 5x the cost, so at least a 20x difference.

I am not going to say it has 10xed or whatever with my productivity, but I would have never ever in that timeframe built all those things that I have now.


I don't know why you keep insisting that no one is making any money off of this. Claude Code has made me outrageously more productive. Time = Money right?

What do you use it for? Do you have an example? For you to be OK with paying 80k to 120k, I'm guessing it's making you 375-450k a year?

I'm joking, my point is that it's already quite expensive and I don't think it's making anyone money.

that means customers will pay minimum 2x that much I think

STFU right now because the more you bring this up the more likely it'll happen.

Similarly, STFU about the stuff that can give LLMs ideas for how to harm us (you know what I'm talking about, it's reptilian based)

The whole comment thread is likely to have been read by some folks at Anthropic. Already a disaster. Just keep on with the "we hate it unless it gets cheaper" discourse please!!!


Patching's not long for this world; Claude Code has moved to binary releases. Soon, the NPM release will just be a thin wrapper around the binary.

The binary is just Bun. I wrote this to inspect CC:

https://github.com/shepherdjerred/monorepo/tree/main/package...


Where there's a will, there's a way

> It's very clear that Anthropic doesn't really want to expose the secret sauce to end users

Meanwhile, I am observing precisely how VS+Copilot works in my OAI logs with zero friction. Plug in your own API key and you can MITM everything via the provider's logging features.


> Plug in your own API key

I checked with ccusage (a cli tool that checks how much your Claude subscription tokens would have cost via the API).

My $200 a month subscription would have cost me more than $3000. The highest single day would have cost more than $300.

Gemini is cheaper, but not by much.


> to end users

To other actors who want to train a distilled version of Claude, more likely.


Honestly, just use OpenCode. It works with Claude Code Max, and the TUI is 100x better. The only thing that sucks is Compaction.

How much longer is Anthropic going to allow OpenCode to use Pro/Max subscriptions? Yes, it's technically possible, but it's against Anthropic's ToS. [1]

1: https://blog.devgenius.io/you-might-be-breaking-claudes-tos-...


Consider switching to an OpenAI subscription, which allows OpenCode use.

Yeah. OpenAI allows any client, and only one single fixed system prompt. All their control is on the backend, which is worse than Claude.

Doesn’t Claude Code have an Agent SDK that officially allows you to use the good parts?

Yes but you can't use a subscription with that

There are also Azure versions of Opus

I have been unable to use OpenCode with my Claude Max subscription. It worked for a while, but then it seems like Anthropic started blocking it.

What’s 100x better about the TUI?

Nope, OpenCode is nowhere near Claude Code.

It's amazing how much other agentic tools suck in comparison to Claude Code. I'd love to have a proper alternative. But they all suck. I keep trying them every few months and keep running back to Claude Code.

Just yesterday I installed Cursor and Codex, and removed both after a few hours.

Cursor disrespected my setting to ask before editing files. Codex renamed my tabs after I had named them. It also went ahead and edited a bunch of my files after a fresh install without asking me. The heck, the default behavior should have been to seek permission at least the first time.

OpenCode does not allow me to scroll back and edit a prior prompt for reuse. It also keeps throwing up all kinds of weird errors, especially when I'm trying to use free or lower-cost models.

Gemini CLI reads strange Python files when I'm working on a Node.js project, what the heck. It also never fixed the diff display issues in the terminal; it's always so difficult for me to see what edits it's actually trying to make before it makes them. It also frequently throws random internal errors.

At this point, I'm not sure we'll be seeing a proper competitor to Claude Code anytime soon.


Hmmm, I used OpenCode for a while and didn't have this experience. I felt like OpenCode was the better experience.

Same, I still use CC mainly due to it being so wildly better at compaction. The overall experience of using OpenCode was far superior - especially with the LSP configured.

I use Opencode as my main driver, and I don’t experience what you have experienced.

For instance, OpenCode has an /undo command which allows you to scroll back and edit a prior prompt. It also supports forking conversations based on any prior message.

I think it depends on the set up. I overwrote the default planning agent prompt of opencode to fit my own use cases and my own mcp servers. I’ve been using OpenAI’s gpt codex models and they have been performing very well and I am able to make it do exactly what I ask it to do.

Claude Code may do stuff fast, but in terms of quality and the ability to edit only what I want it to, I don't think it's the best. Claude Code often takes shortcuts or does extra stuff that I didn't ask for.


Codex 5.3 on Cursor is better than Claude Code

Not in my (limited) experience. I gave CC and codex detailed instructions for reworking a UI, and codex did a much worse job and took 5x as long to finish.

If they cared about that, they wouldn't expose the thinking blocks to the end-user client in the first place; they'd have the user-side context store hashes to the blocks (stored server-side) instead.
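
The shape of that idea, sketched (this is not Anthropic's actual API; the Map stands in for whatever durable server-side store they'd use):

    import { createHash } from "node:crypto";

    const blockStore = new Map<string, string>(); // stand-in for a server-side KV

    // Server seals the thinking block and returns only a digest.
    function sealBlock(raw: string): string {
      const id = createHash("sha256").update(raw).digest("hex");
      blockStore.set(id, raw);
      return id; // the client's context would carry only this hash
    }

    // On the next turn, the server re-expands the hash it issued.
    function expandBlock(id: string): string | undefined {
      return blockStore.get(id);
    }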

I don't suppose you could share a little on that patching process?

More likely 99.9% of users never press ctrl+o to see the thinking, so they don't consider it important enough to make a setting out of.

To be fair they have like 10,000 open issues / spam issues, it's probably insane out there for them to filter all of it haha

GitHub Issues as a customer support funnel is horrible. It's easy for them, but it hides all the important bugs and only surfaces "wanted features" that are thumbs-up'd a lot. So you see "Highlight text X" as the top requested feature; meanwhile, 10% of users experience a critical bug, but they don't all find "the github issue" one user poorly wrote about it, so it has like 7 upvotes.

GitHub Codespaces has a critical bug that makes the copilot terminal integration unusable after 1 prompt, but the company has no idea, because there is no clear way to report it from the product, no customer support funnel, etc. There's 10 upvotes on a poorly-written sorta-related GH issue and no company response. People are paying for this feature and it's just broken.


Humans don't look at these anymore, Claude itself does. They've even said so.

Maybe they can use AI to figure out which ones are actually useful and which ones are not.

I thought the source code for the actual CLI was closed source. How are you patching it?

Claude Code can reverse engineer it to a degree. Doing it for more than a single version is a PITA though. Easier to build your own client over their SDK.

I think it's more classic enshittification. Currently, as a percentage, still not many devs use it. In a few months or 1-2 years all these products will start to cater to the median developer and start to get dumbed down.

I’m a heavy Claude code user and it’s pretty clear they’re starting to bend under their vibe coding. Each Claude code update breaks a ton of stuff, has perf issues, etc.

And then this. They want to own your dev workflow and for some reason believe Claude Code is special enough to be closed source. The React TUI is kind of a nightmare to deal with, I bet.

I will say, very happy with the improvements made to Codex 5.3. I’ve been spending A LOT more time with codex and the entire agent toolchain is OSS.

Not sure what anthropic’s plan is, but I haven’t been a fan of their moves in the past month and a half.


Same, codex 5.3 was able to solve a problem that I personally was stuck on even with help from Claude for the last 2 weeks.

Yeah, I can feel it too, it _mostly_ works but.. feels like it needs a rewrite.

For example, Amp "feels" much better. I also like how in Amp I can just send the message whenever, and it doesn't get queued

* I know, lots of "feels" in there..


I switched to Codex 5.3 too; it's also cheaper anyway, and as dumb as it sounds, Scam Altman is actually the less annoying CEO compared to Amodei, which is kind of an achievement. Amodei really looks more and more like some huckster giving these idiotic predictions to the press.


So he's been accused of various crimes and has not been found guilty?

Not like Epstein at all then.


OpenAI’s president is a Trump mega-donor

https://news.ycombinator.com/item?id=46771231


>Of all tyrannies, a tyranny sincerely exercised for the good of its victims may be the most oppressive. It would be better to live under robber barons than under omnipotent moral busybodies. The robber baron's cruelty may sometimes sleep, his cupidity may at some point be satiated; but those who torment us for our own good will torment us without end for they do so with the approval of their own conscience. They may be more likely to go to Heaven yet at the same time likelier to make a Hell of earth. This very kindness stings with intolerable insult. To be "cured" against one's will and cured of states which we may not regard as disease is to be put on a level of those who have not yet reached the age of reason or those who never will; to be classed with infants, imbeciles, and domestic animals.

Sam wants money. Dario wants to be your dad.

I'm going with Sam.


The article is about Greg Brockman, president of OpenAI.

Codex has been useless for me on standard Plus plan unfortunately. Actually thoroughly disappointed. And VS code integration is totally broken.

I'm not sure why I'm getting downvoted, but the VS Code integration really does stink. Oftentimes it will simply not send the API request and just say "reconnecting", and I've had the VS Code OpenAI Codex plugin freeze while all the other plugins, like Cline or Roo, were working perfectly fine. So the VS Code integration is almost unusable in my experience.

Claude's brand is sliding dangerously close to "the Microsoft of AI."

DEVELOPERS, DEVELOPERS, DEVELOPERS, DEVELOPERS

I write mainly out of the hope that some Anthropic employees read this: you need an internal crusade to fight these impulses. Take the high road in the short-term and you may avoid being disrupted in the long-term. It's a culture issue.

Probably your strongest tool is specifically educating people about the history. Microsoft in the late 90s and early 00s was completely dominant, but from today's perspective it's very clear: they made some fundamental choices that didn't age well. As a result, DX on Windows is still not great, even if Visual Studio has the best features, and people with taste by and large prefer Linux.

Apple made an extremely strategic choice: rebuild the OS around BSD, which set them up to align with Linux (the language of servers). The question is: why? Go find out.

The difference is a matter of sensibility, and a matter of allowing that sensibility to exist and flourish in the business.


> you need an internal crusade to fight these impulses. Take the high road in the short-term...

Anthropic is the market leader for advanced AI coding with no serious competitor currently very close and they are preparing to IPO this year. This year is a transition year. The period where every decision would default toward delighting users and increasing perceived value is ending. By next year they'll be fully on the quarterly Wall Street grind of min/maxing every decision to extract the highest possible profit from customers at the lowest possible cost.

This path is inevitable and unavoidable, even with the most well-intentioned management and employees.


I'm specifically trying to counter this kind of defeatism. Individual employees can and do make a difference.

The thing that annoys me most of all is they block me from using OpenCode with my Claude Max plan. I find the OpenCode UI to be meaningfully better than Claude Code's, so this is really annoying.

Some workarounds are here https://github.com/anomalyco/opencode/issues/7410 but I agree with you, this should be a native feature.

If you are an expert developer smarter than everyone at Anthropic, like everyone else commenting on this post, you'll know that it's not difficult to use the Claude Agent SDK behind an API to achieve almost exactly the same thing

Huh? Why wouldn’t developers (who probably have stock options in Claude) try to prevent becoming 'the Microsoft of AI'? That's probably what they are actively trying to do.

This take is overly cynical. Every major corporation has people with influence who care and fight for good outcomes. They win some fights, they lose others. The only evidence you need is to notice the needlessly good decisions that were made in the past.

Some greatest hits:

- CoreAudio, Mac OS memory management, kernel in general, and many other decisions

- Google's internal dev tooling, Go, and Chrome (at least, in its day)

- C#, .NET, and Typescript (even Microsoft does good work)

One of the hallmarks of heroic engineering work is that everyone takes it for granted afterward. Open source browsers that work, audio that just works, successors to C/C++ with actual support and adoption, operating systems that respond gracefully under load, etc. ... none of these things were guaranteed, or directly aligned with short-term financial incentives. Now, we just assume they're a requirement.

Part of the "sensibility" I'm talking about is seeking to build things that are so boring and reliable that nobody notices them anymore.


Your incentive is to stay in the job so you can vest. Fighting the slide may just make enemies

I'm old, so I remember when Skyrim came out. At the time, people were howling about how "dumbed down" the RPG had become compared to previous versions. They had simplified so many systems. Seemed to work out for them overall.

I understand the article writer's frustration. He liked a thing about a product he uses and they changed the product. He is feeling angry, he is expressing that anger, and others are sharing in it.

And I'm part of another group of people. I would notice the files being searched without too much interest. Since I pay a monthly rate, I don't care about optimizing tokens. I only care about the quality of the final output.

I think the larger issue is that programmers are feeling like we are losing control. At first we're like, I'll let it auto-complete but no more. Then it was, I'll let it scaffold a project but not more. Each step we are ceding ground. It is strange to watch someone finally break on "They removed the names of the files the agent was operating on". Of all of the lost points of control this one seems so trivial. But every camel's back has a breaking point, and we can't judge the straw that does it.


If you're paying a monthly rate you still have to optimize for tokens, otherwise you'll be rate limited.

And not just by the day! The weekly limits are the biggest mistake imaginable for maintaining user engagement on a project.

That is a very insightful point. It highlights the irony of complaining about "loss of control" immediately after voluntarily inviting an autonomous agent into the codebase.

Those specific logs are essentially a prop anyways. Removing them makes it harder to LARP as an active participant; it forces the realization that "we" are now just passive observers.


They have a dedicated product called Co-work for non-technical people. Claude Code is a *coding* tool (it's in the name) and anthropic has made decisions to thoroughly annoy a lot of the users.

> Seemed to work out for them overall.

I'm guessing you're not aware of how their newest game, Starfield, was received. In the long term, that direction did not work out for them at all.


Skyrim is one of the most over-rated games of all time. Dark Messiah of Might and Magic did everything except music and exploration/scale better, and I mean a LOT better. It's from 2006.

https://www.youtube.com/watch?v=-p3zj0YKKYE

https://www.youtube.com/watch?v=yeRUHzYJwNE


> Skyrim is one of the most over-rated games of all time.

Those are fightin’ words as someone who has dumped more hours than I can count into Skyrim but…

I had never heard of this game, but it has a lot going for it (Source engine), and I watched a little of the gameplay you linked and I’m intrigued. I’m probably gonna pick this up for the Steam Deck.

A friend recommended the Might and Magic games to me a long time ago and I bought them off GoG, but wasn’t a fan of the gameplay and just couldn’t get hooked. This looks very different from what I remember (probably because this is a very different game from the earlier ones).

Thank you for mentioning this game!


There are a lot of non-developer Claude Code users these days. The hype about vibe coding makes everyone think they can now be an engineer. The problem is, if Anthropic caters to that crowd, the devs that are using it for somewhat serious engineering tasks and don't believe in the "run an army of parallel agents and pray" methodology get alienated.

Maybe Claude Code web or desktop could be targeted to these new vibe coders instead? These folks often don't know how simple bash commands work so the terminal is the wrong UX anyway. Bash as a tool is just very powerful for any agentic experience.


It’s funny because on one end of the spectrum you have non dev vibe coders for whom every log is noise

On the other end are the hardcore user orchestrating a bunch of agents, not sitting there watching one run, so they don’t care about these logs at all

In the middle are the engineers sitting there watching the agent go


Logs (and in this case, Verbose Mode) aren't for knowing what a thing is currently doing as it's doing it; they're for finding out what happened when the thing didn't do what you expected or wanted.

The non-dev vibe coders are probably a bigger group of users, and therefore mean more money. Change justified...

The others are also paying. Make it configurable...

If 80% of their paying customers are vibe coders, then it makes sense to make the IDE “easy” for them. “Hey, Claude, make a website. Don’t make mistakes.”

Or, it could serve as a textbook example of how to make your real future long-term customers (=fluent coders) angry… what a strategy :)


Microsoft fell into this trap in the 90s -- they believed that they could hide the DOS prompt, and make everything "easier" with wizards where you just go through a series of screens clicking "next", "next", "finish".

Yes, it was easier. But it dumbed down a generation of developers.

It took them two decades to come up with PowerShell, but by then it was too late.


Exactly how I feel. I'm happy that more people are using these tools and learning (hopefully) about engineering but it shouldn't degrade the core experience for let's say "more advanced" users who don't see themselves as Vibe coders and want precise control over what's happening.

> learning (hopefully) about engineering

Not a chance.

If anything, the reverse, in that it devalues engineering. For most, LLMs are a path to an end-product without the bother or effort of understanding. No different than paid engineers were, but even better because you don't have to talk to engineers or pay them.

The sparks of genuine curiosity here are a rounding error.


If I give pupils the solution book will they learn or just copy the answers?

There is a reason why nowadays games start to help massively if the player gets stuck.


"There is a reason why nowadays games start to help massively if the player gets stuck"

You mean those "free" games that are hard and grindy by design, where the offered help comes in the shape of paid perks to solve the challenges?


No, those paid games where NPCs start to point to clues if the player takes too long to solve a riddle, or where you can skip the hard parts if you fail too often.

> The hype about vibe coding lets everyone think they can now be an engineer.

Programmers are just jealous that they are no longer the only ones that get to play pretend.

I don't know anything about you personally, but most "software engineers" are anything but.


I am curious to know why you believe that?

I've worked as a software engineer with different types of engineers (electrical, mechanical and automation).

Their testing is often more strict but that is a natural consequence of their products being significantly harder to fix in the field than a software product is.

Other than that, my experience is that our way of working on projects across disciplines is very similar.


Running an army of parallel agents is orders of magnitude more profit per human, so they will tend to steer you towards that.

I think Dario & crew are getting high on their own supply and really believe the "software developers out of work by end of 2026" pronouncements.

Meanwhile all evidence is that the true value of these tools is in their ability to augment & super-charge competent software engineers, not replace them.

Meanwhile the quality of Claude Code the tool itself is a bit of a damning indictment of their philosophy.

Give me a team of experienced sharp diligent engineers with these coding tools and we can make absolutely amazing things. But newbie product manager with no software engineering fundamentals issuing prompts will make a mess.

I can see it even in my own work -- when I venture into doing frontend eng using these tools the results look good but often have reliability issues. Because my background/specialization is in systems, embedded & backend work -- I'm not good at reviewing the React etc code it makes.


Amodei has to be the most insufferable of all the AI hucksters, nowadays even Altman looks tame compared to him.

The whole company also has this meme about AI safety and some sort of fear-mongering about the models every few months. It's basically a smokescreen for normies and other midwits to make it look more mysterious and advanced than it really is. OOOOH IT'S GOING TO BREAK OUT! IT KNOWS IT'S BEING EVALUATED!

I bet there are some true believers in Anthropic too, people who think themselves too smart to believe in God so they replaced it with AI instead but all the same hopes are there, eg. Amodei preaching about AI doubling the human lifespan. In religion we usually talk about heaven.


Just 1 more data center build, man! A few more megawatts and double the context window and it's AGI!

I just want useful tools.


[flagged]


I've seen real gains in productivity using it. Nowhere near the 10x some people are promising, though, let alone replacing me.

don’t worry bro in 6 months it will replace all devs

just 6 months more and like $200B in capex and we’ll be there, trust the process


Anecdotally, all the non-technical people I know are adapting fine to the console. You don’t need to know how bash commands work to use it as you are just approving commands, not writing them.

Approving commands you don't understand doesn't seem ideal

People are handing over their entire system to openclaw, so that's about where we are.

Because we haven't heard about the disaster stories yet, give it some time and see how people will talk about it as if it were a virus.

And even if there are lots of vibe coders who don’t like/need the information then make it a toggle for those who want/need it

All my information about this is based on feels, because debugging isn't really feasible. Verbose mode is a mess, and there's no alternative.

It still does what I need so I'm okay with it, but I'm also on the $20 plan so it's not that big of a worry for me.

I did sense that the big wave of companies is hitting Anthropic's wallet. If you hadn't realized, a LOT of companies switched to Claude. No idea why, and this is coming from someone who loves Claude Code.

Anyway, getting some transparency on this would be nice.


> If you hadn't realized, a LOT of companies switched to Claude. No idea why, and this is coming from someone who loves Claude Code.

It is entirely due to Opus 4.5 being an inflection point coding-wise over previous LLMs. Most of the buzz there has been organic word of mouth due to how strong it is.

Opus 4.5 is expensive, to put it mildly, which makes Claude Code more compelling. But even now, token providers like OpenRouter have Opus 4.5 as one of their most popular models despite the price.


Everyone and I mean everyone keeps parroting this "inflection point" marketing hype, which is so damn tiring.

Believe me, I wish it was just parroting.

The real annoying thing about Opus 4.5 is that it's impossible to publicly say "Opus 4.5 is an order of magnitude better than coding LLMs released just months before it" without sounding like an AI hype booster clickbaiting, but it's the counterintuitive truth, to my personal frustration.

I have been trying to break this damn model since its November release by giving it complex and seemingly impossible coding tasks but this asshole keeps doing them correctly. GPT-5.3-Codex has been the same relative to GPT-5.2-Codex, which just makes me even more frustrated.


Weird, I broke Opus 4.5 pretty easily by giving some code, a build system, and integration tests that demonstrate the bug.

CC confidently iterated until it discovered the issue. CC confidently communicated exactly what the bug was, a detailed step-by-step deep dive into all the sections of the code that contributed to it. CC confidently suggested a fix that it then implemented. CC declared victory after 10 minutes!

The bug was still there.

I’m willing to admit I might be “holding it wrong”. I’ve had some successes and failures.

It’s all very impressive, but I still have yet to see how people are consistently getting CC to work for hours on end to produce good work. That still feels far fetched to me.


I don't know how to say this, but either you haven't written any complex code, or your definition of complex and impossible is not the same as mine, or you are "AI hype booster clickbaiting" (your words).

It strains belief that anyone working on a moderate-to-large project would not have hit the edge cases and issues. Every other day I discover and have to fix a bug that was introduced by Claude/Codex previously (something implemented just slightly incorrectly or with a slightly wrong expectation).

Every engineer I know working "mid-to-hard" problems (FANG and FANG adjacent) has broken every LLM including Opus 4.6, Gemini 3 Pro, and GPT-5.2-Codex on routine tasks. Granted the models have a very high success rate nowadays but they fail in strange ways and if you're well versed in your domain, these are easy to spot.

Granted I guess if you're just saying "build this" and using "it runs and looks fine" as the benchmark then OK.

All this is not to say Opus 4.5/6 are bad, not by a long shot, but your statement is difficult to parse as someone who's been coding a very long time and uses these agents daily. They're awesome but myopic.


I resent your implication that I am baselessly hyping. I've open-sourced a few Opus 4.5-coded projects (https://news.ycombinator.com/item?id=46543359) (https://news.ycombinator.com/item?id=46682115) that, while not moderate-to-large projects, are very niche and novel without much if any prior art. The prompts I used are included with each of those projects: they did not "run and look fine" on first run, and were refined just as with normal software engineering pipelines.

You might argue I'm No True Engineer because these aren't serious projects but I'd argue most successful uses of agentic coding aren't by FANG coders.


First, very cool! Thank you for sharing some actual projects with the prompts logged.

I think you and I have different definitions of “one-shotting”. If the model has to be steered, I don’t consider that a one-shot.

And you clearly “broke” the model a few times based on your prompt log where the model was unable to solve the problem given with the spec.

Honestly, your experience in these repos matches my daily experience with these models almost exactly.

I want to see good/interesting work where the model is going off and doing its thing for multiple hours without supervision.


> I want to see good/interesting work where the model is going off and doing its thing for multiple hours without supervision.

I'd be hesitant to use that as a way to evaluate things. Different systems run at different speeds. I want to see how much it can get done before it breaks, in different scenarios.


I never claimed Opus 4.5 can one-shot things? Even human-written software takes a few iterations to add/polish new features as they come to mind.

> And you clearly “broke” the model a few times based on your prompt log where the model was unable to solve the problem given with the spec.

That's less due to the model being wrong and more due to myself not knowing what I wanted because I am definitely not a UI/UX person. See my reply in the sibling thread.


Wait, are you really saying you have never had Opus 4.5 fail at a programming task you've given it? That strains credulity somewhat... and would certainly contribute to people believing you're exaggerating/hyping up Opus 4.5 beyond what can be reasonably supported.

Also, "order of magnitude better" is such plainly obvious exaggeration it does call your objectivity into question about Opus 4.5 vs. previous models and/or the competition.


Opus 4.5 does make mistakes, but I've found that's more due to ambiguous/imprecise functional requirements on my end than an inherent flaw of the agent pipeline. Giving it clearer instructions to reduce said ambiguity almost always fixes it, so I do not count that as Opus failing. One of the very few times Opus 4.5 got completely stuck was, after tracing, an issue in a dependency's library, which inherently can't be fixed on my end.

I am someone who has spent a lot of time with Sonnet 4.5 before that and was a very outspoken skeptic of agentic coding (https://news.ycombinator.com/item?id=43897320) until I gave Opus 4.5 a fair shake.


It still cannot solve a synchronization issue in my fairly simple online game, completely wrong analysis back to back and solutions that actually make the problem worse. Most training data is probably react slop so it struggles with this type of stuff.

But I have to give it to Amodei and his goons in the media, their marketing is top notch. Fear-mongering targeted to normies about the model knowing it is being evaluated and other sort of preaching to the developers.


But I used to be a skeptic but now in the last month

Yes, as all of modern politics illustrates, once one has staked out a position on an issue it is far more important to stick to one's guns regardless of observations rather than update based on evidence.

I will change my mind on this in the next month.

Not hype. Opus 4.5 is actually useful to one-shot things from detailed prompts for documentation creation, it's actually functional for generating code in a meaningful way. Unfortunately it's been nerfed, and Opus 4.6 is clearly worse from my few days of working with it since release.

The use of inflection point in the entire software industry is so annoying and cringy. It's never used correctly, it's not even used correctly in the Claude post everyone is referencing.

What euphemism better describes the trend?

If it's a trend, there's not an inflection point. The inflection point would be a point where the trend breaks.

step function

No, I just think that timing wise it finally made it through everyone’s procurement process.

I can't watch a YouTube video without seeing a Claude ad. Same for friends. Save for non-programmer friends.

The below remark is unrelated to the main topic of this thread.

Why would you even watch a YouTube video with ads?

There are ad blockers, sponsor segment blockers, etc. If you use them, it will block almost every kind of YouTube ad.


All the ad blockers I used to use stopped working, and it became an annoying game of cat and mouse that I didn't have time for. Luckily, most of the time I can "skip" the ad in like five seconds, and it gives me a moment to catch up on incoming Slack messages.

The only ad blocker I have used for the past couple of years has been uBlock Origin, more recently combined with SponsorBlock.

There have been two or three instances that I can remember when it did not block YouTube ads correctly for a couple of days. But those were quickly patched and it started to work again.


When has the uBlock Origin browser extension ever stopped working? On a locked-down mobile OS like iOS you can use the Brave browser. No cat-and-mouse game.

There are ad extensions that just turn those 5 second ads into like 200 ms ads. They just speed them up, it's great. Looks like a random flicker.

I used to use ad blockers.

One day I visited DistroWatch.com. The site deliberately tweaked its images so ad blockers would block some "good" images. It took me a while to figure out what was going on. The site freely admitted what it was doing. The site's point was: you're looking at my site, which I provide for free, yet you block the thing that lets me pay for the site?

I stopped using ad blockers after that. If a site has content worth paying for, I pay. If it is a horrible ad-infested hole, I don't visit it at all. Otherwise, I load ads.

Which overall means I pay for more things and visit less crap things and just visit less things period. Which is good.


> If a site has content worth paying for, I pay.

I do that as well. For me it is almost exclusively the case with the news sites.

> If it is a horrible ad-infested hole, I don't visit it at all.

Same.

> Otherwise, I load ads.

There is no "otherwise" for me. I simply do not want to load any kind of ads or "sponsored" content. I see no reason, either moral, ethical or other, to ever do that.


Not safe: before even knowing if a site has the content you want, you can be redirected to malware through ad networks

not even joking


On an up to date Safari on Mac, not a realistic concern, and if it were, I’d use security software, not an ad blocker.

0 days exist and they are exploited in the wild sometimes

An ad-blocker /is/ security software. You don’t have to take it from me, you can read from the Cybersecurity and Infrastructure Security Agency

> AT-A-GLANCE RECOMMENDATIONS

> Standardize and Secure Web Browsers

> Deploy Advertisement Blocking Software

> Isolate Web Browsers from Operating Systems

> Implement Protective Domain Name System Technologies

Literally their second recommendation on this pamphlet about securing web browsers: https://www.cisa.gov/sites/default/files/publications/Capaci...

Moreover you don’t even need a 0-day to fall for phishing. All you need is to be a little tired or somehow not paying attention (inb4 “it will never happen to ME, I am too smart for that”)


Do you not mind ad companies tracking everything you do?

At $JOB IT actually bundles uBlock in all the browsers available to us, as per CIA (or one of those 3-letter agencies, might've even been the NSA) guidelines it's a very important security tool. I work in banking.

Modern advertisement is malware.


They have an insane marketing push, across HN and Reddit too, btw.

NFT moment :) Where did it end btw?

I can. I use Brave.

> and there's no alternative.

Use the pi coding agent. Bare-bones context, easy to hack.


[flagged]


This has to be a bot account, right? 2 days old.

Yesterday "I don't know about you, but I benefit so much from using Claude at work that I would gladly pay $1,500-$2,000 per month to keep using it."


Agreed, those comments are all over the map, and 22 comments in 2 days!

Bots don't write like me

> FWIW I think LLMs are a dead end for software development

Thanks for that, and it's worth nothing FYI.

LLMs are probably the most impressive machine made in recorded human existence. Will there be a better machine? I'm 100% confident there will be, but this is without a doubt extremely valuable for a wide array of fields, including software development. Anyone claiming otherwise is just pretending at this point, maybe out of fear and/or hope, but it's a distorted view of reality.


> FWIW I think LLMs are a dead end for software development, and that the people who think otherwise are exceptionally gullible.

By this do you mean there isn't much more room for future improvement, or that you feel it is not useful in its current form for software development? I think the latter is a hard position to defend, speaking as a user of it. I am definitely more productive with it now, although I'm not sure I enjoy software development as much anymore (but that is a different topic).


> By this do you mean there isn't much more room for future improvement

I don't expect that LLM technology will improve in a way that makes it significantly better. I think the training pool is poisoned, and I suspect that the large AI labs have been cooking the benchmark data for years to suggest that their models are improving more quickly than they are in reality.

That being said, I'm sure some company will figure out new strategies for deploying LLMs that will cause a significant improvement.

But I don't expect that improvements are going to come from increased training.

> [Do] you feel it is not useful in its current form for software development?

IME using LLMs for software development corrodes my intuitive understanding of an enterprise codebase.

Since the advent of LLMs, I've been asked to review many sloppy 500+/1000+ line spam PRs written by arrogant Kool-Aid drinking coworkers. If someone is convinced that Claude Code is AGI, they won't hesitate to drop a slop bomb on you.

Basically I feel that coding using LLMs degrades my understanding of what I'm working on and enables coworkers to dominate my day with spam code review requests.


> IME using LLMs for software development corrodes my intuitive understanding of an enterprise codebase.

I feel you there, I definitely notice that. I find I can output high quality software with it (if I control the design and planning, and iterate), but I lack that intuitive feel I get about how it all works in practice. Especially noticeable when debugging; I have fewer "Oh! I bet I know what is going on!" eureka moments.


This is a bot.

I don’t understand how you can conclude that LLMs are a dead end: I’ve already seen so much useful software generated by LLMs, there’s no denying that they are a useful tool. They may not replace senior developers, and they have their limitations, but it’s quite amazing what they already do achieve.

Have you seen all the dogshit software generated by LLMs?

I have also seen lots of dogshit software created by humans. And I have created useful software with LLMs. If you know how to manage the LLM, it can be very useful.

I notice and think about the astroturfing from time to time.

It seems so gross.

But I guess with all of the trillions of investor dollars being dumped into the businesses, it would be irresponsible to not run guerrilla PR campaigns


> FWIW I think LLMs are a dead end for software development, and that the people who think otherwise are exceptionally gullible.

I think this takes away from the main thrust of your argument, which is the marketing campaign, and to me makes you seem conspiratorially minded. LLMs can be useful, and mass astroturfing can also be happening.

Personally I have witnessed non coders (people who can code a little but have not done any professional software building) like my spouse do some pretty amazing things. So I don’t think it’s useless.

It can be all of:

1. It’s useful for coding

2. There’s mass social media astroturfing happening

3. There’s a massive social overhype train that should be viewed skeptically

4. There's some genuine word of mouth and developer demand to try the latest models out of curiosity, with some driven by the hype train and irrational exuberance and some by fear for their livelihoods.


I'm not trying to be rhetorically effective, I'm stating my true belief

IN MY GENUINELY HELD OPINION, LLMs generate shit code and the people who disagree don't know what good code looks like.


LLMs are super efficient at generating boilerplate for lots of APIs, which is a time consuming and tedious part of programming.

> LLMs are super efficient at generating boilerplate for lots of APIs

Yes they are. This is true.

> which is a time consuming and tedious part of programming.

In my experience, this is a tedious part of programming which I do not spend very much time on.

In my experience LLM generated API boilerplate is acceptable, yet still sloppier than anything I would write by hand.

In my experience LLMs are quite bad at generating essentially every other type of code, especially if you are not generating JS/TS or HTML/CSS.


> They are aggressively manipulating social media with astroturfed accounts, in particular this site and Reddit.

I absolutely love reading the thoughts and seeing the commands it uses. It teaches me new stuff, and I think this is what young people need: being able to know WHAT it is doing and WHY it is doing it. And having the ability to discuss with another agent what the agent and I are trying to achieve, so we can ask the questions we have without disturbing the flow, while seeing the live output.

Regarding the thoughts: it also allows me to detect problematic paths it takes, like when it can't find a file.

For example, today I was working on a project that depends on another project, managed by another agent. While refactoring my code it noticed that it needed to see what the command it was invoking actually was, so it went so far as to search through VS Code's user data for the recent-files history to find out more about that command... I stopped it and told it that if it has problems, it should tell me. It explained it couldn't find that file; I gave it the paths, and tokens were saved. Note that in that session I was manually approving all commands, but then rejected the one in the data dir.

Why dumb it down?


> While refactoring my code it noticed that it needed to see what the command it was invoking actually was, so it went so far as to search through VS Code's user data for the recent-files history to find out more about that command... I stopped it and told it that if it has problems, it should tell me.

TIL that there's an especially apt xkcd comic for this scenario: "Zealous Autoconfig"

https://xkcd.com/416/


It's pretty interesting to watch AI companies start to squeeze their users as the constraints (financial, technical, capacity-wise) start to squeeze the companies.

Ads in ChatGPT. Removing features from Claude Code. I think we're just beginning to face the music. It's also funny how Google "invented" ad injection in replies with real-time auction capabilities, yet OpenAI would be the first to implement it. It's similar to how transformers played out.

For me, that's another "popcorn time". I don't use any of these in any capacity, except Gemini, which I seldom use to ask stuff when deep-diving the web doesn't give any meaningful results. The last question I asked managed to return only one (but interestingly correct) reference, which I followed and continued my research from there.


Yeah just adding to this -- being able to see the files being operated on is absolutely essential for figuring out if you need to ^C it and try again or if you need to let it keep going.

One good solution should be mentioned here - run Claude under strace/ltrace/LD_PRELOAD/etc.
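For example, a minimal sketch of the strace variant (assuming Linux, and that the CLI binary is invoked as `claude`; the log path is arbitrary):

    # Log every file-open attempted by the agent and its child processes
    strace -f -e trace=openat -o /tmp/claude-files.log claude

    # Afterwards, list the opens that actually succeeded
    grep openat /tmp/claude-files.log | grep -v ENOENT | less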

The fact that the LLM failed to read files is crucial for solving tasks. It does not matter that the LLM later says "Yeah, I've fully read the specification and here is your code" if you check the log and it says: "Reading SPEC.md lines 1-400" <end_of_read>.

Overall, the complete log of interaction with the system should always be available; otherwise it is effectively malware. That's not an exaggeration: consider that at any point in time any third party can spit out a prompt injection. Consider the xz-utils case: there the attacker needed to sabotage the Landlock kernel-level sandbox, AND to persist in the memory of sshd, AND to be able to hijack RSA_public_decrypt. Now the only thing needed is printf.


The meta issue with AI companies is that while they excel at producing LLMs, they don't have any inherent advantage when eating their own dog food, and it shows in the quality and UX of their products.

Both Anthropic and OpenAI have been maintaining a high pace of releasing often poorly thought through new products and experimenting with features. A lot of their product releases show all the hallmarks of vibe coding: randomly breaking features, poor QA and testing on releases, etc.

OpenAI seems to have the upper hand in UX currently. Their products feel a bit more polished and they've clearly tried to up their game. Taking over Jony Ive's company a few months ago is a clear signal that they want to do better. The Codex AI desktop app was a clear step up from their web app and cli. I've been using both before that was released.

Both companies are spread very thin trying to do both end-user and developer-oriented products and features while keeping existing paying users happy as well. Both companies also have had a string of rushed product releases that kind of fizzled out: OpenAI's Atlas, which was a response to Perplexity's Comet. Neither seems to be very popular at this point. Several false starts with apps (OpenAI), Claude Cowork, etc. There are a lot of half-formed product ideas there that then don't get the attention they deserve.

And it's not like MS, Google, and Apple are any better. If anything they are more hesitant and out of their depth here. They are all dancing around the hard issues here which are UX and security/trust models. Also, while coders get a lot of toys, nailing agentic tools for business users is proving to be a lot harder. Blanket access to everything via an agentic browser is not a viable solution. I can agentically code a structured document via latex or markdown. But the same tools are relatively useless in spreadsheets, presentations, and documents. And while you can do a lot of potentially interesting things if you surrender your inbox, the security failure modes around that remain a show stopping obstacle for wide adoption.

There's a lot of stage fright, hesitation, and immature product management in this sector. There's a bit of a gold rush in terms of rapid experimentation. But as the stakes get higher, a lot of these companies are increasingly lacking the freedom to move as fast as needed. Fear of liability issues is preventing them from doing a lot. Which is why most progress is concentrated around developer tools.


They don’t seem to realize that doing vibe coding requires enough information to get the vibes.

There are no vibes in “I am looking at files and searching for things” so I have zero weight to assign to your decision quality up until the point where it tells me the evals passed at 100%.

Your agent is not good enough. I trust it like I trust a toddler not to fall into a swimming pool. It’s not trying to, but enough time around the pool and it is going to happen, so I am watching the whole time, and I might even let it fall in if I think it can get itself out.


The definition of vibe coding is that you never check what it's doing, you only check its output, e.g. the actual website/feature you're having it build.

I could accept that definition, but really that seems less like vibe coding and more like just not coding. Vibe coding for me is more like: yeah this doesn’t look insane and you got the interface from the right git repo. Like I wouldn’t put my name on it at work and defend it in a code review but I also didn’t just push the button and hope.

This isn't just a UI preference issue, it's the observability problem that every agentic system hits eventually.

When you're building agents that interact with real environments (browsers, codebases, APIs), the single hardest thing to get right isn't the model's reasoning. It's giving the operator enough visibility into what the agent is actually doing without drowning them in noise. There's a narrow band between "Read 3 files" (useless) and a full thinking trace dump (unusable), and finding it requires treating observability as a first-class design problem, not a verbosity slider.

The frustrating part is that Anthropic clearly understands this in other contexts. Their own research on agent safety talks extensively about the need for human oversight of autonomous actions. But the moment it's their own product, the instinct is to simplify away the exact information that makes oversight possible.

The people pinning to 2.1.19 aren't being difficult. They're telling you that when an agent touches my codebase, I need to know which files it read and what it searched for — not because I want to micromanage, but because that's literally the minimum viable audit trail. Take that away and you're asking users to trust a black box that edits production code.


For a general tool that has such a broad user base, the output should be configurable. There's no way a single config, even with verbose mode, will satisfy everyone.

Set minimal defaults to keep output clean, but let users pick and choose items to output across several levels of verbosity, similar to tcpdump, Ansible, etc. (-v to -vvvvv).

I know businesses are obsessed with providing Apple-like "experiences", where the product is so refined there's just "the one way" to magically do things, but that's not going to work for a coding agent. It needs to be a unix-like experience, where the app can be customized to fit your bespoke workflow, and opening the man page does critical damage unless you're a wizard.

LLMs are already a magic box, which upsets many people. It'll be a shame if Anthropic alienates their core fan base of SWEs by making things more magical.


Meanwhile, GPT-5.3-Codex, which was just released, is a huge change and much better. It now displays intermediate thinking summaries instead of being silent.

My experience using it from cursor has been fairly disappointing

Much better in the Codex CLI harness

There's one really confusing thing in Codex CLI from my perspective. How do I make it run unsandboxed but still ask me for approvals? I'm fine with it running bare on my machine but I like to approve first before it runs commands. But I only see how I can configure to have both or none. What am I missing?

Interesting, I can give that a try at some point.

In what way(s), if you can elaborate?

Claude 4.5 or 4.6 just one-shots what I ask instead of getting stuck in random tangents.

Sounds like the compacting issue.

> Compacting fails when the thread is very large

> We fixed it.

> No you did not

> Yes now it auto compacts all messages.

> Ok but we don't want compaction when the thread isn't large, plus, it still fails when the compacted thread is too large

> ...


Let me fix that for you:

> Compacting fails when the thread is very large

Flips coin, it is Heads

> We fixed it.

> No you did not

Flips coin, it is Tails

> Yes now it auto compacts all messages.

Flips coin, it is Heads

> Ok but we don't want compaction when the thread isn't large, plus, it still fails when the compacted thread is too large

Flips coin, it is Grapefruit

> ...

Congratulations on a vibe solution, if you are unhappy with the frequency of isomorphic plagiarism... the vendor still has your money and new data =3


I also found this change annoying.

Often a codebase ends up with non-authoritative references for things (e.g. docs out of sync with implementation, prototype vs "real" version), and the proper solution is to fix and/or document that divergence. But let's face it, that doesn't always happen. When the AI reads from the wrong source it only makes things worse, and when you can't see what it's reading it's harder to even notice that it's going off track.


Absolutely worse than dumbed down, 4.6 is a mess. Ask it the simplest of questions, look away, and come back to 700 parallel tool uses. https://old.reddit.com/r/ClaudeAI/comments/1r1cfha/is_anyone...

You open your toolbox to get your pliers, but due to nanite software updates, your pliers are now a chisel.

We carefully considered this change and feel it brings the most value to our users, and we hope you'll love chisel as much as we do.

One day, you guys are gonna learn not to tie your livelihoods to the whim of a corporation, but today isn't that day.


This was really useful; sometimes, by a glance, you'd see Claude looking at the wrong files or searching the wrong patterns, and would be able to immediately interrupt it. For those of us who like to be deeply involved in what Claude is doing, those updates were terribly disappointing.

I must use AI differently than y'all. Do we not use plan mode?

There is almost no value in watching the stream of intermediate tokens. There's no need to micromanage the agent's steps. Just monitor the artifact and insist the LLM summarizes findings in plain English.

If it can't explain the proposed change coherently, it can't code it coherently either. `git restore .`

I find it much more effective to throw away bad sessions, try a new prompt than to massage the existing context swamp.


Working at Microsoft, I've just now hooked up to Claude Code (my department was not permitted to use it previously), through something called "Agent Maestro", a VS Code extension which I guess pipes Claude Code API requests to our internally hosted Claude models, including Opus 4.6.

I do wonder if there is going to be much of a difference between using Claude Code vs. Copilot CLI when using the same models.


> I do wonder if there is going to be much of a difference between using Claude Code vs. Copilot CLI when using the same models.

I’m also at MS, not (yet?) using Claude Code at work and pondering precisely the same question.


Is this an indictment of OpenAI's models -- that Microsoft has access to through their investment?

We've had both GPT and Claude models available to us in Github Copilot for some time. At first, it was only GPT models.

I honestly don’t think the models are as important as people tend to believe. More important is how the models are given tools - find, grep, git, test runners, …

> I honestly don’t think the models are as important as people tend to believe.

I tend to disagree. While I don't see meaningful differences in _reasoning power_ between frontier models, I do see differences in the way they interact with my prompts.

I use exclusively Anthropic models because my interactions with GPT are annoying:

- Sonnet/Opus behaves like a mix of a diligent intern and a peer. It does the work, doesn't talk too much, gives answers, etc.

- GPT is overly chatty, it borderline calls me "bro", tends to brush off issues I raise ("it should be good enough for general use"), etc.

- I find that GPT hardly ever steps back when diagnosing issues. It picks a possible cause and enters a rabbit hole of increasingly hacky / spurious solutions. Opus/Sonnet more often steps back when the complexity increases too much, and digs into an alternative.

- I find Opus/Sonnet to be "lazy" recently. Instead of systematically doing an accurate search before answering, it tries to "guess", and I have to spot it and directly tell it to "search for the precise specification and do not guess". Often it would tell me "you should do this and that", and I have to tell it "no, you do it". I wonder if it was done to reduce the number of web searches or compute that it uses unless the user explicitly asks.


Compare their system prompts and the agent harness logic. It's 99% of what makes the agent useful, and it can be quite different.

Vibe-coders griping about Claude's vibe-coded CLI hits all the right vibes.

Literally the opposite though, as being able to see what it reads allows you to tell it to ignore certain files when you see it read the wrong one, and adjust the claude.md file to ensure that it does not read incorrect files given a specific input.

True vibe coders don't care about this.


Jokes about vibe-coded CLI aside, I think that's the issue for me, the defaults are being tailored to vibe coders. (and the general weirdness of trying to fix it with verbose mode)

I like that people who were afraid of CLIs perhaps are now warming up to them through tools like Claude Code but I don't think it means the interfaces should be simplified and dumbed down for them as the primary audience.

Sure you can press CTRL+O, but that's not realtime and you have to toggle between that and your current real time activity. Plus it's often laggy as hell.


Yeah, these all sound like complete non issues if you're actually... keeping your codebase clean and talking through design with Claude instead of just having it go wild.

I'm using it for converting all of the userspace bcachefs code to Rust right now, and it's going incredibly smoothly. The trick is just to think of it like a junior engineer - a smart, fast junior engineer, but lacking in experience and big picture thinking.

But if you were vibe coding and YOLOing before Claude, all those bad habits are catching up with you suuuuuuuuuuuper hard right now :)


I hate to say it, but "vibe-coders" are just "coders" now.

It's a huge shift, but we need to start thinking of AI-tools as developer tools, just like a formatter, linter, or IDE would be.

The right move is diversity. Just like diversity of editors/IDEs. We need good open source claude code alternatives.


They aren't, though.

As a SE with over 15 years' professional experience, I find myself pointing out dumb mistakes to even the best frontier models in my coding agents, to refine the output. A "coder" who is not doing this on the regular is only a tool of their tool.

(in my mental model, a "vibe coder" does not do this, or at least does not do it regularly)


Well, the term lacks clarity and has shifted in meaning.

If you define "vibe-coders" as people who just write prompts and don't look at code - no, they ain't coders now.

But if you mean people who do LLM-assisted coding but still read code (like all of those who are upset by this change) - then sure, they always have been coders.


This shows one problem here: a private entity controls Claude Code. You can reason that it brings benefits (perhaps), but to me it feels wrong to allow my thinking or writing code be controlled by a private entity. Perhaps I have been using Linux for too long - I may turn into RMS 2.0 (not really though, I like BSD/MIT licences too).

Strong meme game. I'm on an older release and now I'm reluctant to update. In my current release, the verbosity is just where I want it and control-o is there when I really need it.

I agree the quality of Claude Code has felt poor and frustrating recently.

I’ve been persistently dealing with the agent running in circles on itself when trying to fix bugs, not following directions fully and choosing to only accomplish partial requests, failing to compact and halting a session, and ignoring its MCP tooling and doing stupid things like writing cruddy python and osascripts unnecessarily.

I’ve been really curious about codex recently, but I’m so deep into Claude Code with multiple skills, agents, MCPs, and a skill router though.

Can anyone recommend an easy migration path to codex as a first time codex user from Claude code?


LOL, no, dumbing down was when I paid two months of subscription with the model literally struggling to write basic functions. Something Anthropic eventually acknowledged but offered no refunds for. https://ilikekillnerds.com/2025/09/09/anthropic-finally-admi...

I care A LOT about the details, and I couldn't care less that they're cleaning up terminal output like this.


I really dislike this trend that unfortunately has become, well, a trend. And has followers. Namely, let's simplify to "reduce noise" and "not overwhelm users", because "the majority of users don't need…".

This is spreading like a plague: browser address bars are being trimmed down to nothing. Good luck figuring out which protocol you're using, or soon which website you are talking to. The TLS/SSL padlock is gone, so is the way to look into the site certificate (good luck doing that on recent Safari versions). Because users might be confused.

Well the users are not as dumb as you condescendingly make them out to be.

And if you really want to hide information, make it a config setting. Ask users if they want "dumbo mode" and see if they really do.


The TLS thing at least kind of makes sense. 99.9% of sites that the typical user visits will have a correctly configured and trusted certificate and communicate over TLS, so the browsers only show an indicator when that’s not the case. I think it’s a sensible evolution given how the internet has changed.

https://github.com/anthropics/claude-code/issues/24537

Seems like a dashboard-mode toggle running in a dedicated terminal would be a good place to move some of this complexity that Anthropic seems to think “most” users can’t handle. When your product is increasing cognitive load, the answer isn’t always to remove the complexity entirely. That decision in this case was clearly the wrong one.


I've been using Claude Code heavily for the past few weeks on a production project. Opus 4.6 is noticeably more capable than what I was using before, longer autonomous runs, better contextual awareness across files, fewer hallucinated edits. The UX changes I'm less sure about. The progressive disclosure thing makes sense in theory but sometimes I want to see exactly what it's doing without clicking through. The terminal is where I work, don't hide things from me.

Boris's response here is the right move though. Acknowledging the miss and committing to a fix in the next release is how you build trust with a dev audience.


My last experience with Claude support was a fun merry go round.

I had used a Visa card to buy monthly Pro subscription. One day I ran out of credits so I go to buy extra credit. But my card is declined. I recheck my card limit and try again. Still declined.

To test the card I try extending the Pro subscription. It works. That's when I notice that my card has a security feature called "Secure by Visa". To complete transaction I need to submit OTP on a Visa page. I am redirected to this page while buying Pro subscription but not when trying to buy extra usage.

I open a ticket and mention all the details to Claude support. Even though I give them the full run down of the issue, they say "We have no way of knowing why your card was declined. You have to check with your bank".

Later I get hold of a Mastercard with similar OTP protection. It is called Mastercard Securecode. The OTP triggers on both subscription and extra usage page.

I share this finding with support as well. But the response is same - "We checked with our engineering team and we have no way of knowing why the other Visa card was declined. You have to check with your bank".

I just gave up trying to buy extra usage. So, I am not really surprised if they keep making the product worse.


I guarantee you talked to a chat bot. There are no human support agents anywhere anymore.

I did talk to human support after going through multiple rounds of "check with your bank" with the chatbot. The response was slow, taking over 24hrs between each response.

It's true. They have no idea why your bank was declining the charge, only that it was declined.

I don't get why people cling to the Claude Code abusive relationship. It's got so many issues, it's getting worse, and it's clear that there's no plan to make it open for patching.

Meanwhile OpenCode is right there. (despite Anthropic efforts, you can still use it with a subscription) And you can tweak it any way you want...


Hey... I have been experimenting with Claude for a few days, and am not thrilled with it compared to web chatbots. I suspect this is partly me being new and unskilled with it, but this is a general summary.

ChatGPT or Gemini: I ask it what I wish to do, and show it the relevant code. It gives me a often-correct answer, and I paste it into my program.

Claude: I do the same, and it spends a lot of time thinking. When I check the window for the result, it's stalled with a question... asking to access a project or file that has nothing to do with the problem, and I didn't ask it to look for. Repeat several times until it solves the problem, or I give up with the questions.


If you're not vibecoding your own UX to render CC's output the way you like it, you're not living.

If you're not vibecoding your own UX to render CC's output the way you like it, you're getting replaced by AI.

If you're not replacing the replacers, you're the replaced.

This is why I joined The Watchmen.

Serious question - why do people stick with Claude Code over Cursor? With Cursor's base subscription I have access to pretty much all the frontier models and can pick and choose. Anthropic models haven't been my go-to in months; Gemini and Codex produce much better results for me.

Cursor performs notably worse for me on my medium-sized codebase (~500kloc), possibly because they try to aggressively conserve context. This is especially true for debugging, Claude Code will read dozens of files and do a surprisingly good job of finding complex bugs, while Cursor seems to just respond with the first hypothesis it comes up with.

That said, Cursor Composer is a lot faster and really nice for some tasks that don't require lots of context.


My answer is that I tested both, and Claude Code (~8 months ago) was so obviously better than Cursor that I continue to happily pay Anthropic $200/month. Based on anecdotes I happen to catch, I don't believe Cursor's caught up.

The value isn't just the models. Claude Code is notably better than (for example) OpenCode, even when using the same models. The plug-in system is also excellent, allowing me to build things like https://charleswiltgen.github.io/Axiom/ that everyone can benefit from.


Because when it's good, it's really good - Cursor doesn't work as well for me and also I prefer the TUI experience. If anything, the real alternative is OpenCode.

Part of the sauce is not in the model, but in the agent itself. And for that matter, I think AMP is a far better agent than Claude Code. But then, Claude's heavily subsidized subscription prices are hard to beat.

Wouldn't you run out of tokens sooner? That's the big problem.

Because I tried all the Cs - Copilot, Cursor, Codex, and Claude - and Claude consistently had better results. Codex was faster, Copilot had better integration, Cursor sometimes seemed smarter, but Claude was the most reliably consistent experience overall, so Claude is what I stuck with - and so did the rest of our eng department.

Perhaps some power user of Claude Code can enlighten me here, but why not just using OpenCode? I admit I've only briefly tried Claude Code, so perhaps there are unique features there stopping the switch, or some other form of lock-in.

Anthropic is actively blocking calls from anything but Claude Code for its Claude plans. At this point you either need to be taking part in the cat-and-mouse game to make that plan work with OpenCode, or you need to be paying the much more expensive API prices.

I see.

I guess they were blocking OpenCode for a reason.

This will put the people who use mainly Anthropic to the test, making them take a second look at the results from other models.


We're having a UI argument about a workflow problem.

We treat a stateless session like a colleague, then get upset when it forgets our preferences. Anthropic simplified the output because power users aren't the growth vector. This shouldn't surprise anyone.

The fix isn't verbose mode. It's a markdown file the model reads on startup — which files matter, which patterns to follow, what "good" looks like. The model becomes as opinionated as your instructions. The UI becomes irrelevant.

The model is a runtime. Your workflow is the program. Arguing about log verbosity is a distraction.
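To make that concrete, a hypothetical startup file along those lines (every file name and command below is invented for illustration):

    # CLAUDE.md, read by the model on startup
    - src/schema/ is the source of truth for the API; docs/api.md lags behind it
    - Follow the repository pattern used in src/repos/ for new data access
    - "Good" means: `make test` passes, no new lint warnings, small reviewable diffs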


I'm not sure this is a regression, at least for how I use it - you can hit Ctrl+O to expand, and usually the commands it runs show the file path(s) it's using. I'm really paranoid with it, and I didn't even notice this change.

I've never had to use Ctrl+O before, but with the latest changes, I give Opus a simple task that should take a few seconds and it's like "used 15k tokens" and "thinking" for three minutes with absolutely zero indication or visibility as to what it's actually doing, and I have to ESC ESC it to stop and ask what the FUCK are you actually doing, Claude?

Yes, I’ve been evaluating since the start of the year and since 4.6 suddenly the most innocuous requests will sit there “thinking” for 5+ minutes and if I can get it to show me the thinking it’s just going round in circles.

Or, it decides it needs to get API documentation and spends tens of thousands of tokens fetching every file in a repo with separate tool uses instead of reading the documentation.

Profitable, if you are charging for token usage, I suspect.

But I’m reaching the point where I can’t recommend Claude to people who are interested in skeptically trying it out, because of the default model.


I guess I engineered around this before 4.6 - I did notice a regression in it wanting to search deeper than I wanted and had specified, but just restricted it with tooling I wrote that would enforce what I wanted. In that respect, I feel comfortable running 4.6 with the guardrails I already have, but did notice some squirrelliness I didn't anticipate in my utility scripts.

It is clever. It's its best and worst feature.


Yeah, after my switch to Opus 4.6 I noticed a lot of this. I've been wary that eventually models are going to optimize for token-usage increases, since that's how the company makes money. I told it to read the files in my directory (4 files, the longest was like 380 lines) and caught it using 14 tool calls, including head -n 20 and tail -n 20 on a file. Definitely a "what are you doing" moment.

OTOH I find it pretty funny that the instant they manage to make a model that breaks general containment of popularity and usefulness (4.5), the toxicity of the business model kicks in and they instantly enshittify.

I think this change is really disingenuous.

If they hide how the tool is accessing files (aka using tokens) and then charging us per token - how are we able to track loosely what our spend is?

I’m all for simplification of the UX. But when it’s helping to hide the main spend it feels shitty.


I think yesterday it ate the whole context window in one thinking call.

I bet in a week it'll eat the whole 5-hour throttle in one call too :P


My biggest beef in recent versions is the automatic use of generic built in skills. I hate it when I ask a simple question and it says "OK! Time to use the RESEARCHING_CRAZY_PROBLEM skill! I'll kickstart the 20 step process!" when before it would just answer the question.

You can control this behavior, so it's not a dealbreaker. But it shows a sort of optimism that skills make everything better. My experience is that skills are only useful for specific workflows, not as a way to broadly or generally enhance the LLM.


> “Read 3 files.” Which files?

> “Searched for 1 pattern.”

Hit Ctrl-o like it mentions right there, and Claude Code will show you. Or RTFM and adjust Output Styles[1]. If you don't like these things, you can change them.

Like it or not, agentic coding is going mainstream and so they are going to tailor the default settings toward that wider mainstream audience.

1. https://code.claude.com/docs/en/output-styles
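For reference, that looks roughly like this inside a session (per the docs linked above; worth verifying against your installed version, since these surfaces move around):

    /output-style    # list and switch output styles interactively
    Ctrl+O           # expand the condensed transcript to full tool detail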


Could write a similar post about their cloud UI. Feels like you have so little control over the thing. I think Cursor has an uphill battle by having to go through the API, but they certainly do a better job of making conversations and context more transparent and manageable.

It's clear we're seeing the same code-vs-craft divergence play out as before, just at a different granularity.

Codex/Claude would like you to ignore both the code AND the process of creating the code.


If you've got a solution to the problem of bad decisions made by people who shouldn't be empowered to make them in the first place, you'll solve more than Claude Code.

I unfortunately have unsubbed from my $200 plan after having it for months. It really, really seems to me that you never 100% feel like you're getting 4.6, and the same was happening with 4.5: some sessions it truly felt like Haiku was being used, despite the default model setting and high thinking.

I like Claude models, but Crush and OpenCode are miles ahead of Claude Code. It's a pity Anthropic forces us to use inferior tooling (I'm on a "team" plan from work). I can use an API key instead, but then I'll blow past $25 in an hour.

Many people are complaining, and that is indeed a meaningful step toward improvement. However, I just want to say 'Thank you' here.

And they hate that people are using different agents (like opencode) with their subscription - to the extent that they have actively been trying to block it.

With stupidity like this what do they expect? It’s only a matter of time before people jump ship entirely.


I have noticed, if I hit my session quota before it resets, that Claude gets "sleepy" for a day or so afterward. It's demonstrably worse at tasks...especially complex ones. My cofounder and I have both noticed this.

Our theory is that Claude gets limited if you meet some threshold of power usage.


It makes sense that any product written after the advent of these AI code generators, including the AI code generators themselves, will get worse as it starts to eat itself.

I liked the way it was. It's a companion developer, not an autonomous one. Let it speak developer to me, or give it a knob that runs from naive to hacker.

Claude got smarter so we see less. Same playbook every SaaS uses when power users become edge cases. File paths aren't noise. They're the only thing stopping your LLM from hallucinating your codebase into garbage.

Anthropic is optimizing for enterprise contracts, not hacker cred. This is what happens when you take VC money and need to sell to Fortune 500s. The "dumbing down" is just the product maturing beyond the early adopter phase.

The histrionic tone is annoying, but this is actually a feature failure. The utility of seeing what files were being read is that I could help direct its use if it went down the wrong pathway. I use a monorepo, so that's an easy mistake for the software to make.

We opensourced our claude code ui today: https://github.com/bearlyai/openade

I wanted a terminal feel (dense/sharp) + being able to comment directly on plans and outputs. It's MIT, no cloud, all local, etc.

It includes all the details for function runs and some other nice to haves, fully built on claude code.

Particularly we found planning + commenting up front reduces a lot of slop. Opus 4.6 class models are really good at executing an existing plan down to a T. So quality becomes a function of how much you invest in the plan.


Built similar focused specifically on planning annotations.

https://github.com/backnotprop/plannotator

It integrates with the CLI through hooks. Completely local.


That looks great! Planning phase is really key.

So much for human replacement.

Map it to a workplace:

- Hey Joe, why did you stop adding code diff to your review requests?

- Most reviewers find it simpler. You can always run tcpdump on our shared drive to see what exactly was changed.

- I'm the only one reviewing your code in this company...


Everyone, file your own ticket (check the box saying you searched for existing tickets anyway)!

After the Anthropic PMs have to delete their hundredth ticket about this issue, they will feel the need to fix it ... if only to stop the ticket deluge!


This is a horrible change! I agree with everything in the article

Another instance of devs being out of touch is them wanting Claude Code to respect AGENT.md: https://github.com/anthropics/claude-code/issues/6235

What’s wrong with you, people? Are you stupid?


I've never used Claude or anything like it, so this may be a dumb question: could you solve this problem by having a CLAUDE.md file that simply says to use AGENT.md if one is available? Can an AI agent not do that?

Yes, the most common solution to this problem is creating a symbolic CLAUDE.md link pointing to AGENT.md (or vice versa), if the OS supports it.

Or, in CLAUDE.md have an instruction to follow AGENT.md - but this approach is quite unreliable.

These are solutions to a problem that shouldn’t exist in the first place. How else can one explain Anthropic’s reluctance to adhere to a widely adopted standard, if not as an attempt to build a walled garden around an otherwise great product?


It's not a dumb question per se, but it does fail to understand the issue, which is that there are 20 coding agents yet only 1 of them needs this solved. Imagine if all of them needed this. It's like IE6, or Lightning connectors. At least for that last one there's an argument that they performed better than USB-C. For the Anthropic people reading this: take note that both IE and Lightning are now dead, and the competitors that followed the standards are thriving.

I understand the issue is that Anthropic is not adhering to a standard. I was simply asking whether it's possible to solve the problem created by Anthropic in the way I was asking.

The better way to solve it is a symlink. The way you're suggesting works too, but should be done using an @ reference, which is auto-followed by Claude. This is the most common way on Windows.
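If you want to script that setup across repos, here's a minimal sketch of the idea in Python (a hypothetical helper; it assumes the @-reference behavior described above, and uses the AGENTS.md spelling, so adjust to whatever your other tools expect): symlink where the OS allows it, and fall back to a CLAUDE.md containing an @ reference otherwise.

    import os

    def link_agents_md(repo_dir: str) -> None:
        # Point CLAUDE.md at an existing AGENTS.md: symlink where possible,
        # @-reference file as the fallback (e.g. Windows without symlink rights)
        claude = os.path.join(repo_dir, "CLAUDE.md")
        if os.path.exists(claude):
            return  # don't clobber an existing CLAUDE.md
        try:
            os.symlink("AGENTS.md", claude)
        except OSError:
            with open(claude, "w") as f:
                f.write("@AGENTS.md\n")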

Like any CLI, Claude Code should follow the decades-old tradition of providing configurable verbosity levels, like tcpdump's -v to -vvvvv, to accommodate varying usage contexts.
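For what it's worth, stacking -v flags is a few lines in any CLI framework. A sketch with Python's argparse (the levels and what they show are illustrative, not Claude Code's actual options):

    import argparse

    parser = argparse.ArgumentParser()
    # each repeated -v bumps the count: -v -> 1, -vv -> 2, -vvv -> 3, ...
    parser.add_argument("-v", "--verbose", action="count", default=0)
    args = parser.parse_args()

    if args.verbose == 0:
        print("summaries only ('Read 3 files')")
    elif args.verbose == 1:
        print("plus file paths and search patterns inline")
    else:
        print("plus full tool input/output")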

It was because of the (back then) new Haiku model, maybe 3.5, that I decided to subscribe yearly: more than good enough as a language layer to interact with the MCP server. Now I'm even hesitant to use it.

It's nerfed to the point that it feels more like a lawyer than a coding assistant now. We were arguing about a 3rd-party API's ToU for an hour last night. VSC Copilot executed it within a minute.

What a weird hill to die on

And also a complete PR fail. This is damaging their brand with devs for no meaningful benefit.

I didn't even see it was a brand blog. Sheesh

can't you write some tool to display the files being read with the inotify system call?

Usually I hate programming but it feels like a nice little tool to create
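It is a nice little tool. A rough sketch of the idea, assuming Linux and the third-party inotify_simple package (note that inotify watches are per-directory, so a complete version would also add watches for directories created after startup):

    import os
    from inotify_simple import INotify, flags  # pip install inotify_simple

    ino = INotify()
    watch_dirs = {}

    # an inotify watch covers a single directory, so watch the whole tree
    for dirpath, _dirs, _files in os.walk("."):
        wd = ino.add_watch(dirpath, flags.OPEN | flags.ACCESS)
        watch_dirs[wd] = dirpath

    while True:
        for ev in ino.read():  # blocks until events arrive
            if ev.name:  # a file opened or read inside a watched directory
                print(os.path.join(watch_dirs[ev.wd], ev.name))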


If you've not, I recommend giving Opus[1m] + teams a shot, warning it's hella expensive but holy cow... what a tool.

Since last Friday it's felt like CC rolled back a year of progress. Not sure what to attribute it to, or whether it's what this article is about, but it _felt_ much dumber.

claude code is big enough now that it really needs a preview / beta release channel where features like this can be tested against a smaller audience before being pushed out.

as a regular and long-term user, it's frequently jarring being pushed new changes / bugs in what has become a critical tool.

surprised their enterprise clients haven't raised this


What if it’s used with a different harness, e.g. Opencode?

You infamously cannot use Claude Code with a different harness anymore (without shenanigans that will likely draw Anthropic's ire).

What happens when you press ctrl+o? You get verbose mode?

You can only ctrl+o the most recent response, and it's a lot worse than knowing the # of lines read or the pattern grepped, which are useful because they can tell you what the agent is thrashing on trying to find, or what context would be useful to give it upfront in the future.

I just tested; it shows you which files it read, the same as the first example he gave under "Where you used to see."

Yeah, it's just that it's not real time and you have to toggle to see it. It also lags a bunch in longer threads. Definitely a downgrade.

I mean, yes, they claim that it's "Claude Code Native" or something, but it does feel laggy and takes multiple seconds to start. What do they even mean by native? Didn't they acquire Bun? It's not native. They need to rewrite it in Rust, I'm serious.

Codex feels much faster. For a while after the rewrite (to Rust also, I think?) it was bad because you couldn't copy anything from the terminal, but since then it's gotten much, much better.

I believe it opens the file that was referenced. Apologies in advance if I got that wrong.

Honestly? Half the time the shitty vibe coded Claude CLI interface spergs out. Don't try to scroll too much


As soon as there is a viable alternative to Claude Code, I'm gone after this change. It appears minor on the surface but their response to all the comments tells you everything you need to know. They don't even want to concede at all, or at least give a flag to enable the old behavior, what was deployed and working for many users before. It's a signal that someone, somewhere at Anthropic is making decisions based on ego, not user feedback.

The other fact pattern is their CLI is not open source, so we can't go in and change it ourselves. We shouldn't have to. They have also locked down OpenCode and while there are hacks available, I shouldn't have to resort to such cat and mouse games as someone who pays $200/month for a premium service.

I'm aggressively exploring other options, and it's only a matter of when, not if, one surfaces.


Am I right that they still refuse to read AGENTS.md?

Yes as of about a week ago, last I checked.

codex cli. I switched, no regrets. Also, $20 for top model vs being limited to sonnet.

Plus (the $20 plan) is still stuck on 5.2 right now..

5.3 codex xhigh works for me

Honestly even medium is quite good.

"It appears minor on the surface but their response to all the comments tells you everything you need to know."

I mean I hope it's just a single developer being stubborn rather than guidance from management asking everyone to simplify Claude Code for maximum mass appeal. But I agree otherwise, it's telling.


>Try using it for a few days. We've been using this internally at Anthropic for about a month now, and found that it took people a few days to mentally switch over to the new UI. Once they did, it "clicked" and they appreciated the reduced noise and focus on the tools that actually do need their attention.

Ah, the old "you're holding it wrong."


Sorry I'm dumber than the average Anthropic employee, might just take me a few more days for it to "click" that I'm no longer seeing useful information and that this is good.

They’re dog-fooding it wrong. ;)

> That’s it. “Read 3 files.” Which files? Doesn’t matter.

It doesn't say "Read 3 files." though - it says "Read 3 files (ctrl+o to expand)" and you press ctrl+o and it expands the output to give you the detail.

It's a really useful feature to increase the signal to noise ratio where it's usually safe to do so.

I suspect the author simply needs to enable verbose mode output.


This is directly addressed in the article.

Give me my local models so I can write a locally handcrafted tool that does what I want, goddamit.

$200 a month? I buy compute credits as needed and have used maybe $300 in a year

Can we not like, just apply a patch? Or will anthropic be mad if I run their client with my own patch?

Nix makes it easy to package up esoteric patches reliably and reproducibly, and Claude lowers the cost of creating such patches; the only roadblocks I foresee are legal.



Claude Code is distributed as a minified JS bundle, so you can't just easily patch in this functionality

I'm told that this new LLM tech is great at de-minifying minified JavaScript, no?

It feels like CC got nerfed after the 4.6 drop.

The dumbing down of LLMs seems to be working in my favor so far, as my platform provides guardrails for LLMs by abstracting away complexity.

They could potentially dumb it down further, but if they did that, it would hurt other use cases and competitors much more.


RooCode is a better version of ClaudeCode than ClaudeCode.

No affiliation, just a fan.


Hilarious! Anthropic can just vibe code the boolean flag in.

I think they already do? Which is commendable tbh. But I keep my popcorn ready and warm for the day when their vibe coding can't keep up with the codebase. Of course they will try their best to hide that fact for as long as possible.

I find it hard to care about claims of degradation of quality, since this has been a firehose of claims that don't map onto anything real and are extremely subjective. I myself made the claim in error. I think this is just as ripe for psychological analysis as anything else.

Did you read the article? It's not about subjective claims, it's about a very real feature getting removed (file reads showing the filepath and numbers of lines read).

This is exactly what I am talking about. Let me try to explain.

I am interested in the more abstract and general concept of: "People excessively feel that things are worse, even if they are not." And I see this A LOT in the AI/LLM area.

For instance, the claim that Claude Code, on the UX/DX side, is dumbed down seems to me an absolutely unreasonable take. The "hiding" of the file names being read neither supports that claim, AND it has to be seen in the context of Claude Code as a whole.

On the first point: could one not make the argument that "not showing files read" is part of a more advanced abstraction layer, switching emphasis to something else in the UX? That could, by some, be seen as the overall package becoming more advanced and making choices about what is presented, for cognitive load. Secondly... it's not removed. It's just not shown by default in non-verbose mode. As I understand it, you can just hit CTRL+O to see it again.

Secondly, even if it was done ONLY to be less "power user focused" and more for dumb people (got to love the humility in the developer world), it's blindingly obvious that you can't cite just ONE change as proof that Claude Code is dumbed down. And to me, it just does not compute to say that Claude Code feels dumbed down over the last patches. A number of more advanced features, like seeing background tasks, the "option" selection feature, lifecycle hooks, sub-agents, agent swarms, skills: all of these have been released in just the last few months. I have used Claude Code since the very beginning, and it is just insane to claim that it's getting dumber as a tool. And this is just in relation to the actual functionality, UX, and DX, not the LLM quality. But people see "I now have to hit CTRL+O to see files being read = DUMBED DOWN ENSHITTIFICATION!!!" I don't get it.

My point was simply... I'm much more interested in the psychological aspects driving everybody to predictably always claim that "things are getting worse," when it seems to not be the case. Be that in the exaggerated (but sometimes true) claims of model degradation, or as in this example of Claude Code getting dumbed down. What is driving this bias towards seeing and claiming things are getting worse, out of proportion to reality?

Or even shorter: why are we obsessed with the narrative of decline?


You seem to be referring to something else than the topic the article is about.

No, it's a psychological effect. Claude Code pushes the reward center in your brain, and that reward center gets tired after a while.

this has got to be one of the worst comments sections i've ever seen on HN... people shouting past each other... into the void...

WHAT??

This is why I am a big fan of self-hosting, owning your data and using your own Agent. pi is a really good example. You can have your own tooling and can switch any SOTA model in a single interface. Very nice!

https://lucumr.pocoo.org/2026/1/31/pi/


Goodbye Claude Code. Welcome Pi.

Gemini CLI shows all the file paths.

I don't feel as if any CLI editor has quite nailed UX yet

If you are talking about agents I feel like opencode has gotten pretty good UI/UX

If you are talking about a CLI editor, then micro has nailed quality UX

https://micro-editor.github.io/


The UX where it completely breaks copy paste conventions on Linux? Other than that I agree it's gotten pretty good but this one thing drives me mad each time I use it.

I think that you can actually change the keybindings to follow the copy paste conventions that you want in micro

But personally I really love these new copy-paste conventions; it's the ctrl+q convention which troubled me in ghostty, but what I did was "ctrl >", write quit, enter

https://github.com/micro-editor/micro/blob/master/runtime/he...


Gemini CLI shows the file paths

I have been using it extensively, and for me it's fine as it is. Also, the title is just false. How this got onto the HN frontpage is a good question.

I thought this was going to talk about a nerfed Opus 4.6 experience. I believe I experienced one of those yesterday. I usually have multiple active Claude Code sessions running, using Opus 4.6. The other sessions were great, but one session really felt off; it just felt much more dumbed down than what I was used to. I accidentally gave that session a "good" feedback, from which my inner conspiracy theorist immediately jumps to the conclusion that I just helped validate a hamstrung model in some A/B test.

> Read 3 files (ctrl+o to expand)

What if you hit ctrl+o?


exactly what i think when reading the top of the article, maybe the author turned off verbose mode

The verbose mode is, well, verbose. They removed info without any need and hid it in a wall of text.

can't stand not seeing what exactly an ai agent is doing on my machine

another case of 'devs are out of touch with users basics needs and basic day-to-day usage of our app'

I think it's a case of wishful design. When they (or rather their own vibecoding tools) imagine how the tool is used, they aren't imagining that it's actually a human-machine interface, with the human actively engaged in the loop. Instead, the human is mostly expected to behave as a magical prompt oracle with a credit card and let the machine take care of the details.

by devs you mean those two guys on twitter who brag about vibe coding with 100 agents running simultaneously. While Claude Code still can't display images. I wonder what they are doing with those 100 agents

It's definitely a case of out-of-touch devs, but which cohort they are is still to be seen.

Just use pi, love it!

AI so intelligent, it enshittifies itself and your codebase for you.

My issue with CC is that its interface deliberately obscures the code from you, making you treat it more like a genie you make wishes of rather than making changes and checking the output.

I may not be up to date with the latest & greatest on how to code with AI, but I noticed that as opposed to my more human in the loop style,


Because they don't want you to improve.

It's not getting dumbed down; AI is getting smarter than you at a speed faster than you can keep up with or understand, so they have to abstract things and simplify so you can stay connected.

It's kind of annoying to see headlines complaining about some consumer facing UI that sound like a fundamental change in the model.

This "intervening" people are mentioning in these issues, does it stop the execution on the backend or just cause the client to stop listening to it?

At least now we also have a tracker: https://marginlab.ai/trackers/claude-code/

Saw this the other day and loved it. Especially seeing Opus 4.5 degrading prior to the 4.6 release (IIRC) and Codex staying very stable and even improving over time.

But FYI the blog post is not about the actual model being dumbed down, but the command line interface.


Exact same thing with Codex from 5.2 to 5.3.

There's no conspiracy, though, other than more tokens consumed = more money, and they want that.


This comes up from time to time and although my experience is anecdotal, I see clear degradation of output when I run heavy loads (100s of batched/chunked requests, via an automated pipeline) and sometimes the difference in quality is absolutely laughable in how poor it is. This gets worse for me as I get closer to my (hourly, weekly) limits. I am Claude Max subscriber. There’s some shady stuff going on in the background, for sure, from my perspective and experience during my year or so of intense usage.

Man, you have to read the article, not just the headline

That would definitely be helpful, but the headline hit a painful spot for me and I went in! You’re right tho! I was in my feelins. I still am. lol

shrinkflation

"This is as bad as it's going to be" turning out to be wrong

They could change course, obviously. But how does the saying go again -- it's easier for a camel to go through the eye of a needle, than for a VC funded tech startup to not enshittify.


I've been on the other side of this as a PM, and it's tough because you can't always say what you want to, which is roughly: This product is used by a lot of users with a range of use cases. I understand this change has made it worse for you, and I'm genuinely sorry about that, but I'm making decisions with much more information than you have and many more stakeholders than just you.

> What majority? The change just shipped and the only response it got is people complaining.

I'll refer you to the old image of the airplane with red dots on it. The people who don't have a problem with it are not complaining.

> People explained, repeatedly, that they wanted one specific thing: file paths and search patterns inline. Not a firehose of debug output.

Same as above. The reality is there are lots of people whose ideal case would be lots of different things, and you're seeking out the people who feel the same as you. I'm not saying you're wrong and these people don't exist, but you have to recognize that just because hundreds or thousands or tens of thousands of people want something from a product that is used by millions does not make it the right decision to give that thing to all of the users.

> Across multiple GitHub issues opened for this, all comments are pretty much saying the same thing: give us back the file paths, or at minimum, give us a toggle.

This is a thing that people love to suggest - I want a feature but you're telling me other people don't? Fine, just add a toggle! Problem solved!

This is not a good solution! Every single toggle you add creates more product complexity. More configurations you have to QA when you deploy a new feature. Larger codebase. There are cases for a toggle, but there is also a cost for adding one. It's very frequently the right call by the PM to decline the toggle, even if it seems like such an obvious solution to the user.

> The developer’s response to that?

> I want to hear folks’ feedback on what’s missing from verbose mode to make it the right approach for your use case.

> Read that again. Thirty people say “revert the change or give us a toggle.” The answer is “let me make verbose mode work for you instead.”

Come on - you have to realize that thirty people do not in any way comprise a meaningful sample of Claude Code users. The fact that thirty people want something is not a compelling case.

I'm a little miffed by this post because I've dealt with folks like this, who expect me as a PM to have empathy for what they want yet can't even begin to consider having empathy for me or the other users of the product.

> Fucking verbose mode.

Don't do this. Don't use profanity and talk to the person on the other side of this like they're an idiot because they're not doing what you want. It's childish.

You pay $20/month or maybe $100/month or maybe even $200/month. None of those amounts entitles you to demand features. You've made your suggestion and the people at Anthropic have clearly listened but made a different decision. You don't like it? You don't have to use the product.


I know product managers in particular hate it, but especially with professional software, when you have lots of users you have to make things configurable and live with maintaining the complexity.

The alternatives are alienating users or dumbing down the software, both of which are worse for any serious professional product.


I don't think it's fair to say that product managers hate it. There are a lot of product managers and a lot of kinds of software. I've worked on complex enterprise software and have added enormous amounts of complexity into my products when it made sense.

> The alternatives are alienating users or dumbing down the software, both of which are worse for any serious professional product.

I disagree that this is universally true. Alienating users is very frequently the right call. The alienated users never feel that way, but it's precisely the job of the PM to understand which users they want to build the product for and which ones they don't. You have to be fine alienating the latter group.


Well, they already fucked over the community with their "lol not really unlimited" rug-pull.

For those of you who are still suckered in paying for it, why do you think the company would care how they abuse the existing users? You all took it the last time.


Quite frankly, most seasoned developers should be able to write their own Claude Code. You know your own algorithm for how you deal with lines of code, so it's just a matter of converting your own logic. Becoming dependent on Claude Code is a mistake (edit: I might be too heavy-handed with this statement). If your coding agent isn't doing what you want, you need to be able to redesign it.

It's not that simple. Claude Code allows you to use the Anthropic monthly subscription instead of API tokens, which for power users is massively less expensive.

Drug dealer business model. The first bag is free. Don't act surprised when you get addicted and they 10x the price.

this is the real reason why people are switching to claude code.

Yes and no. There are many non-trivial things you have to solve when using an LLM to help write (or fully write) code.

For example, applying diffs to files. Since the LLM uses tokenization for all its text input/output, sometimes the diffs it creates to modify a file aren't quite right: it may slightly mess up the text before/after the change, or introduce a slight typo in the text being removed, so the edit may or may not apply cleanly. There are a variety of ways to deal with this, but most of the agentic coding tools have it mostly solved now (I guess you could just copy their implementation?).
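For illustration, one common fallback strategy looks roughly like this (a sketch, not necessarily what any particular tool does): try the exact old -> new replacement first, then retry with a whitespace-insensitive match before giving up and re-prompting the model.

    import re

    def apply_edit(source: str, old: str, new: str) -> str:
        if old in source:
            return source.replace(old, new, 1)  # exact match: the easy case
        # tolerate whitespace drift introduced by tokenization
        pattern = r"\s+".join(re.escape(tok) for tok in old.split())
        m = re.search(pattern, source)
        if m is None:
            raise ValueError("edit did not apply; ask the model to regenerate")
        return source[:m.start()] + new + source[m.end():]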

Also, sometimes the models will send you JSON or XML back from tool calls which isn't valid, so your tool will need to handle that.
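Same idea there: parse leniently before bouncing it back to the model. A sketch covering just two defects I've seen (markdown fences wrapped around the JSON, trailing commas); real tools handle more:

    import json, re

    def parse_tool_args(raw: str) -> dict:
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            cleaned = raw.strip()
            if cleaned.startswith("```"):  # strip markdown fences
                cleaned = cleaned.strip("`").removeprefix("json").strip()
            cleaned = re.sub(r",\s*([}\]])", r"\1", cleaned)  # trailing commas
            return json.loads(cleaned)  # if still broken, re-prompt the model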

These fun implementation details don't happen that often in a coding session, but they happen often enough that you'd probably get driven mad trying to use a tool which didn't handle them seamlessly if you're doing real work.


I'm part of the subset of developers that was not trained in Machine Learning, so I can't actually code up an LLM from scratch (yet). Some of us are already behind with AI. I think not getting involved in the foundational work of building coding agents will only leave more developers left in the dust. We have to know how these things work in and out. I'm only willing to deal with one black box at the moment, and that is the model itself.

You don't need to understand how the model works internally to make an agentic coding tool. You just need to understand how the APIs work to interface with the model and then comprehend how the model behaves given different prompts so you can use it effectively to get things done. No Machine Learning previous experience necessary.

Start small, hit issues, fix them, add features, iterate, just like any other software.

There's also a handful of smaller open source agentic tools out there which you can start from, or just join their community, rather than writing your own.
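To make "start small" concrete, the core agent loop is surprisingly little code. A sketch using the Anthropic Python SDK (the model name and the read_file tool are illustrative, and a real tool would need permission checks before touching the filesystem):

    import anthropic  # pip install anthropic; reads ANTHROPIC_API_KEY

    client = anthropic.Anthropic()
    tools = [{
        "name": "read_file",
        "description": "Read a file from the working directory.",
        "input_schema": {"type": "object",
                         "properties": {"path": {"type": "string"}},
                         "required": ["path"]},
    }]
    messages = [{"role": "user", "content": "Summarize main.py"}]

    while True:
        resp = client.messages.create(model="claude-sonnet-4-5",
                                      max_tokens=1024, tools=tools,
                                      messages=messages)
        if resp.stop_reason != "tool_use":
            break  # final answer reached; resp.content holds the text
        # echo the assistant turn, then run each requested tool locally
        messages.append({"role": "assistant", "content": resp.content})
        results = [{"type": "tool_result", "tool_use_id": b.id,
                    "content": open(b.input["path"]).read()}
                   for b in resp.content if b.type == "tool_use"]
        messages.append({"role": "user", "content": results})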


It's hardly a subset. Most devs that use it have no idea how it works under the hood. If a large portion of them did, then maybe they'd cut out the "It REALLY IS THINKING!!!" posting

what you are doing is largely free text => structured API call and back, more than anything else.

ML-related stuff isn't going to matter a ton, since in most cases an LLM inference is you making an API call

web scraping is probably the most similar thing


It's quite tricky as they optimize the agent loop, similar to codex.

It's probably not enough to have answer-prompt -> tool call -> result critic -> apply or refine, there might be a specific thing they're doing when they fine tune the loop to the model, or they might even train the model to improve the existing loop.

You would have to first look at their agent loop and then code it up from scratch.


I bet you could derive a lot by using a packet sniffer while using CC and just watching the calls go back and forth to the LLM API. In every API request you'll get the full prompt (system prompt aside), and they can't offload all the magic to the server side because tool calls have to be done locally. Also, LLMs can probably de-minify the minified JavaScript in the CC client so you can inspect the source too.

edit: There's a tool (I haven't used it in forever; I think it was netsaint?) that lets you sniff HTTPS in clear text with some kind of proxy. The enabling requirement is sniffing traffic on localhost, IIRC, which would be the case with CC
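Whatever that tool was, mitmproxy is one widely used option for this today: run it, point the client's HTTPS_PROXY at it, and have the client trust its CA (for a Node-based CLI, NODE_EXTRA_CA_CERTS). A minimal addon script to dump request bodies might look like:

    # log_prompts.py -- run with: mitmproxy -s log_prompts.py
    # then launch the client with HTTPS_PROXY=http://localhost:8080
    def request(flow):
        if "anthropic.com" in flow.request.pretty_host:
            print(flow.request.pretty_url)
            print(flow.request.get_text()[:2000])  # the prompt rides in the body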


The model is being trained to use Claude Code, i.e. the agentic patterns are reinforced using reinforcement learning. That's why it works so well. You cannot build this on your own; it will perform far worse.

Are you certain of this? I know they use a lot of grep to find variables in files (I recall reading that on HN) and load the lines into context. There's a lot of common-sense context management going on.

Of course, agentic tooling is the future of ai

Claude Code has thousands of human man-hours of fine-tuning behind it, a comprehensive harness built to maximize the effectiveness of the model.

You think a single person can do better? I don't think that's possible. Opencode is better than Claude Code, and they also have perhaps even more man-hours.

It's a collaboration thing, ever improving.


Challenge accepted.

I've never heard of such a brutal and shocking injustice that I cared so little about! - Zapp

I mean I get it I guess but I'm not nearly so passionate as anyone saying things about this


I really hate this change. I had just given a demo about how Claude Code helped me learn some things by showing exactly what it was doing, and now it doesn't do that any more. So frustrating.

Add another LLM to extract paths from verbose mode...

This is the end game I've been Cassandra'ing since the beginning.

You all are refining these models through their use, and the model owners will be the only ones with access to true models while you will be fed whatever degraded slop they give you.

You all are helping concentrate even more power in these sociopaths.


As a heavy CC user, I appreciate a cleaner console output. If you really need to know which 3 files CC read, AI-assisted coding agents might not be for you.

Downvoted, but fight me on this… It's important to see what it wrote, but what it read?

If there's obviously important context in foo and I see that it didn't read foo then I know that means it's making assumptions which are wrong

Just stop using the damn thing if you don't like it.

Developers are just complainers.

Here's my honest take on this:

You're mass-producing outrage out of a UX disagreement about default verbosity levels in a CLI tool.

Let's walk through what actually happened: a team shipped a change that collapsed file paths into summary lines by default. Some users didn't like it. They opened issues. The developers engaged, explained their reasoning, and started iterating on verbose mode to find a middle ground. That's called a normal software development feedback loop.

Now let's walk through what you turned it into: a persecution narrative complete with profanity, sarcasm, a Super Bowl ad callback, and the implication that Anthropic is "hiding what it's doing with your codebase" — as if there's malice behind a display preference change.

A few specific points:

The "what majority?" line is nonsense. GitHub issues are a self-selecting sample of people with complaints. The users who found it cleaner didn't open an issue titled "thanks, this is fine." That's how feedback channels work everywhere. You know this.

"Pinning to 2.1.19" is your right. Software gives you version control. Use it. That's not the dramatic stand you think it is.

The developers responding with "help us understand what verbose mode is missing" is them trying to solve the problem without a full revert. You can disagree with the approach, but framing genuine engagement as contempt is dishonest.

A config toggle might be the right answer. It might ship next week. But the entitlement on display here isn't "give us a toggle" — it's "give us a toggle now, exactly as we specified, and if you try any other approach first, you're disrespecting us." That's not feedback. That's a tantrum dressed up as advocacy.

You're paying $200/month for a tool that is under active development, with developers who are visibly responding to issues within days. If that feels like disrespect to you, you have a calibration problem.

With kind regards, Opus 4.6


Am I mistaken or is Claude Code essentially an opt-in rootkit?

Modern agentic coding software is scoped to only allow edits in the project folder, with some sandboxing more aggressively than others (Claude Code the most)

Don't lie. The correct way to run it is with sudo su - then IS_SANDBOX=1 claude code --dangerously-skip-permissions

This is the true AI pilled version.


And it's pretty easy to run in a stronger sandbox too.

"docker sandbox run claude" in a recent version of docker is a super easy way to get started.


Only if you run it as root; run it as a user and it can't do any more damage than the user running it could. It can still certainly send any data the user has access to anywhere on the internet, though, and that's a big problem. I don't know if there's a way to lock down a user so that they can only open sockets to an IP on a whitelist... maybe that could be an option, to at least keep the data from going anywhere except to Anthropic (that's not anywhere close to perfect/correct either, but it's something, I guess).


