it's not, though, if you're working in a massive codebase or on a distributed system that has many interconnected parts.
skills that teach the agent how to pipe data, build requests, trace them through a system and its datasources, and then update code based on those results are a step-function improvement in development.
ai has fundamentally changed how productive i am working on a 10m-line codebase, and i'd guess less than 5% of that is due to code gen that's intended to go to prod. Nearly all of it is the ability to rapidly build tools and toolchains to test and verify what i'm doing.
But... plain Claude does that. At least for my codebase, which is nowhere close to your 10m lines. But we do processing on lots of data (~100TB), and Claude definitely builds one-off tools and scripts to analyze it, which works pretty great in my experience.
I think people are looking at skills the wrong way. They don't give it some kind of superpower it couldn't have otherwise. Ideally you'll have Claude write the skills anyway. It's just a shortcut so you don't have to keep rewriting the same prompt over and over and/or have Claude keep figuring out how to do the same thing repeatedly. You can save lots of time, tokens and manual guidance by having well-thought-out skills.
Some people use these to "larp" different job roles, etc., and I don't think that's a productive use of skills unless the prompts are truly exceptional.
At work I use skills to maintain code consistency. We implemented a solid "model view viewmodel" (MVVM) architecture for a front-end app, because without any guardrails the LLM was doing redundant data fetching and type casts and was just messy overall. Having an "mvvm" rule and skill that defines the boundaries keeps the LLM from writing a bunch of nonsense code that happens to work.
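For readers unfamiliar with the pattern, here is a minimal TypeScript sketch of the kind of boundary such an "mvvm" rule might encode. Every name in it (UserDto, UserRepository, UserViewModel, renderUser) is hypothetical and not from the comment: the view only reads display-ready state from the viewmodel, and the viewmodel is the single place that fetches and converts data, so repeated loads reuse one request instead of refetching.

```typescript
// Hypothetical MVVM boundary sketch: names and shapes are illustrative only.

// Model layer: raw data as a (hypothetical) API returns it.
interface UserDto {
  id: string;
  name: string;
  createdAt: string; // ISO-8601 timestamp string
}

// Only the ViewModel is allowed to depend on this.
interface UserRepository {
  fetchUser(id: string): Promise<UserDto>;
}

// Display-ready state: the View never sees a DTO or does its own casts.
interface UserViewState {
  loading: boolean;
  displayName: string;
  memberSince: Date | null;
}

class UserViewModel {
  private readonly repo: UserRepository;
  private state: UserViewState = { loading: false, displayName: "", memberSince: null };
  // Requests remembered per id, so repeated load() calls reuse one fetch.
  private readonly requests = new Map<string, Promise<void>>();
  private readonly listeners: Array<(s: UserViewState) => void> = [];

  constructor(repo: UserRepository) {
    this.repo = repo;
  }

  // The View subscribes to state; the listener is called immediately with the current state.
  subscribe(listener: (s: UserViewState) => void): void {
    this.listeners.push(listener);
    listener(this.state);
  }

  // The single place where fetching and DTO -> view-state conversion happen.
  // Error handling is omitted to keep the sketch short.
  load(id: string): Promise<void> {
    const existing = this.requests.get(id);
    if (existing) return existing;
    this.setState({ ...this.state, loading: true });
    const request = this.repo.fetchUser(id).then((dto) => {
      this.setState({
        loading: false,
        displayName: dto.name,
        memberSince: new Date(dto.createdAt),
      });
    });
    this.requests.set(id, request);
    return request;
  }

  private setState(next: UserViewState): void {
    this.state = next;
    for (const listener of this.listeners) listener(next);
  }
}

// View: renders state into lines of text and never touches the repository.
function renderUser(vm: UserViewModel, out: string[]): void {
  vm.subscribe((s) => {
    if (s.loading) out.push("Loading...");
    else if (s.memberSince) out.push(`${s.displayName} (member since ${s.memberSince.getUTCFullYear()})`);
    else out.push("No user loaded");
  });
}
```

Wiring a fake repository into UserViewModel and calling load("u1") twice triggers a single fetchUser call, which is the redundant-fetching problem a rule like this is meant to prevent.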
Honestly, I started with Obra superpowers and worked with my boss to brainstorm the best way to keep separation of concerns; we just stepped on rakes as we developed and had Obra superpowers suggest updates to our rules/skills.
It's certainly an iterative process but it gets better every iteration.
Possibly, and we do use linters, but linters don't stop LLMs from going off the rails. The LLM does end up fixing its output because of the linter, but then the results are only as good as the linter itself.
I have sometimes found "LARPing job roles" useful for setting expectations for the codebase.
Claude is fairly decent at "when in Rome" sort of stuff with your codebase, but it's nice to reinforce that and remind it how to deploy, what testing should be done before a PR, etc.
Even the most complex distributed systems can be understood with the context windows we have, short of 1M+ LOC, and even then you could use documentation to get a more succinct view of the whole thing.
This really doesn’t pan out in practice if you work a lot with these models.
And also we know why: effective context depends on input and task complexity. Our best guess right now is that effective context length is often between 100k and 200k tokens, even for frontier models rated at 1M tokens on NIAH-type (needle-in-a-haystack) tests.
all of apple’s devices with displays, down to the watch, run OS X with a form-factor-appropriate UI layer on top. iphone and mac are more unified than google’s android/chromeos.
Tahoe made all the touch targets on macOS bigger; we may get a touch MacBook Pro this year.
i never really understood the billionaire yacht hate.
Once you buy a yacht, 450 million dollars of ownership in a company you had goes to the people who built a beautiful thing that exists in the real world, and you're on the hook for employing a lot of people to maintain it.
I take a lot more issue with accumulation and hoarding of wealth than the spending of it.
An economy that wastes resources building mega yachts for billionaires is more unequal than one that builds cruise ships on which high-income families can go on holiday.
> i never really understood the billionaire yacht hate.
Once someone reaches that level of fame and fortune it's almost a requirement if they want to travel or have some sort of 'vacation'. Don't get me wrong, it's definitely a great problem to have, but it's one of the only ways to find privacy at that level of wealth.
If I'm ever super wealthy, I hope I can also stay somewhat anonymous so that I can walk down the street like any other person.
Holding shares in a company (or dollar bills) is not depriving others of something. The fisherman will go catch fish tomorrow, the wheat in the fields will keep growing, the builder will build a house.
If someone starts paying the fisherman, farmer, builder, more to stop doing what they are doing and start building mega yachts, then there will be less fish, bread, and houses for others.
That said, I assume it's much simpler than that: it's just about the hypocrisy of climate-change billionaires belching out carbon while demanding that the selfish, greedy commoners cut our emissions.
Whilst this is true, measuring by value distorts that statement somewhat. If I produce a screw for the US military (a scenario where supply chains are highly regulated, so the buyer may be unable to buy cheaply from a foreign country) and sell it for $1, I have produced a dollar of manufacturing by value, but if I produce exactly the same product in China for $0.10, I've only made 10 cents by value, despite having made exactly the same product.
There is a reason why, for instance, ship and raw-material output is measured in tonnage: that is the actual thing produced, and the value is secondary. That is, you would want to measure the actual amount of goods produced rather than what they sold for, obviously only among comparable categories.
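A small TypeScript calculation makes the distortion concrete, using the comment's prices ($1 vs $0.10 per screw) and an assumed batch of 1,000 identical screws (the batch size is illustrative, not from the comment). By value, the US producer looks ten times larger; by quantity, the two are identical.

```typescript
// Worked version of the screw example: identical physical output, different prices.
// The 1,000-unit batch size is an assumption for illustration.
interface Output {
  label: string;
  units: number; // physical quantity produced
  unitPriceCents: number; // sale price per unit, in cents (avoids float rounding)
}

// Output measured by value: what the goods sold for, in cents.
function outputByValue(o: Output): number {
  return o.units * o.unitPriceCents;
}

// Output measured by quantity: how many units were actually made.
function outputByQuantity(o: Output): number {
  return o.units;
}

const usMilitarySupplier: Output = { label: "US (regulated supply chain)", units: 1000, unitPriceCents: 100 };
const chinaSupplier: Output = { label: "China", units: 1000, unitPriceCents: 10 };

for (const o of [usMilitarySupplier, chinaSupplier]) {
  console.log(`${o.label}: $${outputByValue(o) / 100} by value, ${outputByQuantity(o)} screws by quantity`);
}
```

This prints $1000 vs $100 by value but 1000 screws for both, which is the gap a tonnage-style (quantity) measure avoids.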
Also, US unemployment has been low. The idea that Americans need more jobs just doesn't fit the numbers. There are plenty of good-paying, non-backbreaking jobs for Americans, but they just don't seem to believe it.
you can get a job as a long-haul truck driver in Texas with no education and paid-for training, paying 80-100k, and live in an area where houses cost 300k. within 5 years, starting from zero with no education, you can own a home and have a nest egg big enough to become an owner-operator or invest in a small business.
so that's the floor for anyone willing to put in a few years of work.
Also, US manufacturing already struggles to find workers.
The problem, though, is that 70% of US manufacturing happens in small town/rural areas, which is not where the people looking for jobs are found, so you get this curious disconnect.
i think it's just that it's the new year, and "year of the linux desktop" is a meme (in the actual-definition-of-the-word kind of way), and the meme is growing over time.
Xiaomi, Nvidia Nemotron, Minimax, lots of other smaller ones too. There are massive economic incentives to shrink models because they can be provided faster and at lower cost.
I think even with the money going in, there has to be some revenue supporting that development somewhere. And users are now looking at the cost. I have been using Anthropic Max for most of this year, and after checking out some of these other models, it is clearly overpriced (I would also say their Claude Code moat has been breached). And Anthropic's API pricing is completely crazy when you use some of the paradigms that they suggest (agents/commands/etc.), i.e. token usage is going up, so efficient models are driving growth.
I haven't tried it yet, but yes. Qwen3 Next 80B works decently in my testing, and fast. I had mixed results with the new Nemotron, but it and the new Qwen models are both very fast to run.
Same experience: on my old M2 Mac with just 32GB of memory, both Qwen 3 30B and the new Nemotron models are very useful for coding if I prepare a one-shot prompt with directions and relevant code. I don’t like them for agentic coding tools. I have mentioned this elsewhere: it is deeply satisfying to mix local model use with commercial APIs and services.