Hacker News

Is it only possible to have success with paid versions of these LLMs?

Google's "Ask AI" and ChatGPT's free models seem to be consistently bad to the point where I've mostly stopped using them.

I've lost track of how many times it was like "yes, you're right, I've looked at the code you've linked and I see it is using a newer version than what I had access to. I've thoroughly scanned it and here's the final solution that works".

And then the solution fails because it references a flag or option that doesn't even exist. Not even in the old or new version, a complete hallucination.

It also seems like the more context it has, the worse it becomes: it starts blending in previous solutions that I already explained didn't work, rearranged slightly differently in the code but still doing the wrong thing.

This happens to me almost every time I use it. I couldn't imagine paying for these results, it would be a huge waste of money and time.




It depends.

Google's AI that gloms on to search is not particularly good for programming. I don't use any OpenAI stuff, but talking to those who do, their models are not good for programming compared to equivalent ones from Anthropic or Google.

I have good success with free Gemini, used either via the web UI or with aider. That can handle some simple software dev. The new Qwen3.5 is pretty good considering its size, though multi-$k of local GPU is not exactly "free".

But, this also all depends on the experience level of the developer. If you are gonna vibe code, you'll likely need to use a paid model to achieve results even close to what an experienced developer can achieve with lesser models (or their own brain).


Set up mmap properly and you can evaluate small/medium MoE models (such as the recent A3B from Qwen) on most ordinary hardware; they'll just be very slow. But if you're willing to wait you can get a feel for their real capabilities, then invest in what it takes to make them usable. (Usually running them on OpenRouter will be cheaper than trying to invest in your own homelab: even if you're literally running them 24/7, the break-even point compared to a third-party service is unrealistically far out.)
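For what it's worth, with llama.cpp this is roughly what that looks like (the model file name here is hypothetical, and exact flags can vary by version, so check `--help` on your build):

```shell
# llama.cpp mmaps the GGUF file by default, so the OS pages weights in
# on demand instead of loading everything into RAM up front. That's what
# makes a 30B-class MoE "run" (slowly) on ordinary hardware.
# Hypothetical file name; download a real GGUF quant of the model first.
./llama-cli \
    -m ./Qwen3-30B-A3B-Q4_K_M.gguf \
    -p "Write a shell one-liner that counts unique IPs in access.log" \
    -n 256
```

If generation is unbearably slow, that's the mmap paging at work; it's still enough to judge output quality before spending money on hardware or API credits.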

Subjectively, but with tests using identical prompts, I find the quality of Qwen3.5 122B as far below Claude Haiku as Claude Haiku is below Claude Sonnet for software design planning tasks. I have yet to try a like-for-like test on coding.

> But, this also all depends on the experience level of the developer. If you are gonna vibe code,

Where I find it struggles is when I prompt it with things like this:

> I'm using the latest version of Walker (app launcher on Linux) on Arch Linux from the AUR, here is a shell script I wrote to generate a dynamic dmenu based menu which gets sent in as input to walker. This is working perfectly but now I want to display this menu in 2 columns instead of 1. I want these to be real columns, not string padding single columns because I want to individually select them. Walker supports multi-column menus based on the symbol menu using multiple columns. What would I need to change to do this? For clarity, I only want this specific custom menu to be multi-column not all menus. Make the smallest change possible or if this strategy is not compatible with this feature, provide an example on how to do it in other ways.

This is something I tried hacking on for an hour yesterday and it led me down rabbit hole after rabbit hole of incorrect information, commands that didn't exist, flags that didn't exist and so on.

I also sometimes have oddball problems I want to solve where I know awk or jq can do it pretty cleanly but I don't really know the syntax off the top of my head. It fails so many times here. Once in a while it will work but it involves dozens of prompts and getting a lot of responses from it like "oh, you're right, I know xyz exists, sorry for not providing that earlier".
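To give a concrete (made-up) example of the kind of one-off task I mean, grouping and summing a column is a one-liner in awk once you know the syntax:

```shell
# Sum the second column grouped by the first, for whitespace-separated input.
# `sort` at the end just makes the output order deterministic.
printf 'a 1\nb 2\na 3\n' |
  awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' |
  sort
# prints:
# a 4
# b 2
```

Trivial if you write awk daily; exactly the kind of thing I'd hoped an LLM could hand me on the first try.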

I get no value from it when I already know the problem space well, because then I'd just write it unassisted. This is coming at things from the perspective of having ~20 years of general programming experience.

Most of the problems I give it are one-off standalone scripts of ~100-200 lines or less. I would have thought this is the best-case scenario for it, because it doesn't need to know anything beyond that scope. There's no elaborate project structure or context involving many files / abstractions.

I don't think I'm cut out for using AI, because if I paid for it and it didn't provide the solution I asked for, I would expect a refund, the same way I would if I bought a hammer from the store and it turned into spaghetti when I tried to use it. That's not what I bought it for.


What LLM are you using? What you describe should be no problem for free Gemini or Claude Haiku and above. Other models, I dunno.

Both ChatGPT's anonymous model and Google's "AI mode" on their search page, which brings you to a dedicated page to start prompting. I'm not sure if that's Gemini proper, because if I go to https://gemini.google.com/app it doesn't have my history.

I personally didn't get good results until I got the $100/mo Claude plan (and still often hit $180/mo from spending extra credits).

It's not that the model is better than the cheaper plans, but experimenting with and revising prompts takes dozens of iterations for me, and I'm often multiple dollars in when I realize I need to restart with a better plan.

It also takes time and experimentation to get a good feel for context management, which costs money.


I bought the $200 plan after my extras started routinely exceeding that. Harsh.

But, let me suggest that you stop thinking about planning and design as "prompts". I work with it to figure out what I want to do and have it write a spec.md. Then I work with it to figure out the implementation strategy and have it write implementation.md. Then I tell it I am going to give those docs to a new instance and ask it to write all the context it will need with instructions about the files and have it write handoff.md.

By giving up on the paradigm of prompts, I turned my focus to the application and that has been very productive for me.

Good luck.


plan.md / implementation.md is just a prompt.

You're not telling me to do anything different.


Yes, unfortunately the free versions of the Claude, Gemini, or ChatGPT coding models can't compare with the paid ones, and are just not that useful. But there are alternatives like GLM and Grok that can be quite useful, depending on the task.

PS: The cheapest still very useful alternative I've found is GitHub's Copilot at €10/m base price, with multiple models included. If you pick manually between cheap models for low complexity and save Opus 4.6 for specific things, you can keep it under budget.

At least from what I’ve seen, yes you do have to pay for anything useful. But just the cheaper plans seem worth the price.


