
True, you could run it at home on a server though.

My AI server draws about 60W idle and 300-350W while running a llama3 query. At 0.15€/kWh that works out to about 7-10€ a month if it's not loaded too heavily. Not bad IMO.
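
Rough math behind that estimate (the 5% duty cycle is an assumption, not measured):

    # power cost sketch; load_fraction is an assumed duty cycle
    idle_w, load_w = 60, 325         # idle draw, midpoint of 300-350W under load
    price_per_kwh = 0.15             # EUR
    hours = 24 * 30                  # one month
    load_fraction = 0.05             # assumed: queries running ~5% of the time

    avg_w = idle_w * (1 - load_fraction) + load_w * load_fraction
    cost = avg_w / 1000 * hours * price_per_kwh
    print(f"~{cost:.1f} EUR/month")  # ~7.9 EUR with these numbers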

The server could be optimized further for energy use, but that would cost money too.



Llama3 8b is in no way equivalent to Gemini Flash or gpt-4o mini.


I'm really missing a middle-ground llama3 model. Llama 2 had a 13b but Llama3 is 8 or 70. 12 or 13 would be pretty ideal for a 16gb card :(


Gemma 27b? Command R (32b) would be an ideal middle ground but won't fit in a 16gb card. There are a handful of 12b models like the new Mistral though. I doubt a 12b offers much improvement over a 7b when you compare against a 70b; that seems like an entirely different class.

You probably want to limit yourself if you do have a 16gb card because you still need to fit the context window in memory too.
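
For napkin math on that, here's a rough VRAM budget (the llama3 architecture numbers are from memory; double-check against each model's config):

    # rough VRAM budget: quantized weights + fp16 KV cache
    def weights_gb(params_b, bits=4):
        return params_b * bits / 8   # e.g. 4-bit quant

    def kv_cache_gb(layers, kv_heads, head_dim, ctx_len, bytes_per=2):
        # 2x for keys and values, fp16 by default
        return 2 * layers * kv_heads * head_dim * ctx_len * bytes_per / 1e9

    # llama3 8b: 32 layers, 8 KV heads (GQA), head_dim 128, 8k context
    print(weights_gb(8) + kv_cache_gb(32, 8, 128, 8192))  # ~5.1 GB total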


I tried gemma2 but it has to be quantized to hell to fit in a 16gb card, and then it performs a lot worse than llama3 8b unfortunately.

I hope the new mistral comes out soon for ollama.

True about the context window, but with llama3 that wasn't a problem since it has such a small context window anyway.
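
The quantization squeeze is easy to see from approximate weight sizes (the bits-per-weight figures for GGUF quants are rough):

    # approximate weight footprint of a 27b model at common quant levels
    params_b = 27
    for name, bpw in [("fp16", 16), ("q8_0", 8.5), ("q4_0", 4.5), ("q3_K", 3.4)]:
        print(f"{name}: {params_b * bpw / 8:.1f} GB")
    # fp16: 54.0, q8_0: 28.7, q4_0: 15.2, q3_K: 11.5
    # q4 already brushes against 16GB before the KV cache, so you drop to q3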


Mistral just released a totally open 12b model


Yeah I saw! I hope it comes to ollama soon.


Mini scores 82% on MMLU while llama3 8b scores 68%.


How much did the server hardware cost?


About 100€ for the PC (some hardware was surplus) and 300€ for the GPU, which was a nice 16gb model with HBM2. It was a pretty nice educational project IMO. I'd much rather do something like this than spend money on a course.



