
IMO Google's Gemma2 27B [1] is the sweet spot for running locally on commodity 16GB VRAM cards.

[1] https://ollama.com/library/gemma2:27b
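If you want to try it, a minimal sketch using the ollama Python client (pip install ollama). The model tag comes from the link above; it assumes you've already pulled the model with "ollama pull gemma2:27b", and response access may differ slightly across client versions:

    # Minimal sketch: chat with a locally pulled gemma2:27b via the ollama client.
    import ollama

    response = ollama.chat(
        model="gemma2:27b",
        messages=[{"role": "user", "content": "Explain GQA in one sentence."}],
    )
    print(response["message"]["content"])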



Keep in mind that Gemma is a larger model, but it only has an 8k context. The Mistral 12B will need less VRAM to store the weights, but you'll need a much larger KV cache if you intend to use the full 128k context, especially if the KV cache is unquantized. Not sure if this new model has GQA, but models without it absolutely eat memory as you increase the context size (looking at you, Command R). A back-of-the-envelope sketch follows below.
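To put numbers on it, here's the sketch. The architecture figures for Mistral NeMo 12B (40 layers, 8 KV heads under GQA, head dim 128, 32 attention heads) are my assumptions, not something from the thread:

    # Back-of-the-envelope KV cache size:
    # 2 (K and V) x layers x KV heads x head_dim x context x bytes per element.
    def kv_cache_gib(layers, kv_heads, head_dim, context, bytes_per_elem=2):
        return 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 2**30

    # Assumed Mistral NeMo 12B shape: 40 layers, 8 KV heads (GQA), head_dim 128.
    print(kv_cache_gib(40, 8, 128, 128 * 1024))   # ~20 GiB at fp16, full 128k context
    # Without GQA the cache scales with all 32 attention heads instead:
    print(kv_cache_gib(40, 32, 128, 128 * 1024))  # ~80 GiB -- why no-GQA models hurt

Even with GQA, an unquantized fp16 KV cache at 128k is around 20 GiB on these assumptions, which is why the cache, not the weights, becomes the constraint at long contexts.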


If I "only" have 16GB of ram on a macbook pro, would that still work ?


If it's an M-series one with "unified memory" (shared RAM between the CPU, GPU and NPU on the same chip), yes.
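Rough fit check for the 12B model, with my own assumptions about quantized weight size (~7 GB for a 12B model at ~4.5 bits per weight, Q4 GGUF) and macOS's default cap on GPU-addressable unified memory (roughly 75% of RAM):

    # Rough fit check for a 12B model on a 16 GB unified-memory Mac.
    # Both the weight size and the GPU memory cap below are assumptions.
    total_ram_gb = 16
    gpu_budget_gb = total_ram_gb * 0.75       # assumed Metal working-set cap
    weights_gb = 7.0                          # ~12B params at Q4 quantization
    kv_cache_gb = 1.25                        # e.g. 8k context with GQA at fp16
    print(weights_gb + kv_cache_gb <= gpu_budget_gb)  # True -- fits with headroom

Pushing toward the full 128k context blows well past that budget (see the KV cache numbers above), so on 16GB you'd want a shorter context or a quantized cache.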



