Still no 14/30B-parameter models since Llama 2. That's seriously killing real usability for power users and DIY folks.
The 7/8B models are great for PoCs and for moving to the edge for minor use cases … but there's a big, empty gap up to 70B that most people can't run.
The tinfoil hat in me says this is the compromise the powers that be have agreed to: basically being “open” but practically gimped for the average Joe techie. Basically arms control.
The Llama 3.2 11B multimodal model is a bit short of 14B, but smaller models can do more these days, and Meta aren't the only ones making models.
The 70B model has been pruned down by NVIDIA, if I recall correctly (Llama-3.1-Nemotron-51B).
The 405B model will also be shrunk down, and it can presumably be used to strengthen smaller ones (e.g. by distillation). I'm not convinced by your shiny hat.
You don't need an F-15 to play, at least; a decent sniper rifle will do, and you can still practise even with a pellet gun. I'm running 70B models on my M2 Max with 96GB of RAM. Even larger models sort of work, although I haven't really put much time into anything above 70B.
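In case it helps anyone, here's roughly what that looks like with llama-cpp-python. A minimal sketch, assuming a 4-bit Q4_K_M GGUF of a 70B model; the file path and prompt are made up, substitute your own:

    # Minimal sketch: load a quantized 70B GGUF with llama-cpp-python on Apple Silicon.
    # The model path is hypothetical -- point it at whatever quant you actually downloaded.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/llama-3.1-70b-instruct.Q4_K_M.gguf",  # roughly 40GB on disk
        n_ctx=4096,       # context window; bigger costs more RAM for the KV cache
        n_gpu_layers=-1,  # offload every layer to Metal
    )

    out = llm("Explain GGUF quantization in one paragraph.", max_tokens=128)
    print(out["choices"][0]["text"])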
With a 128GB Mac, you can even run 405B at 1-bit quantization; it's large enough that even with the considerable quality drop that entails, it still appears to be smarter than 70B.
You need to quantize 70B to run it on that kind of hardware as well, since even float16 wouldn't fit (70B weights alone are ~140GB at 16 bits). But 405b:IQ1_M seems to be smarter than 70b:Q4_K_M in my experiments (admittedly very limited, because it's so slow).
Note that IQ1_M quants are not really "1-bit" despite the name. It's somewhere around 1.8bpw (bits per weight), which just happens to be enough to fit the model into 128GB with some room left for inference.
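For anyone checking the arithmetic, a quick weights-only estimate. The bpw figures are approximations (IQ1_M at ~1.8bpw as stated above, Q4_K_M at roughly 4.85bpw), and this ignores the KV cache and runtime overhead:

    # Back-of-the-envelope weight-memory math for the quants discussed above.
    def weight_gb(params_billion: float, bits_per_weight: float) -> float:
        """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
        return params_billion * bits_per_weight / 8

    print(f"70B  @ float16 (16 bpw):   {weight_gb(70, 16):.0f} GB")    # ~140 GB, too big for 96GB
    print(f"70B  @ Q4_K_M (~4.85 bpw): {weight_gb(70, 4.85):.0f} GB")  # ~42 GB, fits in 96GB
    print(f"405B @ IQ1_M (~1.8 bpw):   {weight_gb(405, 1.8):.0f} GB")  # ~91 GB, fits in 128GB

That ~91GB is why IQ1_M is about the only way 405B squeezes onto a 128GB machine with room left over for inference.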