Still no 14/30B-parameter models since Llama 2. That's seriously killing real usability for power users and DIY folks.
The 7/8B models are great for PoCs and for moving to the edge for minor use cases … but there's a big, empty gap up to 70B that most people can't run.
The tinfoil hat in me says this is the compromise the powers that be have agreed to: basically being “open” but practically gimped for the average Joe techie. Basically arms control.
The Llama 3.2 11B multimodal model is a bit short of 14B, but smaller models can do more these days, and Meta aren't the only ones making models.
The 70B model has been pruned down by NVIDIA, if I recall correctly (Llama-3.1-Nemotron-51B).
The 405B model will also be shrunk down, and it can presumably be used to strengthen smaller ones (e.g. by distillation). I'm not convinced by your shiny hat.
You don't need an F-15 to play, at least; a decent sniper rifle will do, and you can still practise even with a pellet gun. I'm running 70B models on my M2 Max with 96GB of RAM. Even larger models sort of work, although I haven't really put much time into anything above 70B.
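In case it helps anyone, here's roughly what that looks like with llama-cpp-python. A minimal sketch, assuming a 4-bit Q4_K_M GGUF of a 70B model; the file path and prompt are made up, substitute your own:

    # Minimal sketch: load a quantized 70B GGUF with llama-cpp-python on Apple Silicon.
    # The model path is hypothetical -- point it at whatever quant you actually downloaded.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/llama-3.1-70b-instruct.Q4_K_M.gguf",  # roughly 40GB on disk
        n_ctx=4096,       # context window; bigger costs more RAM for the KV cache
        n_gpu_layers=-1,  # offload every layer to Metal
    )

    out = llm("Explain GGUF quantization in one paragraph.", max_tokens=128)
    print(out["choices"][0]["text"])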
With a 128GB Mac, you can even run 405B at 1-bit quantization; it's large enough that even with the considerable quality drop that entails, it still appears to be smarter than 70B.
You need to quantize 70B to run it on that kind of hardware as well, since even float16 wouldn't fit (70B weights alone are ~140GB at 16 bits). But 405b:IQ1_M seems to be smarter than 70b:Q4_K_M in my experiments (admittedly very limited, because it's so slow).
Note that IQ1_M quants are not really "1-bit" despite the name. It's somewhere around 1.8bpw (bits per weight), which just happens to be enough to fit the model into 128GB with some room left for inference.
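For anyone checking the arithmetic, a quick weights-only estimate. The bpw figures are approximations (IQ1_M at ~1.8bpw as stated above, Q4_K_M at roughly 4.85bpw), and this ignores the KV cache and runtime overhead:

    # Back-of-the-envelope weight-memory math for the quants discussed above.
    def weight_gb(params_billion: float, bits_per_weight: float) -> float:
        """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
        return params_billion * bits_per_weight / 8

    print(f"70B  @ float16 (16 bpw):   {weight_gb(70, 16):.0f} GB")    # ~140 GB, too big for 96GB
    print(f"70B  @ Q4_K_M (~4.85 bpw): {weight_gb(70, 4.85):.0f} GB")  # ~42 GB, fits in 96GB
    print(f"405B @ IQ1_M (~1.8 bpw):   {weight_gb(405, 1.8):.0f} GB")  # ~91 GB, fits in 128GB

That ~91GB is why IQ1_M is about the only way 405B squeezes onto a 128GB machine with room left over for inference.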