
llama3.2:3b-instruct-q8_0 is performing better than 3.1-8b at q4 on my MacBook Pro M1. It's faster and the results are better: it answered a few riddles and thought experiments more accurately despite being 3b vs. 8b.

I just removed my install of 3.1-8b.

my ollama list is currently:

$ ollama list
NAME                              ID              SIZE      MODIFIED
llama3.2:3b-instruct-q8_0         e410b836fe61    3.4 GB    2 hours ago
gemma2:9b-instruct-q4_1           5bfc4cf059e2    6.0 GB    3 days ago
phi3.5:3.8b-mini-instruct-q8_0    8b50e8e1e216    4.1 GB    3 days ago
mxbai-embed-large:latest          468836162de7    669 MB    3 months ago
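
If anyone wants to try the same kind of comparison, it's just two CLI calls (the riddle prompt below is a placeholder, and the q4_0 tag is my assumption for what "8b-q4" resolves to):

$ ollama run llama3.2:3b-instruct-q8_0 "A man must cross a river with a wolf, a goat, and a cabbage. The boat holds him plus one item. How does he do it?"
$ ollama run llama3.1:8b-instruct-q4_0 "A man must cross a river with a wolf, a goat, and a cabbage. The boat holds him plus one item. How does he do it?"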



Aren't the _0 quantizations considered deprecated and _K_S or _K_M preferable?

https://github.com/ollama/ollama/issues/5425


For _K_S, definitely not. We quantized the 3b model with q4_K_M since we were getting good results out of it. Officially, Meta has only talked about quantization for the 405b model and hasn't given any actual guidance on what the "best" quantization should be for the smaller models. With the 1b model we didn't see good results with any of the 4-bit quantizations, so we went with q8_0 as the default.
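
If you want to compare the quantizations yourself, each one is just a tag (these tag names assume the usual ollama library naming):

$ ollama pull llama3.2:3b-instruct-q4_K_M    # the 3b default mentioned above
$ ollama pull llama3.2:3b-instruct-q8_0     # the higher-precision variant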


For a second I read that as “it just removed my install of 3.1-8b” :D



On what basis do you use these different models?


mxbai-embed-large is for generating embeddings for RAG.

The others are for text generation / instruction following, for various writing tasks.
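
Concretely, getting an embedding out of it is one call to the local ollama server's REST API (the prompt text here is just a placeholder):

$ curl -s http://localhost:11434/api/embeddings \
    -d '{"model": "mxbai-embed-large", "prompt": "a chunk of the document to index"}'

The response is a JSON object with an "embedding" array of floats, which you store in your vector database and compare against the embedded query at retrieval time.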



