llama3.2:3b-instruct-q8_0 is performing better than llama3.1:8b at q4 on my MacBook Pro M1. It's faster and the results are better: it answered a few riddles and thought experiments more accurately despite being a 3B model vs an 8B one.
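If you want to reproduce that kind of side-by-side comparison yourself, here's a minimal sketch. The 8B tag and the prompt are just examples (I was on a q4 build of 3.1, but I'm not certain which q4 variant you'd want):

# Pull both models, then run the same prompt against each and compare.
$ ollama pull llama3.2:3b-instruct-q8_0
$ ollama pull llama3.1:8b-instruct-q4_0
$ ollama run llama3.2:3b-instruct-q8_0 "Which weighs more, a pound of feathers or two pounds of bricks?"
$ ollama run llama3.1:8b-instruct-q4_0 "Which weighs more, a pound of feathers or two pounds of bricks?"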
I just removed my install of 3.1-8b.
my ollama list is currently:
$ ollama list
NAME                              ID              SIZE      MODIFIED
llama3.2:3b-instruct-q8_0         e410b836fe61    3.4 GB    2 hours ago
gemma2:9b-instruct-q4_1           5bfc4cf059e2    6.0 GB    3 days ago
phi3.5:3.8b-mini-instruct-q8_0    8b50e8e1e216    4.1 GB    3 days ago
mxbai-embed-large:latest          468836162de7    669 MB    3 months ago
For the _K_S quants, definitely not. We quantized the 3B model with q4_K_M since we were getting good results out of it. Officially, Meta has only talked about quantization for the 405B model and hasn't given any actual guidance on what the "best" quantization should be for the smaller models. With the 1B model we didn't see good results with any of the 4-bit quantizations, so we went with q8_0 as the default.
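If you want to try the specific quantizations discussed here, the tags follow the same pattern as in the list above. A quick sketch, assuming these tags are still published in the Ollama library:

# The default 4-bit quant for the 3B model, and the q8_0 default for the 1B model.
$ ollama pull llama3.2:3b-instruct-q4_K_M
$ ollama pull llama3.2:1b-instruct-q8_0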