Using the smaller distilled versions. I'm running this one, which only needs 20GB of VRAM (or regular RAM on Apple Silicon): https://ollama.com/library/deepseek-r1:32b
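For anyone who wants to script against it: once Ollama is serving the model, you can query its local HTTP API (default port 11434). A minimal sketch in Python, assuming the default endpoint and that the model has already been pulled with `ollama pull deepseek-r1:32b`:

    import requests

    # Ollama's default local endpoint; assumes the model is already pulled.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "deepseek-r1:32b",
            "prompt": "Write a function that reverses a linked list.",
            "stream": False,  # return one JSON object instead of a token stream
        },
    )
    print(resp.json()["response"])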


Keep in mind the distilled versions are NOT shrunken versions of DeepSeek-R1; they're just fine-tunes of Qwen and Llama, I believe, and they are nowhere near as good as the real R1 (the 400GB version) or even the 133GB quants.


Do we know how these distilled versions perform in benchmarks?


DeepSeek published a bunch of benchmarks when they released the models: https://github.com/deepseek-ai/DeepSeek-R1?tab=readme-ov-fil...

I'd like to see detailed benchmarks run by other unaffiliated organizations.


This is very useful. Thank you.

So basically there's not much reason to go beyond DeepSeek-R1-Distill-Qwen-32B, at least for coding tasks.


Just had a chance to play around with the 32B model.

https://glama.ai/models/deepseek-r1-distill-qwen-32b

I am using it with the Cline VSCode extension to write code.

It works impressively well for a model this size.

Thanks again for sharing those benchmarks!
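In case it helps anyone wiring it into tooling: Ollama also exposes an OpenAI-compatible route under /v1, and Cline can be pointed at a custom OpenAI-compatible endpoint. A rough sketch, assuming the default local port and the `openai` Python package (the api_key value is ignored by Ollama but the client requires one):

    from openai import OpenAI

    # Ollama serves an OpenAI-compatible API under /v1 on localhost:11434.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

    completion = client.chat.completions.create(
        model="deepseek-r1:32b",
        messages=[{"role": "user", "content": "Explain what a mutex is in one sentence."}],
    )
    print(completion.choices[0].message.content)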



