
True, you could run it at home on a server though.

My AI server draws about 60W idle and 300-350W while running a llama3 query. At 0.15€/kWh that works out to about 7-10€ a month if it's not loaded too heavily. Not bad IMO.
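
Rough math behind that estimate (the 5% duty cycle is an assumption, not measured):

    # power cost sketch; load_fraction is an assumed duty cycle
    idle_w, load_w = 60, 325         # idle draw, midpoint of 300-350W under load
    price_per_kwh = 0.15             # EUR
    hours = 24 * 30                  # one month
    load_fraction = 0.05             # assumed: queries running ~5% of the time

    avg_w = idle_w * (1 - load_fraction) + load_w * load_fraction
    cost = avg_w / 1000 * hours * price_per_kwh
    print(f"~{cost:.1f} EUR/month")  # ~7.9 EUR with these numbers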

The server could be optimized further for energy use, but that would cost money too.



Llama3 8b is in no way equivalent to Gemini Flash or gpt-4o mini.


I'm really missing a middle-ground llama3 model. Llama 2 had a 13b but Llama3 is 8 or 70. 12 or 13 would be pretty ideal for a 16gb card :(


Gemma 27b? Command R (32b) would be an ideal middle ground but won't fit in a 16gb card. There are a handful of 12b models like the new Mistral though. I doubt a 12b offers much improvement over a 7b when you compare against a 70b; that seems like an entirely different class.

You probably want to limit yourself if you do have a 16gb card because you still need to fit the context window in memory too.
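
For napkin math on that, here's a rough VRAM budget (the llama3 architecture numbers are from memory; double-check against each model's config):

    # rough VRAM budget: quantized weights + fp16 KV cache
    def weights_gb(params_b, bits=4):
        return params_b * bits / 8   # e.g. 4-bit quant

    def kv_cache_gb(layers, kv_heads, head_dim, ctx_len, bytes_per=2):
        # 2x for keys and values, fp16 by default
        return 2 * layers * kv_heads * head_dim * ctx_len * bytes_per / 1e9

    # llama3 8b: 32 layers, 8 KV heads (GQA), head_dim 128, 8k context
    print(weights_gb(8) + kv_cache_gb(32, 8, 128, 8192))  # ~5.1 GB total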


I tried gemma2 but it has to be quantized to hell to fit in a 16gb card, and then it performs a lot worse than llama3 8b unfortunately.

I hope the new mistral comes out soon for ollama.

True about the context window, but with llama3 that wasn't a problem since it has such a small context window anyway.
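
The quantization squeeze is easy to see from approximate weight sizes (the bits-per-weight figures for GGUF quants are rough):

    # approximate weight footprint of a 27b model at common quant levels
    params_b = 27
    for name, bpw in [("fp16", 16), ("q8_0", 8.5), ("q4_0", 4.5), ("q3_K", 3.4)]:
        print(f"{name}: {params_b * bpw / 8:.1f} GB")
    # fp16: 54.0, q8_0: 28.7, q4_0: 15.2, q3_K: 11.5
    # q4 already brushes against 16GB before the KV cache, so you drop to q3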


Mistral just released a totally open 12b model


Yeah I saw! I hope it comes to ollama soon.


Mini scores 82% on MMLU while llama3 8b scores 68%.


How much did the server hardware cost?


About 100€ for the PC (some hardware was surplus) and 300€ for the GPU, which was a nice 16gb model with HBM2. It was a pretty nice educational project IMO. I'd much rather do something like this than spend money on a course.



