I don't think there are any VPSs that can do that in a way that is even remotely performant or a good value compared to something like an LLM inference provider or serverless GPU. I would look into together.ai and RunPod for that type of thing.
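For what it's worth, most of these providers expose an OpenAI-compatible endpoint, so trying one out is only a few lines of Python. A minimal sketch against Together's API (the base URL and model id below are from memory and may have changed, so check their docs before relying on this):

```python
# Minimal sketch: querying a hosted Llama model through Together's
# OpenAI-compatible endpoint. Requires `pip install openai` and a
# TOGETHER_API_KEY in the environment. Base URL and model id are
# assumptions from memory; verify against Together's current docs.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",  # OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3-70b-chat-hf",  # example id, check their catalog
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```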
But let me know if you find something. I just don't think something tiny like phi-3, which could run on a VPS, is at all comparable in ability to this stuff, even though it's great for its size.
True, you could run it at home on a server though.
My AI server draws about 60W idle and 300-350W while running a query through llama3. At a kWh price of 0.15€ that works out to about 7-10€ a month if it's not loaded too heavily. Not bad IMO.
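Back-of-the-envelope, assuming the box idles most of the day (the 5% duty cycle is a made-up illustration, not measured):

```python
# Rough monthly electricity cost for the server. The duty cycle is
# an assumption for illustration; the wattages and price are from
# the comment above.
IDLE_W, LOAD_W = 60, 325          # idle draw and average load draw
PRICE_PER_KWH = 0.15              # € per kWh
HOURS_PER_MONTH = 24 * 30

load_fraction = 0.05              # assume ~5% of the time running queries
avg_w = IDLE_W * (1 - load_fraction) + LOAD_W * load_fraction
kwh = avg_w / 1000 * HOURS_PER_MONTH
print(f"~{avg_w:.0f} W average, {kwh:.1f} kWh, {kwh * PRICE_PER_KWH:.2f} €/month")
# -> roughly 8 €/month at 5% load, which lines up with the 7-10€ estimate
```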
The server could be more energy-efficient, but optimizing it would cost money too.
Gemma 27b? Command R (32b) would be an ideal middle ground but won’t fit in a 16gb card. There are a handful of 12b models like the new mistral though. I doubt a 12b offers enough improvement over a 7b to compare to a 70b. That seems like an entirely different class.
You probably want to limit yourself to something smaller even with a 16gb card, because you still need to fit the context window in memory too.
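A quick way to sanity-check the fit: quantized weights take roughly (params × bits/8) bytes, and the KV cache grows linearly with context length. A rough sketch, where the layer/head numbers are ballpark figures for a Llama-style ~13b model with grouped-query attention, not exact specs for any particular release:

```python
# Rough VRAM estimate: quantized weights plus fp16 KV cache.
# Architecture numbers in the example call are ballpark assumptions
# for a Llama-style ~13b model; check your model's actual config.
def vram_gb(params_b, weight_bits, n_layers, n_kv_heads, head_dim,
            ctx_len, kv_bytes=2):
    weights = params_b * 1e9 * weight_bits / 8
    # 2x for K and V, per layer, per kv head, per token in the context
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * ctx_len * kv_bytes
    return (weights + kv_cache) / 1024**3

# e.g. a 13b model at 4-bit with an 8k context window
print(f"{vram_gb(13, 4, 40, 8, 128, 8192):.1f} GB")
# -> ~7.3 GB, which leaves reasonable headroom on a 16gb card
```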
About 100€ for the PC (some of the hardware was surplus) and 300€ for the GPU, which was a nice 16gb model with HBM2. Pretty nice for an educational project IMO. I'd much rather do something like this than spend money on a course.
Edit: I’m amazed by how offended some people are by such a simple question.