I don't think there are any VPSs that can do that in a way that is even remotely performant or a good value compared to something like an LLM inference provider or serverless GPU. I would look into together.ai and RunPod for that type of thing.
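For what it's worth, most of these providers expose an OpenAI-compatible endpoint, so trying one out is only a few lines of Python. A minimal sketch against Together's API (the base URL and model id below are from memory and may have changed, so check their docs before relying on this):

```python
# Minimal sketch: querying a hosted Llama model through Together's
# OpenAI-compatible endpoint. Requires `pip install openai` and a
# TOGETHER_API_KEY in the environment. Base URL and model id are
# assumptions from memory; verify against Together's current docs.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",  # OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3-70b-chat-hf",  # example id, check their catalog
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```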
But let me know if you find something. I just don't think something tiny like phi-3, which could run on a VPS, is at all comparable in ability to this stuff, even though it's great for its size.
True, you could run it at home on a server though.
My AI server draws about 60W idle and 300-350W while running a query through llama3. At a kWh price of 0.15€ that works out to about 7-10€ a month if it's not loaded too heavily. Not bad IMO.
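Back-of-the-envelope, assuming the box idles most of the day (the 5% duty cycle is a made-up illustration, not measured):

```python
# Rough monthly electricity cost for the server. The duty cycle is
# an assumption for illustration; the wattages and price are from
# the comment above.
IDLE_W, LOAD_W = 60, 325          # idle draw and average load draw
PRICE_PER_KWH = 0.15              # € per kWh
HOURS_PER_MONTH = 24 * 30

load_fraction = 0.05              # assume ~5% of the time running queries
avg_w = IDLE_W * (1 - load_fraction) + LOAD_W * load_fraction
kwh = avg_w / 1000 * HOURS_PER_MONTH
print(f"~{avg_w:.0f} W average, {kwh:.1f} kWh, {kwh * PRICE_PER_KWH:.2f} €/month")
# -> roughly 8 €/month at 5% load, which lines up with the 7-10€ estimate
```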
The server could be more energy-efficient, but optimizing it would cost money too.
Gemma 27b? Command R (32b) would be an ideal middle ground but won’t fit in a 16gb card. There are a handful of 12b models like the new mistral though. I doubt a 12b offers enough improvement over a 7b to compare to a 70b. That seems like an entirely different class.
You probably want to limit yourself to something smaller even with a 16gb card, because you still need to fit the context window in memory too.
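A quick way to sanity-check the fit: quantized weights take roughly (params × bits/8) bytes, and the KV cache grows linearly with context length. A rough sketch, where the layer/head numbers are ballpark figures for a Llama-style ~13b model with grouped-query attention, not exact specs for any particular release:

```python
# Rough VRAM estimate: quantized weights plus fp16 KV cache.
# Architecture numbers in the example call are ballpark assumptions
# for a Llama-style ~13b model; check your model's actual config.
def vram_gb(params_b, weight_bits, n_layers, n_kv_heads, head_dim,
            ctx_len, kv_bytes=2):
    weights = params_b * 1e9 * weight_bits / 8
    # 2x for K and V, per layer, per kv head, per token in the context
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * ctx_len * kv_bytes
    return (weights + kv_cache) / 1024**3

# e.g. a 13b model at 4-bit with an 8k context window
print(f"{vram_gb(13, 4, 40, 8, 128, 8192):.1f} GB")
# -> ~7.3 GB, which leaves reasonable headroom on a 16gb card
```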
About 100€ for the PC (some of the hardware was surplus) and 300€ for the GPU, which was a nice 16gb model with HBM2. Pretty nice for an educational project IMO. I'd much rather do something like this than spend money on a course.
Edit: I’m amazed by how offended some people are by such a simple question.