
Look into groq.com, guys. Some good models at similar speed to Inception Labs.


It's faster inference because of the hardware (LPUs); the question here is about architectures (autoregressive vs. diffusion).


I realize that, but it can be used right now with many models in real-life situations. I just wanted to mention it in case someone doesn't know about it.


SRAM doesn't scale down with advanced semiconductor nodes the way logic does.

Groq is heading to a dead end.



