
For inference, but yes. Many hundreds of tokens per second of output is the norm, in my experience. I don't recall the exact prompt-processing figures, but I think they were somewhere in the low hundreds of tokens per second (so slightly slower than inference).
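For context, the two figures are usually measured separately: prompt processing (prefill) throughput is prompt tokens divided by the time to ingest the prompt, and output (decode) throughput is generated tokens divided by the time spent generating. A minimal sketch of that bookkeeping is below; `process_prompt` and `generate_tokens` are hypothetical placeholders, not any particular library's API.

  import time

  # Hypothetical stand-ins for a model's prefill and decode steps;
  # swap in your actual inference calls.
  def process_prompt(prompt_tokens):
      """Pretend prefill: consumes the prompt, returns a (stubbed) KV cache."""
      return object()

  def generate_tokens(kv_cache, max_new_tokens):
      """Pretend decode loop: yields one output token at a time (stubbed)."""
      for i in range(max_new_tokens):
          yield i

  prompt_tokens = list(range(512))   # e.g. a 512-token prompt
  max_new_tokens = 256

  # Prompt-processing (prefill) throughput
  t0 = time.perf_counter()
  cache = process_prompt(prompt_tokens)
  prefill_s = max(time.perf_counter() - t0, 1e-9)
  print(f"prefill: {len(prompt_tokens) / prefill_s:.0f} tok/s")

  # Output (decode) throughput
  t0 = time.perf_counter()
  n_out = sum(1 for _ in generate_tokens(cache, max_new_tokens))
  decode_s = max(time.perf_counter() - t0, 1e-9)
  print(f"decode:  {n_out / decode_s:.0f} tok/s")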


