
For inference, but yes. Many hundreds of tokens per second of output is the norm, in my experience. I don't recall the exact prompt-processing figures, but I think they were somewhere in the low hundreds of tokens per second (so slightly slower than inference).
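For context, the two figures are usually measured separately: prompt processing (prefill) throughput is prompt tokens divided by the time to ingest the prompt, and output (decode) throughput is generated tokens divided by the time spent generating. A minimal sketch of that bookkeeping is below; `process_prompt` and `generate_tokens` are hypothetical placeholders, not any particular library's API.

  import time

  # Hypothetical stand-ins for a model's prefill and decode steps;
  # swap in your actual inference calls.
  def process_prompt(prompt_tokens):
      """Pretend prefill: consumes the prompt, returns a (stubbed) KV cache."""
      return object()

  def generate_tokens(kv_cache, max_new_tokens):
      """Pretend decode loop: yields one output token at a time (stubbed)."""
      for i in range(max_new_tokens):
          yield i

  prompt_tokens = list(range(512))   # e.g. a 512-token prompt
  max_new_tokens = 256

  # Prompt-processing (prefill) throughput
  t0 = time.perf_counter()
  cache = process_prompt(prompt_tokens)
  prefill_s = max(time.perf_counter() - t0, 1e-9)
  print(f"prefill: {len(prompt_tokens) / prefill_s:.0f} tok/s")

  # Output (decode) throughput
  t0 = time.perf_counter()
  n_out = sum(1 for _ in generate_tokens(cache, max_new_tokens))
  decode_s = max(time.perf_counter() - t0, 1e-9)
  print(f"decode:  {n_out / decode_s:.0f} tok/s")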


