
As one of the authors, I'd like to clarify: the equations of the RWKV model enable computational parallelization, provided the sequence is known in advance. This applies during training and, at inference time, during prompt reading (think of it as an "encoding" pass), right before generation (the decoding phase) begins.
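A minimal sketch of what this means, in NumPy (this is not the actual RWKV kernel: it is a single channel, the current-token bonus term u and the numerical-stability tricks from the paper are omitted, and the names w, k, v only loosely follow the paper). The same decay-weighted recurrence can be evaluated step by step, as during generation, or for every position at once via one triangular matrix, which is what makes training and prompt reading parallelizable:

    import numpy as np

    T = 8                      # sequence length
    rng = np.random.default_rng(0)
    w = 0.5                    # per-channel decay (w > 0)
    k = rng.normal(size=T)     # keys, all known up front for a given prompt
    v = rng.normal(size=T)     # values

    # Recurrent mode: O(1) state per step, as used during generation.
    num, den = 0.0, 0.0
    rec_out = np.empty(T)
    for t in range(T):
        num = np.exp(-w) * num + np.exp(k[t]) * v[t]
        den = np.exp(-w) * den + np.exp(k[t])
        rec_out[t] = num / den

    # Parallel mode: a lower-triangular matrix of decay weights computes
    # every position in one shot, given the whole sequence in advance.
    i = np.arange(T)
    decay = np.where(i[None, :] <= i[:, None],
                     np.exp(-w * (i[:, None] - i[None, :])), 0.0)
    par_out = (decay @ (np.exp(k) * v)) / (decay @ np.exp(k))

    assert np.allclose(rec_out, par_out)  # both modes agree

Once generation starts, each new token depends on the previous one, so that phase runs in the recurrent mode above.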


How can something recurrent be parallelized?

> the equations of the RWKV model enable computational parallelization, provided that the sequence is predetermined.

Sure, but isn't that the core concept of self-attention?



