
As one of the authors, I'd like to clarify: the equations of the RWKV model enable computational parallelization, provided the sequence is known in advance. This applies during training and, at inference time, during prompt reading (think of it as an "encoding" pass), right before generation (the decoding phase) begins.
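A minimal sketch of what this means, in NumPy (this is not the actual RWKV kernel: it is a single channel, the current-token bonus term u and the numerical-stability tricks from the paper are omitted, and the names w, k, v only loosely follow the paper). The same decay-weighted recurrence can be evaluated step by step, as during generation, or for every position at once via one triangular matrix, which is what makes training and prompt reading parallelizable:

    import numpy as np

    T = 8                      # sequence length
    rng = np.random.default_rng(0)
    w = 0.5                    # per-channel decay (w > 0)
    k = rng.normal(size=T)     # keys, all known up front for a given prompt
    v = rng.normal(size=T)     # values

    # Recurrent mode: O(1) state per step, as used during generation.
    num, den = 0.0, 0.0
    rec_out = np.empty(T)
    for t in range(T):
        num = np.exp(-w) * num + np.exp(k[t]) * v[t]
        den = np.exp(-w) * den + np.exp(k[t])
        rec_out[t] = num / den

    # Parallel mode: a lower-triangular matrix of decay weights computes
    # every position in one shot, given the whole sequence in advance.
    i = np.arange(T)
    decay = np.where(i[None, :] <= i[:, None],
                     np.exp(-w * (i[:, None] - i[None, :])), 0.0)
    par_out = (decay @ (np.exp(k) * v)) / (decay @ np.exp(k))

    assert np.allclose(rec_out, par_out)  # both modes agree

Once generation starts, each new token depends on the previous one, so that phase runs in the recurrent mode above.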


How can something recurrent be parallelized?

> the equations of the RWKV model enable computational parallelization, provided that the sequence is predetermined.

Sure, but isn't that the core concept of self-attention?



