As one of the authors, I'd like to clarify: the equations of the RWKV model allow the computation to be parallelized across the sequence, provided the sequence is known in advance. This is the case during training, and also during inference while the prompt is being read (think of it as an "encoding" step), right before generation (the decoding phase).
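
To make the distinction concrete, here is a minimal sketch in NumPy. It does not use the actual RWKV equations. It is a toy linear recurrence with a hypothetical scalar `decay`, and the function names are made up for illustration. It only shows that when the whole sequence is known, every position can be computed at once and the result matches the token-by-token version.

```python
import numpy as np

def recurrent_scan(x, decay):
    """Token-by-token pass with a running state (what decoding has to do)."""
    state = np.zeros(x.shape[1])
    outputs = []
    for x_t in x:
        # Toy update rule (an assumption, not the RWKV formula):
        # state_t = decay * state_{t-1} + x_t
        state = decay * state + x_t
        outputs.append(state.copy())
    return np.stack(outputs)

def parallel_scan(x, decay):
    """All positions at once, possible only because the full sequence is known."""
    T = x.shape[0]
    idx = np.arange(T)
    lag = idx[:, None] - idx[None, :]  # lag[t, i] = t - i
    # Causal weights: decay**(t - i) for i <= t, zero for future positions.
    weights = np.where(lag >= 0, decay ** np.maximum(lag, 0), 0.0)
    return weights @ x  # one matrix product covers every time step

rng = np.random.default_rng(0)
prompt = rng.standard_normal((6, 4))  # 6 tokens, 4 channels (arbitrary sizes)
assert np.allclose(recurrent_scan(prompt, 0.9), parallel_scan(prompt, 0.9))
```

During generation the next token is not known yet, so only the recurrent form applies there. The final state of the prompt pass is what the generation loop would carry forward.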