
Alas, it doesn't appear to work well for longer contexts:

https://twitter.com/arankomatsuzaki/status/16390003799784038...

Has anyone here experimented with this recently to confirm?



It has already been confirmed that, with the right dataset, we can scale it effectively from 2k to 4k, and from 4k to 8k, via fine-tuning (you don't even need to train a new foundation model).

We believe the same approach can take it to 16k and well beyond 100k.
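
For anyone who wants to try: the recipe is just ordinary causal-LM fine-tuning, with the training data packed into sequences of the longer target length so the model actually sees the full window. Here is a minimal sketch using the Hugging Face transformers RWKV port; the checkpoint name, dataset, and hyperparameters below are illustrative assumptions, not the actual RWKV training setup:

    # Minimal sketch: extend an RWKV model's usable context by fine-tuning
    # on longer sequences. Checkpoint, dataset, and hyperparameters are
    # placeholders, not the real RWKV recipe.
    from datasets import load_dataset
    from transformers import (
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        RwkvForCausalLM,
        Trainer,
        TrainingArguments,
    )

    MODEL = "RWKV/rwkv-4-169m-pile"  # assumed HF checkpoint name
    TARGET_CTX = 4096                # e.g. extending a 2k-context model to 4k

    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    model = RwkvForCausalLM.from_pretrained(MODEL)

    # Pack the corpus into fixed TARGET_CTX-token chunks so every training
    # example exercises the full target context window.
    raw = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")

    def tokenize(batch):
        return tokenizer(batch["text"])

    def pack(batch):
        ids = [tok for doc in batch["input_ids"] for tok in doc]
        n = (len(ids) // TARGET_CTX) * TARGET_CTX
        return {"input_ids": [ids[i:i + TARGET_CTX] for i in range(0, n, TARGET_CTX)]}

    tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)
    packed = tokenized.map(pack, batched=True, remove_columns=tokenized.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir="rwkv-ctx4096",
            per_device_train_batch_size=1,
            gradient_accumulation_steps=8,
            learning_rate=1e-5,
            num_train_epochs=1,
        ),
        train_dataset=packed,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()

The only substantive choice here is packing the data into full-length chunks; everything else is standard fine-tuning.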

Research into how RWKV handles the hidden state shows that it is barely used (imo: <5%?), which means there is lots of headroom for scaling the context size (a rough probe is sketched below).

(This is actively being experimented on; we don't really know the limit yet.)
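
One rough way to probe that utilization claim yourself: run some text through the model, grab the recurrent state, and count what fraction of its entries carry non-trivial magnitude. This is an illustrative measurement only, not the methodology behind the <5% figure; the checkpoint and the 1%-of-max "active" threshold are arbitrary assumptions:

    # Crude probe of RWKV state utilization: what fraction of state entries
    # have non-negligible magnitude after processing text? The threshold is
    # an arbitrary heuristic, and the state mixes token-shift and WKV
    # accumulators, so treat the number as a rough indicator only.
    import torch
    from transformers import AutoTokenizer, RwkvForCausalLM

    MODEL = "RWKV/rwkv-4-169m-pile"  # assumed HF checkpoint name
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = RwkvForCausalLM.from_pretrained(MODEL).eval()

    text = "The quick brown fox jumps over the lazy dog. " * 50
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        out = model(**inputs, use_cache=True)  # use_cache=True returns the state

    active, total = 0, 0
    for s in out.state:  # list of state tensors across layers
        mag = s.abs()
        # Call an entry "active" if it exceeds 1% of that tensor's max magnitude.
        active += (mag > 0.01 * mag.max()).sum().item()
        total += mag.numel()

    print(f"active state entries: {active}/{total} ({100 * active / total:.1f}%)")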


Thank you. This is great to hear!

I'm going to take a closer look :-)


One of the authors here! I think someone in our Discord ran experiments showing that it does work for longer contexts; the pace of this work moves really fast, so the tweet may be about an earlier model in the series. RWKV needs to be trained on longer context lengths in order to acquire that skill: context tuning, if you will. IIRC there will be a follow-up paper on it.


Thank you for taking the time to comment here!

That sounds promising. Maybe scale (of model and training samples) is all you need.

And RNNs are obviously so much more efficient at inference.
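
Concretely: a transformer's KV cache grows with every generated token, while RWKV carries a fixed-size state between steps, so per-token compute and memory stay flat however long the generation runs. A minimal sketch of that state-carrying decode loop with the Hugging Face RWKV port (checkpoint name assumed, greedy decoding for simplicity):

    # Sketch of constant-memory RNN decoding: RWKV threads a fixed-size
    # state through the loop instead of an ever-growing KV cache.
    import torch
    from transformers import AutoTokenizer, RwkvForCausalLM

    MODEL = "RWKV/rwkv-4-169m-pile"  # assumed HF checkpoint name
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = RwkvForCausalLM.from_pretrained(MODEL).eval()

    prompt = tokenizer("The capital of France is", return_tensors="pt")

    with torch.no_grad():
        # Process the prompt once, then feed exactly one token per step.
        out = model(input_ids=prompt.input_ids, use_cache=True)
        state = out.state
        next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)

        generated = [next_id]
        for _ in range(20):
            out = model(input_ids=next_id, state=state, use_cache=True)
            state = out.state  # fixed size, no matter how far we've generated
            next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
            generated.append(next_id)

    print(tokenizer.decode(torch.cat(generated, dim=1)[0]))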

I'm going to take a closer look :-)



