
> And around ~2012, a bunch of researchers have reported you don't even need 2nd-derivative information. You just have to initialize the neural net properly.

This sounds very interesting. How do you properly initialize the weights? Do you have a link to a paper about this?



Check out this paper:

Practical recommendations for gradient-based training of deep architectures, Y. Bengio

http://arxiv.org/abs/1206.5533

There is a section on weight initialization on page 15. In general, this paper has a lot of good information in one place.
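For a concrete sense of what that section covers: one of the schemes associated with Bengio's group is the Glorot/Xavier "normalized" initialization, which scales the uniform sampling interval by the layer's fan-in and fan-out so that activation and gradient variances stay roughly constant across layers. A minimal NumPy sketch (the function name and 784→256 layer sizes are illustrative, not from the paper):

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=None):
    """Sample a (fan_in, fan_out) weight matrix from
    W ~ U[-limit, limit], limit = sqrt(6 / (fan_in + fan_out)),
    the Glorot/Xavier normalized initialization."""
    rng = np.random.default_rng() if rng is None else rng
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Example: initialize a 784 -> 256 fully connected layer
W = glorot_uniform(784, 256)
```

Biases are typically just initialized to zero under this scheme; the scaling only matters for the weight matrices.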


