Minigo: An open-source implementation of the AlphaGo Zero algorithm (github.com/tensorflow)
206 points by brilee on Jan 29, 2018 | hide | past | favorite | 29 comments


To help put it all together, there's a book being written about deep learning and the game of Go. It's currently in MEAP at Manning [https://www.manning.com/books/deep-learning-and-the-game-of-...]

I am finding it quite useful. (I'm not the author, just a happy reader! :-))


Neat! I was hoping to see an idea of what strength the self-taught Minigo can play at, e.g. on KGS.


We've put it on KGS as "somebot", and it can play at somewhere from 3-5d level, although amusingly it still can't do ladders, and almost always plays 3-3. Our strength seems to have plateaued recently, and we suspect it's because of value net overfitting issues.


3-5d is about the strength of Leela Zero (which I presume you already knew about). Can you disclose how long Minigo has been training, how many games were generated, and which hardware was used?


About a month using K80s. We have something like a million games played.


Ladders sound hard for neural networks. IIRC the original AlphaGo had a hardcoded ladder solver that determined whether a ladder position was winning or losing and fed that bit into the neural network.


I'm pretty sure it didn't, though I can't find it stated explicitly for AlphaGo. For Zero they specifically mention that it took the self-play a long time to learn ladders well.


I can confirm that the original 2015 Nature paper for AlphaGo mentions setting ladder capture / ladder escape bits as input to the neural network.

cf. https://gogameguru.com/i/2016/03/deepmind-mastering-go.pdf, Extended data table 2
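Concretely, those are per-point binary feature planes ("would a play here be a ladder capture / ladder escape?") stacked into the network input. Here's a sketch of how such planes would be assembled; the actual ladder reading is left as caller-supplied predicates, which are hypothetical stand-ins and not from any of these codebases:

```python
def ladder_planes(board_size, is_ladder_capture, is_ladder_escape):
    """Build the two binary feature planes described in the Nature paper.

    is_ladder_capture / is_ladder_escape are stand-in predicates
    (row, col) -> bool supplied by a separate ladder reader.
    Returns two board_size x board_size planes of 0/1 values.
    """
    capture = [[0] * board_size for _ in range(board_size)]
    escape = [[0] * board_size for _ in range(board_size)]
    for r in range(board_size):
        for c in range(board_size):
            capture[r][c] = 1 if is_ladder_capture(r, c) else 0
            escape[r][c] = 1 if is_ladder_escape(r, c) else 0
    return capture, escape
```

The ladder reader itself is cheap to write as a forced-sequence rollout, which is presumably why it was worth handing to the network as a precomputed bit.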


So, um, what would you have to do to it to make it play at about 25 kyu? (I'm, um, asking for a friend...)


We're working on making all the data (models, selfplay SGFs) available to download - you could download an older generation.


puts "#{rand(19)}, #{rand(19)}"

(I'm not much better :)


I've been super out of touch with playing Go for the past couple of years. I would love to get back into it with all these new developments! Does anyone know if I can play against AlphaGo or a derivative (such as Minigo) online?


Various people have tried to incorporate AG-like techniques into their Go programs. One you might wish to play with is Leela Zero, which is low to mid dan (amateur) now.

To get this working:

* Acquire a GTP-capable GUI, such as Sabaki

* Acquire the latest Leela Zero release

* Acquire a recent Leela Zero neural net

* Set up Sabaki to use LZ with the net passed as an argument, e.g. "-t 1 -p 1600 --noponder --gtp -w d16fa4c3801e55ec21e0df7ead67980fe8d4ee49188a3516818207ad28b017a6"

It's a bit of work but nothing too hard. I should mention that this may require a semi-decent GPU (my old GTX 750 works fine).
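If you'd rather script the engine than use a GUI, Leela Zero speaks GTP over stdin/stdout, so a few lines of Python can drive it. This is only a sketch — the engine path and weights file below are placeholders for your local install — but the coordinate helper shows one GTP quirk worth knowing: column letters skip "I":

```python
import subprocess

# GTP column letters deliberately skip "I" (easily confused with "J"/"1").
GTP_COLUMNS = "ABCDEFGHJKLMNOPQRST"

def gtp_vertex(col, row):
    """Convert 0-indexed (col, row) into a GTP vertex string like 'D4'."""
    return f"{GTP_COLUMNS[col]}{row + 1}"

def ask_for_move(engine_path, weights_file):
    """Start Leela Zero in GTP mode and ask it for black's first move.

    engine_path / weights_file are placeholders; flags mirror the
    Sabaki invocation above.
    """
    engine = subprocess.Popen(
        [engine_path, "--gtp", "-t", "1", "-p", "1600", "--noponder",
         "-w", weights_file],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
    out, _ = engine.communicate("genmove b\nquit\n")
    # GTP success replies look like "= D4"
    return out
```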


Interesting, thanks!


Playing against an unbeatable computer is not a good way to learn... Same thing goes for chess.


That's the awesome thing about Go! If you play against someone who is better than you, you can place handicap stones, which makes the game fun for both sides.


Really cool! How much work would it be to get this to work for chess?


Check out https://github.com/glinscott/leela-chess. We are getting close to kicking off the distributed version now that we've validated that it's possible to get a strong network through supervised training.

A nice win against Gnuchess (a very weak opponent, but nonetheless :) - https://github.com/glinscott/leela-chess/issues/47#issuecomm...


> A nice win against Gnuchess

Not sure why you guys don't show the PGN, but here you are:

1. e4 Nc6 2. Nf3 Nf6 3. Nc3 d5 4. exd5 Nxd5 5. Bb5 Nf4 6. O-O Bf5 7. d4 Nd5 8. Ne5 Qd6 9. Nxd5 Qxd5 10. Bxc6+ bxc6 11. c4 Qd6 12. Qf3 g6 13. Nxc6 Bg7 14. Bf4 Qe6 15. Rfe1 Qxc4 16. Rxe7+ Kf8 17. Rae1 Kg8 18. b3 Qc2 19. Qd5 Be6 20. R7xe6 fxe6 21. Qxe6+ Kf8 22. Bh6 Bxh6 23. Qf6+ Kg8 24. Ne7#
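python-chess (linked downthread) can replay this PGN and confirm the final position really is mate — a quick sanity check, assuming you have the library installed:

```python
import io
import chess.pgn

pgn_text = (
    "1. e4 Nc6 2. Nf3 Nf6 3. Nc3 d5 4. exd5 Nxd5 5. Bb5 Nf4 6. O-O Bf5 "
    "7. d4 Nd5 8. Ne5 Qd6 9. Nxd5 Qxd5 10. Bxc6+ bxc6 11. c4 Qd6 12. Qf3 g6 "
    "13. Nxc6 Bg7 14. Bf4 Qe6 15. Rfe1 Qxc4 16. Rxe7+ Kf8 17. Rae1 Kg8 "
    "18. b3 Qc2 19. Qd5 Be6 20. R7xe6 fxe6 21. Qxe6+ Kf8 22. Bh6 Bxh6 "
    "23. Qf6+ Kg8 24. Ne7#")

game = chess.pgn.read_game(io.StringIO(pgn_text))
board = game.board()
for move in game.mainline_moves():
    board.push(move)  # raises if any move is illegal

print(board.is_checkmate())  # True
```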

Question: when you switch to self-play reinforcement learning, do you plan on starting from the network obtained in supervised learning, or tabula rasa? I understand starting tabula rasa will require more computing power/time, but if you start from the supervised learning network, isn't there a risk you inherit human biases in playing style? It would also defeat the purpose of having the system discover existing chess theory, and possibly new theory.


We are going to start tabula rasa. The supervised learning is meant to prove that there are no major bugs in the framework/learning process.

Should be fun to watch it learn chess theory :).


You'd have to write a chess implementation that very carefully respects all possible game-ending pathways, a translation of the chess board into an array that a NN could understand, and a schema to flatten the set of all possible moves (both legal and illegal) into a single vector. Then the MCTS and RL portions would be identical.
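As a rough illustration of the last two translations: the 12-plane board layout and the 4096-entry from-square × to-square move flattening below are a common AlphaZero-style simplification, not Minigo's or leela-chess's actual scheme (the real AlphaZero move encoding is a richer 8×8×73 stack that also covers underpromotions):

```python
# One binary plane per piece type per color: 6 white then 6 black.
PIECE_TYPES = "PNBRQK"

def encode_board(piece_map):
    """piece_map: dict square (0-63, a1=0) -> (piece_letter, is_white).

    Returns 12 binary planes of 64 entries each, the usual one-hot
    board encoding fed to the network.
    """
    planes = [[0] * 64 for _ in range(12)]
    for square, (piece, is_white) in piece_map.items():
        plane = PIECE_TYPES.index(piece) + (0 if is_white else 6)
        planes[plane][square] = 1
    return planes

def flatten_move(from_square, to_square):
    """Index into a 4096-entry policy vector (ignores promotion choice)."""
    return from_square * 64 + to_square

# Bare kings on e1 (square 4) and e8 (square 60), just for illustration.
planes = encode_board({4: ("K", True), 60: ("K", False)})
```

The policy head then outputs one logit per flattened move index, and illegal entries are masked out before the softmax.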


Sounds like this meets the requirements: https://github.com/niklasf/python-chess

> very carefully respects all possible game-ending pathways

Not sure exactly what this one means, but it implements an is-game-over method, so you could do MCTS with it.

> a translation of a chess board into an array that a NN could understand

Chess positions are represented by sets of 64-bit integers so I don’t think this would be a blocker.

> and a schema by which to flatten the array of all possible moves (both legal and illegal) into a single vector

The list of legal moves is a set of bitboards as well.
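Expanding one such 64-bit mask into a binary input plane is essentially a one-liner — a sketch in plain Python, following the bit i = square i, a1 = bit 0 convention python-chess uses:

```python
def bitboard_to_plane(bb):
    """Expand a 64-bit bitboard into a flat list of 64 binary values."""
    return [(bb >> sq) & 1 for sq in range(64)]

# Example mask: white pawns on their starting rank (bits 8-15 set).
WHITE_PAWNS_START = 0x000000000000FF00
plane = bitboard_to_plane(WHITE_PAWNS_START)
```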


Not quite. python-chess is a wonderful library, and you would use it to do the things that your parent said. But python-chess on its own doesn't do those things.


Has anyone tried using Minigo with KSP (Kerbal Space Program)?


Can you run it with the LeelaZero weights?


With a lot of elbow grease, mayyyyybe. We're currently using 128 filters and 20 residual layers, whereas LZ uses a smaller network. Our networks would have had to define their batchnorm layers exactly the same way for the model files to be compatible, and it would also require a lot of op renaming for the model load operation to work correctly.


A lot of computation has already gone into training LeelaZero, so it might make a good baseline.


Does it come with pre-trained models?


Here's one of the stronger models we've gone through.

https://github.com/tensorflow/minigo/releases/tag/v199



