We've put it on KGS as "somebot", and it can play at somewhere from 3-5d level, although amusingly it still can't do ladders, and almost always plays 3-3. Our strength seems to have plateaued recently, and we suspect it's because of value net overfitting issues.
3-5d is about the strength of LeelaZero (which I presume you already knew about). Can you disclose how long Minigo has been training, how many games were generated, and which hardware was used?
Ladders sound hard for neural networks. IIRC the original AlphaGo had a hardcoded ladder solver that would determine whether a ladder position is winning or losing and feed that bit into the neural network.
I'm pretty sure it didn't, though I can't find it stated explicitly for AlphaGo. For AlphaGo Zero they specifically mention that it took self-play a long time to learn ladders well.
I've been super out of touch with playing Go for the past couple years. I would love to get back into it with all these new developments! Does anyone know if I can play against AlphaGo or a derivative (such as Minigo) online?
Various people have tried to incorporate AG-like techniques into their Go programs. One you might wish to play with is Leela Zero, which is low to mid dan (amateur) now.
To get this working:
* Acquire a GTP-capable GUI, such as Sabaki
* Acquire the latest Leela Zero release
* Acquire a recent Leela Zero neural net
* Set up Sabaki to use LZ with the net passed as an argument, e.g. "-t 1 -p 1600 --noponder --gtp -w d16fa4c3801e55ec21e0df7ead67980fe8d4ee49188a3516818207ad28b017a6"
It's a bit of work but nothing too hard. I should mention that this may require a semi-decent GPU (my old GTX 750 works fine).
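For the curious, the GUI-to-engine link in that last step is just GTP: plain text commands over the engine's stdin/stdout, where replies start with "=" (success) or "?" (error) and end with a blank line. A minimal sketch of parsing a reply (the function name is mine, not from Sabaki or Leela Zero):

```python
def parse_gtp_reply(raw):
    """Split a raw GTP reply into (ok, payload).

    GTP replies look like "= <payload>\n\n" on success or
    "? <message>\n\n" on failure.
    """
    line = raw.strip()
    if line.startswith("="):
        return True, line[1:].strip()
    if line.startswith("?"):
        return False, line[1:].strip()
    raise ValueError("not a GTP reply: %r" % raw)

# e.g. after writing "name\n" to the engine's stdin, you might read back:
ok, payload = parse_gtp_reply("= Leela Zero\n\n")
```

Sabaki does all of this for you; this is just to show there's no magic in the engine hookup.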
That's the awesome thing about Go! If you play against someone who is better than you, you can set a handicap, which makes the game fun for both sides.
Check out https://github.com/glinscott/leela-chess. We are getting close to kicking off the distributed version now that we have validated it's possible to get a strong network through supervised training.
Question: when you switch to self-play reinforcement learning, do you plan on starting from the network obtained in supervised learning, or tabula rasa? I understand starting tabula rasa will require more computing power/time, but if you start from the supervised learning network, isn't there a risk you inherit human biases in the game style? It would also defeat the purpose of having the system discover existing chess theory and possibly new theory.
You'd have to write a Chess implementation that very carefully respects all possible game-ending pathways, a translation of a chess board into an array that a NN could understand, and a schema by which to flatten the array of all possible moves (both legal and illegal) into a single vector. Then, the MCTS and RL portions would be identical.
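To make the encoding part concrete, here's a rough pure-Python sketch (illustrative names of my own, not from any repo): pieces become 12 binary 8x8 planes, and a (from, to) square pair flattens into one slot of a 64*64 = 4096-long policy vector. A real scheme also needs extra slots for underpromotions, plus planes for side to move, castling rights, etc.

```python
PIECES = "PNBRQKpnbrqk"  # 6 piece types x 2 colors -> 12 planes

def encode_board(fen_board):
    """Turn the piece-placement field of a FEN string into 12 8x8 binary planes."""
    planes = [[[0] * 8 for _ in range(8)] for _ in PIECES]
    for rank, row in enumerate(fen_board.split("/")):
        file = 0
        for ch in row:
            if ch.isdigit():
                file += int(ch)  # a digit means that many empty squares
            else:
                planes[PIECES.index(ch)][rank][file] = 1
                file += 1
    return planes

def move_index(from_sq, to_sq):
    """Flatten a (from, to) move into one index of a 4096-long policy vector."""
    return from_sq * 64 + to_sq

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
planes = encode_board(start)
e2, e4 = 12, 28  # squares numbered 0-63 from a1
idx = move_index(e2, e4)
```

The network's policy head then outputs one probability per index, legal or not, and you mask out the illegal ones before feeding the MCTS.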
Not quite. python-chess is a wonderful library, and you would use it to do the things that your parent said. But python-chess on its own doesn't do those things.
With a lot of elbow grease, mayyyyybe. We're currently using 128 filters and 20 residual blocks, whereas LZ is using a smaller network size. Our networks would have had to define their batchnorm layers exactly the same way for the model files to be compatible, and it would also require a lot of op renaming for the model load operation to work correctly.
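A back-of-the-envelope sketch of why the weight files can't just be swapped: the tensor shapes, and hence the parameter counts, depend directly on the filter and block counts. Rough counting for the residual tower only (heads omitted; 17 input planes is the AlphaGo Zero paper's feature count and is just an assumption here):

```python
def tower_params(filters, blocks, input_planes=17):
    """Rough parameter count for an AlphaGo Zero-style residual tower.

    One 3x3 input conv, then `blocks` residual blocks of two 3x3 convs
    each; every conv is followed by a batchnorm with 2*filters params.
    Illustrative only -- real nets add policy/value heads and biases.
    """
    conv_in = 3 * 3 * input_planes * filters
    bn = 2 * filters
    res_conv = 3 * 3 * filters * filters
    return (conv_in + bn) + blocks * 2 * (res_conv + bn)

big = tower_params(128, 20)   # a 128x20 tower
small = tower_params(64, 5)   # a smaller tower for comparison
```

Since every weight tensor in the big net has a different shape from its counterpart in the small one, there's no way to load one checkpoint into the other architecture, independent of any op-naming issues.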
I am finding it quite useful. (I'm not the author, just a happy reader! :-))