The Most Ridiculous Tic-Tac-Toe Machine
The story of one dude who one day decided to teach 304 matchboxes how to play Tic-Tac-Toe.
Thanks to our sponsors who keep this newsletter free to everyone.
This week’s issue is brought to you by the Mathspp Insider 🐍 🚀 newsletter. Join over 16,000 programmers who want to take their Python 🐍 skills to the next level 🚀 and start writing elegant and efficient Python code. Click here to subscribe.
And by Patrick Loeber’s Substack, a monthly newsletter featuring tutorials, news, trends, and educational nuggets related to Python & AI. Click here to subscribe for free.
In 1961, a British researcher, Donald Michie, designed the most impressive analog Reinforcement Learning system ever created. He called it “MENACE,” short for “Matchbox Educable Noughts and Crosses Engine.”
Back then, accessing computers was difficult, so Michie used 304 matchboxes to design a solution that learned how to play tic-tac-toe from scratch.
Surprisingly, after a few training rounds, the matchboxes became unbeatable, without any explicit instructions about how to play the game.
The system was simple.
Michie had one matchbox for each board state in the game and filled each box with colored beads of different shapes. Every space on the board corresponded to one of these colored shapes.
When it was MENACE’s turn, an assistant randomly drew a bead from the matchbox. The color of the bead decided where to play.
For example, if the assistant drew a green bead, MENACE’s marker was placed on the upper-right corner of the board. If the assistant drew a red square bead, MENACE’s marker was placed on the left-most middle square of the board.
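This selection step can be modeled in a few lines of Python. Note this is only a sketch: the bead labels and counts below are made up for illustration (here each bead is tagged with the board square it maps to, standing in for Michie’s color-and-shape scheme), and `draw_move` is a hypothetical helper name.

```python
import random

# A "matchbox" is just a multiset of beads. Each bead is labeled with
# the board square (0-8) it maps to; duplicate labels mean that square
# is more likely to be drawn. These counts are illustrative.
matchbox = [0, 0, 2, 2, 3, 5, 6, 8]

def draw_move(box):
    """Draw one bead uniformly at random; its label is the square to play."""
    return random.choice(box)

move = draw_move(matchbox)  # squares with more beads win the draw more often
```

Because the draw is uniform over beads, a square’s probability of being chosen is proportional to how many of its beads the box holds, which is exactly what makes the bead-count adjustments below act as learning.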
Michie’s revolutionary idea was to adjust the contents of the matchboxes at the end of each game.
He had only three rules:
First, he would discard every bead played during a game where MENACE lost. This rule made it less likely for MENACE to play a losing move in the future.
Second, during a game where MENACE won, he would return every bead played to its matchbox, along with two additional beads of the same color. This rule made it more likely for the system to repeat the moves that led to the win.
Finally, the third rule was to return every bead to its matchbox, unchanged, after a draw.
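The three rules above boil down to a simple reward update on bead counts. Here is a minimal Python sketch; the dictionary representation, the `update_matchboxes` name, and the example states are all hypothetical stand-ins for the physical boxes, not Michie’s actual bookkeeping.

```python
def update_matchboxes(boxes, moves, outcome):
    """Adjust bead counts after a game, following Michie's three rules.

    boxes   -- dict mapping board state -> list of beads (bead = square index)
    moves   -- list of (state, bead) pairs MENACE played this game
    outcome -- "win", "loss", or "draw"
    """
    for state, bead in moves:
        box = boxes[state]
        if outcome == "loss":
            if bead in box:
                box.remove(bead)        # rule 1: discard the losing bead
        elif outcome == "win":
            box.extend([bead, bead])    # rule 2: add two beads of the same color
        # rule 3 (draw): return every bead unchanged -- nothing to do

# Example: a box for one state with one bead each for squares 0, 1, and 2.
boxes = {"some_state": [0, 1, 2]}
update_matchboxes(boxes, [("some_state", 2)], "win")
# square 2 now holds three beads, so it is three times as likely to be drawn
```

Repeated over many games, losing moves literally run out of beads while winning moves accumulate them, which is why the move distribution drifts toward strong play.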
While MENACE was equally likely to play any move initially, it quickly started making some plays more often than others.
MENACE started learning!
Here is a chart tracking the number of beads in the first matchbox over 140 games between MENACE and a human player.
At first, the number of beads decreases, indicating that MENACE loses most games, but starting at game 30 or so, the number of beads starts increasing! MENACE improves its odds of winning as it plays more.
More importantly: after a hundred games or so, MENACE becomes unbeatable!
304 matchboxes. No computers. Way before Reinforcement Learning was a thing. One of the most incredible examples I’ve seen, if you ask me.
Alright, I want to add $50,000 to your salary
I launched the Machine Learning School community.
It starts with a 9-hour live cohort where you’ll learn how to train, tune, deploy, and monitor machine learning models using AWS.
This is about Machine Learning in the real world. I’m not theoretical, and I’m not interested in showing you papers and talks. This community is about making shit work.
The first cohort starts on April 17. You can join here.
But this is just the beginning.
You pay once to join the Machine Learning School and get lifetime access to everything I bring to the community. Every class, course, and talk will be available until the end of time.
You’ll never have to pay again. No recurrent payments. Ever.
Early-bird pricing until the end of March.
“Noughts and Crosses” is the Commonwealth English version of the American Tic-tac-toe.