comp.lang.ruby

[NON-SOLUTION] Learning Tic-Tac-Toe (#11)

Hans Fugal

12/13/2004 6:07:00 PM

Ok, I hope I don't become infamous for non-solutions, but I like to
share what didn't work because that can be almost as instructive as what
did work.

As TTT is a trivial and largely uninteresting game (as has been
discussed), I was more interested in the journey than the result. I
decided to brush up on my Q-learning and so coded up a Q-learner AI. I
then pitted the Q-learner against itself and let it go for several
hours. (I didn't have the time to observe it closely, or the
inclination to implement reporting to make pretty graphs or whatnot.)
I expected that when I came back it would be drawing most of the time
with itself, but it was not. It seemed to be playing more or less
decent games, with each side winning roughly half the time, but very
rarely would they draw. I think perhaps they were taking more risks
than they ought to have, or perhaps I had set the exploration constant
(the randomness in their decisions) too high.
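
For the curious, the heart of it was the standard tabular update.
Here's a minimal sketch of the idea (illustrative only; the names and
constants below are made up for the example, not pulled from my actual
code):

  # Minimal tabular Q-learner sketch (illustrative; not my actual code).
  # States are board strings like "XO..X...."; actions are indices of
  # empty cells.
  class QLearner
    def initialize(alpha = 0.3, gamma = 0.9, epsilon = 0.1)
      @q = Hash.new(0.0)    # Q-table: [state, action] => value
      @alpha, @gamma, @epsilon = alpha, gamma, epsilon
    end

    # Epsilon-greedy selection: mostly exploit, sometimes explore.
    # epsilon is the "constant of randomness" I mention above.
    def choose(state, actions)
      return actions.sample if rand < @epsilon
      actions.max_by { |a| @q[[state, a]] }
    end

    # Standard update:
    #   Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    # next_actions is empty when next_state is terminal.
    def update(state, action, reward, next_state, next_actions)
      best_next = next_actions.map { |a| @q[[next_state, a]] }.max || 0.0
      @q[[state, action]] +=
        @alpha * (reward + @gamma * best_next - @q[[state, action]])
    end
  end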

Of course it's also entirely possible I have an implementation bug. :)

The other possibility that I'm afraid of is a theoretical bug. I'm not
entirely sure Q-learning is appropriate for this problem the way I
implemented it. My concern is that, as implemented, it might not be a
true MDP, because the state transition is not deterministic: the
reward/next-state pair is taken from when it's our turn again, and
what the other player does in between is nondeterministic.
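
Concretely, each learner's "transition" spans two plies: it records
(s, a), the other player replies, and only then does it observe r and
s'. The self-play loop was along these lines (again only a sketch;
game, opponent_turn, reward_for, etc. are hypothetical names, not my
real API):

  # Self-play loop sketch: the next state a learner sees is the board
  # on its *next* turn, so the opponent's nondeterministic reply is
  # folded into the environment's transition.
  until game.over?
    state   = game.board_key
    actions = game.legal_moves
    action  = learner.choose(state, actions)
    game.play(action)
    opponent_turn(game) unless game.over?  # the other Q-learner replies
    # Reward (+1 win, -1 loss, 0 otherwise) and next state are observed
    # only after the reply, which is exactly my worry: the transition
    # is stochastic from this learner's point of view.
    learner.update(state, action, game.reward_for(learner),
                   game.board_key, game.over? ? [] : game.legal_moves)
  end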

I left the laptop home and turned off, but I'd be happy to send the code
later if anyone's interested in seeing it. :-)
2 Answers

James Gray

12/13/2004 6:49:00 PM

On Dec 13, 2004, at 12:07 PM, Hans Fugal wrote:

> Ok, I hope I don't become infamous for non-solutions, but I like to
> share what didn't work because that can be almost as instructive as
> what did work.

I too ran into roadblocks. I tried to recreate Donald Michie's famous
"matchbox machine", MENACE, in Ruby. Unfortunately, I tripped myself
up with my own cleverness.

I was optimizing the number of positions the machine had to consider
(see my earlier message about rotation and mirroring). My
implementation was flawed, though, and failed to shift the weights for
possible moves accordingly. This led my AI to try bogus moves.
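
The idea was to map every board to a canonical representative of its
symmetry class, so all eight rotations/reflections share a single
matchbox. Something like this sketch (not my actual code):

  # Symmetry reduction sketch: reduce a board (array of 9 cells,
  # indices 0..8 laid out row by row) to the smallest of its 8
  # rotations/reflections.
  ROTATE = [6, 3, 0, 7, 4, 1, 8, 5, 2]  # 90-degree clockwise rotation
  MIRROR = [2, 1, 0, 5, 4, 3, 8, 7, 6]  # left-right mirror

  def transform(board, perm)
    perm.map { |i| board[i] }
  end

  def canonical(board)
    variants = []
    b = board
    4.times do
      b = transform(b, ROTATE)
      variants << b << transform(b, MIRROR)
    end
    variants.min_by { |v| v.join }  # lexicographically smallest variant
  end

The catch, and where I slipped, is that a move chosen against the
canonical board has to be translated back through the inverse
transform before it's played, and the weights have to be updated under
that same mapping.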

There's also a more subtle flaw in my matchbox/bead logic, affecting
the machine's ability to learn from its mistakes. I was in the process
of tracking that down when I began running into the above problem.
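
For anyone who hasn't seen MENACE: each reachable position is a
"matchbox" holding colored beads, one color per legal move; you draw a
bead to pick the move, then after the game add beads along a winning
line of play and confiscate them along a losing one. The
reinforcement step is roughly this (a sketch; the bead counts are
illustrative, and Michie's actual scheme varied them):

  # MENACE-style reinforcement sketch. @boxes maps a canonical board
  # to a Hash of move => bead count; history holds the [board, move]
  # pairs played during the game.
  def reinforce(history, result)
    delta = { :win => 3, :draw => 1, :loss => -1 }[result]
    history.each do |board, move|
      beads = @boxes[board]
      beads[move] = [beads[move] + delta, 0].max  # a move can run dry
    end
  end

It's in this bookkeeping, interacting with the symmetry mapping above,
that my subtle flaw lives.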

Both of these problems are fixable, of course, but I ran out of time
this weekend. In the end, I blame myself: I overcomplicated it and
could have kept things easier on myself. Live and learn, as the saying
goes.

James Edward Gray II



Roman W

10/23/2012 8:17:00 AM

On Monday, 22 October 2012 at 10:19:45 UTC+1, user slawek wrote:
> The stability of an algorithm is not established by voting or by
> sharing opinions.

The discussion was about the performance and memory requirements of
the implementation, not about the stability of the algorithm. Users'
opinions carry a lot of weight here, because it is the users who know
what input data ends up in the implementation.

RW