Perceptron Networks - 1: Did I Win?
Recently I embarked upon a journey of making the IRC bot play tic-tac-toe. The idea seemed interesting at first, but my mind, being the cat it is, started wandering off. After a long tussle, a side quest, and three espresso shots, I finally had the cat under control and the patience to resume the journey I had embarked upon.
I had given the tic-tac-toe problem a try before too, as part of a school project. The results were not that great though, as I practically had to hand-run all possible moves and hand-code the winning IF conditions. The computer was practically a god at it. It never lost, and the moment it realised it was about to lose, it would crash.
The current endeavour was no different, except that I could now utilise some more advanced technologies compared to the ones I had (or was capable of using, with my limited skill set) some 12 years ago.
if ((x[0] == -1 && x[1] == -1
        || x[2] == -1 && x[6] == -1
        || ... many conditions later ...
        || x[0] == -1 && x[8] == -1) && x[4] == 0) {
    x[4] = 1; // the not so winning move.
}
I did spend some hours pondering on the approach, and finally zeroed in on one. Wrote the code, trashed it, re-wrote the code, trashed it again, then focused my energy on a completely different angle to the problem. All this while, trying to train my machine to learn the game, I had neglected the core logic of the game: checking if someone had actually won.
function checkIfWon() {
    return (
        checkRows(gameBoard)
        || checkCols(gameBoard)
        || checkDiagonals(gameBoard)
    );
}
What if I could get rid of the logical conditions and the cumbersome loops (neatly tucked inside functions) to classify the end-game result as a winning or a losing game?
Not as appealing and glamorous as designing the game itself, but the intriguing nature of the problem still made me give it a shot.
The Dataset
The dataset was easy to obtain. The Tic-Tac-Toe Endgame dataset from the UCI Machine Learning Repository has a comprehensive list of end-game combinations to train a model on.
Data Cleaning
b,b,b,o,b,o,x,x,x,positive
b,b,b,b,o,o,x,x,x,positive
x,x,o,x,x,o,o,b,o,negative
x,x,o,x,x,o,b,o,o,negative
x,x,o,x,x,b,o,o,o,negative
The sample data looked something like the above, and yes, I could have just used a machine learning toolkit to train on it. Since I was planning to write my own training code to do the feat, I had to convert every 'x' to 1, 'b' to 0, and 'o' to -1, and map positive to 1 and negative to 0.
import pandas as pd

# board cells are columns 1-9; the last column is the end-game label
ttt = pd.read_csv('~/tic-tac-toe.data.csv', names=[1, 2, 3, 4, 5, 6, 7, 8, 9, 'output'])

# map the symbols and labels to numbers, column by column
rplc = {'x': 1, 'o': -1, 'b': 0, 'positive': 1, 'negative': 0}
cleaned_ttt = ttt.replace({y: rplc for y in list(ttt)})
With the data transformed to a more manageable numerical format, I started with the training part.
Training the Model
I chose a Perceptron network, a rather simple machine learning algorithm, to train the model, partly because my end states were binary in nature (winning or losing) and partly because it was the easiest available algorithm to implement.
As any good machine learning lesson would teach, the first step was dividing the data into a training set and a testing set; going by the general rule of thumb, I followed the 80-20 split.
train = cleaned_ttt.sample(frac=0.8, random_state=200)
test = cleaned_ttt.drop(train.index)
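The full UCI dataset has 958 end-game positions, so the split leaves roughly 766 rows for training and 192 for testing; those 192 held-out rows are what the confusion matrix further down is computed over.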
With the training and testing data separated out, it was time to write the core learning algorithm for the Perceptron network.
Initial Weights
The Perceptron network works by assigning a weight to each input feature, which in this case is a tic-tac-toe cell. The algorithm is seeded with random weights, which get adjusted in each iteration of the algorithm.
import random

# one integer weight per board cell; the 'output' column is the label, not a feature
weights = pd.Series({y: random.randint(0, 5) for y in list(ttt) if y != 'output'})
I seeded the weights with random integer values between 0 and 5.
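For intuition, a Perceptron's prediction is nothing more than a thresholded dot product of the weights with the board. A minimal sketch of that decision rule (this predict helper is my own illustration, not part of the training code below):

def predict(board_row, weights):
    # weighted sum of the nine cell values; the threshold at 0 picks the class
    activation = sum(board_row * weights)
    return 1 if activation > 0 else 0  # 1 => won (positive), 0 => not won (negative)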
Training Function
The training function consisted of two parts:
- Weight calculation
- Weight re-adjustment
for index, row in train.iterrows():
    input_row = row[range(1, 10)]  # the nine board cells
    # weight calculation: thresholded dot product of the board with the weights
    result = 1 if sum(input_row * weights) > 0 else 0
    # weight re-adjustment: nudge the weights by the prediction error
    weights = weights + (row['output'] - result) * input_row
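The loop above makes a single pass with an implicit learning rate of 1, which turned out to be enough here. The same idea generalises; a sketch with an explicit learning rate and multiple epochs (both parameters are my additions, with made-up values):

LEARNING_RATE = 0.1  # hypothetical value; the run above effectively uses 1
EPOCHS = 10          # hypothetical number of passes over the training data

for epoch in range(EPOCHS):
    for index, row in train.iterrows():
        input_row = row[range(1, 10)]
        result = 1 if sum(input_row * weights) > 0 else 0
        # classic Perceptron rule: only misclassified rows change the weights
        weights = weights + LEARNING_RATE * (row['output'] - result) * input_row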
Testing Function
No machine learning algorithm is complete without the testing phase. The 20% of the input data came in handy for testing the accuracy of the model.
# score each test board against the learned weights
tmp = (test[range(1, 10)] * weights).sum(axis=1)
test.loc[tmp <= 0, 'result'] = 0
test.loc[tmp > 0, 'result'] = 1

true_positive = len(test[(test['output'] == 1) & (test['result'] == 1)])
false_positive = len(test[(test['output'] == 0) & (test['result'] == 1)])
true_negative = len(test[(test['output'] == 0) & (test['result'] == 0)])
false_negative = len(test[(test['output'] == 1) & (test['result'] == 0)])

acc = (true_positive + true_negative) / float(len(test))
prc = true_positive / float(true_positive + false_positive)
recall = true_positive / float(true_positive + false_negative)
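As mentioned earlier, a machine learning toolkit could have done all of this in a handful of lines. For reference, a sketch of the equivalent pipeline (assuming scikit-learn is available; its numbers would not match mine below exactly):

from sklearn.linear_model import Perceptron

X_train, y_train = train[range(1, 10)], train['output']
X_test, y_test = test[range(1, 10)], test['output']

clf = Perceptron()          # scikit-learn's own Perceptron implementation
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # mean accuracy on the held-out 20%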
Model Statistics
                Predicted Win    Predicted Lose
Actual Win           121                7
Actual Lose           20               44
Accuracy     85.94%
Precision    85.82%
Recall       94.53%
weights

cell
1    7
2    6
3    8
4    4
5    7
6    4
7    6
8    6
9    7
dtype: int64
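As a sanity check, the learned weights broadly line up with tic-tac-toe intuition: the corners and the centre, the cells that sit on the most winning lines, picked up larger weights than the edge cells.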
Conclusion
Comparing the machine learning model with the exhaustive if conditions in this particular context makes me realise that the conditional branching would have been better, since it would achieve 100% precision. But then again, my decision on whether the computer has won the game has now been reduced to a dot product between the weight vector and the tic-tac-toe board, with a precision of ~86%.