Reinforcement Learning Workshop Project
Authors: Eli Libman, Shanee Lavi, Nur Lan
The goal of the reinforcement learning workshop project was to develop an automated, self-learning player for the game
Four in a Row ("Connect Four"). Several learning algorithms were taught during the lectures; of these we chose to implement the SARSA algorithm, whose inner workings are discussed in the specification document (PDF).
"Four in a Row" is a two-player board game usually played on a grid of 7 columns and 6 rows. For this workshop, our version of the game used a board of 8 columns by 6 rows. Each player has a color - black or red - and a set of discs of the matching color.
Game course: The players take turns. In each turn, a player chooses a column and inserts a disc of his color at its top.
Game end: Each player tries to complete a streak of four discs of his own color, either horizontally, vertically or diagonally. The first player to complete such a streak wins. A tie is reached when the board fills up with no winning pattern.
We implemented an epsilon-greedy algorithm of the SARSA variety for learning the Q function. Learning the Q function instead of a model of the environment is useful in an adversarial setting like Four in a Row, as it lets us learn the utility of the states rather than study the opponent (i.e. the environment).
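The game rules above (an 8 by 6 board, discs dropped into columns, four in a line wins, a full board is a tie) can be sketched as follows. This is a minimal illustration, not the project's actual code: the board representation and the names new_board, legal_moves, drop, has_four and outcome are assumptions made for the example.

```python
from typing import List, Optional

COLS, ROWS = 8, 6            # workshop board size: 8 columns by 6 rows
EMPTY, BLACK, RED = 0, 1, 2  # hypothetical cell encoding

Board = List[List[int]]      # board[col][row]; row 0 is the bottom of a column


def new_board() -> Board:
    return [[EMPTY] * ROWS for _ in range(COLS)]


def legal_moves(board: Board) -> List[int]:
    """Columns that still have room for a disc."""
    return [c for c in range(COLS) if board[c][ROWS - 1] == EMPTY]


def drop(board: Board, col: int, player: int) -> int:
    """Insert a disc at the top of `col`; it settles on the lowest empty row."""
    for row in range(ROWS):
        if board[col][row] == EMPTY:
            board[col][row] = player
            return row
    raise ValueError(f"column {col} is full")


def has_four(board: Board, player: int) -> bool:
    """True if `player` has four in a line: horizontal, vertical or diagonal."""
    directions = [(1, 0), (0, 1), (1, 1), (1, -1)]
    for c in range(COLS):
        for r in range(ROWS):
            for dc, dr in directions:
                cells = [(c + i * dc, r + i * dr) for i in range(4)]
                if all(0 <= x < COLS and 0 <= y < ROWS and board[x][y] == player
                       for x, y in cells):
                    return True
    return False


def outcome(board: Board) -> Optional[str]:
    """'black', 'red', 'tie', or None while the game is still in progress."""
    if has_four(board, BLACK):
        return "black"
    if has_four(board, RED):
        return "red"
    if not legal_moves(board):
        return "tie"
    return None
```

A learning agent only needs legal_moves to enumerate its actions and outcome to detect the end of an episode.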
Because of the asymmetry between the game roles (first/second player), we decided to learn a separate Q function for each role.
Learning the Q function directly required holding a table between each possible
The linear function approximation receives a vector of board features (see next section) representing the current state and returns a linear combination of these features. The goal of the learning process is to find locally optimal weights for this linear combination.
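As a rough illustration of how a linear approximation combines with epsilon-greedy move selection, consider the sketch below. It is not the project's implementation: q_value, epsilon_greedy and the featurize callback (assumed to map a candidate move to the feature vector of the resulting board) are hypothetical names introduced for the example.

```python
import random
from typing import Callable, Sequence


def q_value(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear approximation: Q is the weighted sum of the board features."""
    return sum(w * f for w, f in zip(weights, features))


def epsilon_greedy(weights: Sequence[float],
                   actions: Sequence[int],
                   featurize: Callable[[int], Sequence[float]],
                   epsilon: float,
                   rng=random) -> int:
    """With probability epsilon explore a random legal move,
    otherwise pick the move with the highest approximated Q value."""
    if rng.random() < epsilon:
        return rng.choice(list(actions))
    return max(actions, key=lambda a: q_value(weights, featurize(a)))
```

Setting epsilon to 0 gives a purely greedy player, which is the natural choice once training is over.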
SARSA learning algorithm, as its name implies, requires a
Our update rule was: .
Where:
,
and . If If
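For orientation, the textbook SARSA update for a linear approximator is w <- w + alpha * (r + gamma * Q(s', a') - Q(s, a)) * phi(s, a), where phi is the feature vector. The sketch below implements that standard form; the symbols alpha (learning rate), gamma (discount factor), the reward argument and the function name sarsa_update are assumptions, and the rule actually used in the project may differ in its details.

```python
from typing import List, Optional, Sequence


def sarsa_update(weights: List[float],
                 phi: Sequence[float],
                 reward: float,
                 phi_next: Optional[Sequence[float]],
                 alpha: float,
                 gamma: float) -> float:
    """One in-place SARSA step for a linear Q function; returns the TD error.

    phi      -- features of the state/action just taken
    phi_next -- features of the next state/action actually chosen,
                or None if the move ended the game
    """
    q = sum(w * f for w, f in zip(weights, phi))
    q_next = 0.0 if phi_next is None else sum(
        w * f for w, f in zip(weights, phi_next))
    td_error = reward + gamma * q_next - q
    # For a linear Q the gradient with respect to the weights is just phi.
    for i, f in enumerate(phi):
        weights[i] += alpha * td_error * f
    return td_error
```

Because SARSA is on-policy, phi_next must come from the move the epsilon-greedy policy really selects next, not from the best available move.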
The linear approximation was made using the following game features:
This way we achieve a properly decaying learning rate.
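One common way to obtain a decaying learning rate is a harmonic schedule, whose steps sum to infinity while their squares sum to a finite value. The sketch below is only an example of such a schedule; the function name learning_rate and the parameters alpha0 and tau are assumptions and are not taken from the project.

```python
def learning_rate(step: int, alpha0: float = 0.1, tau: float = 1000.0) -> float:
    """Harmonic decay: starts at alpha0 and drops to alpha0 / 2 at step == tau."""
    return alpha0 / (1.0 + step / tau)
```

Larger values of tau keep the rate high for longer, which trades slower convergence for more adaptation early in training.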
Our final player was trained using the following technique:
Allis, V. (1988). A Knowledge-Based Approach of Connect-Four - The Game Is Solved: White Wins. Report IR-163, Faculty of Mathematics and Computer Science, Vrije Universiteit Amsterdam, The Netherlands. Also Report CS 92-04, Faculty of General Sciences, University of Limburg, Maastricht, The Netherlands, 1992.
Gardner (1981). Samuel's checkers player. In Barr, A. and Feigenbaum, E. A., editors, The Handbook of Artificial Intelligence, I, pages 84-108. William Kaufmann, Los Altos, CA.