Workshop in Reinforcement Learning (0368-3500-13)
Workshop project:
- The project will be done in groups of 2-3 students.
- Each group will implement a learning algorithm for a board game.
- The background material needed will be covered in the lectures.
- Requirements document
Suggested Projects
More Challenging Projects
Workshop Outline
Week 1: Min-Max Trees
Week 2: Introduction to Reinforcement Learning: Model and Planning
Week 3: Reinforcement Learning: Learning (small state space)
Week 4: Reinforcement Learning: Learning (large state space)
Week 5: Simple Graphics (GUI)
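As an illustration of the learning material in Weeks 3-4, the sketch below runs tabular Q-learning on a toy 5-state chain (move right to reach a rewarding terminal state). The environment, rewards, and parameters are our own assumptions for the example, not workshop material.

```java
import java.util.Random;

// Tabular Q-learning on a toy 5-state chain: states 0..4, state 4 is
// terminal with reward +1; actions are LEFT and RIGHT. Illustrative only.
public class QLearn {
    static final int N = 5;
    static final int LEFT = 0, RIGHT = 1;

    public static double[][] train(int episodes, long seed) {
        double[][] q = new double[N][2];       // Q-table, initialized to 0
        double alpha = 0.5, gamma = 0.9, eps = 0.2;
        Random rnd = new Random(seed);
        for (int ep = 0; ep < episodes; ep++) {
            int s = 0;
            while (s != N - 1) {
                // epsilon-greedy action selection
                int a = (rnd.nextDouble() < eps) ? rnd.nextInt(2)
                        : (q[s][RIGHT] >= q[s][LEFT] ? RIGHT : LEFT);
                int s2 = (a == RIGHT) ? s + 1 : Math.max(0, s - 1);
                double r = (s2 == N - 1) ? 1.0 : 0.0;
                // bootstrap from the next state's best value (0 at terminal)
                double target = r + (s2 == N - 1 ? 0.0
                        : gamma * Math.max(q[s2][LEFT], q[s2][RIGHT]));
                q[s][a] += alpha * (target - q[s][a]);   // TD update
                s = s2;
            }
        }
        return q;
    }

    public static void main(String[] args) {
        double[][] q = train(500, 42);
        for (int s = 0; s < N - 1; s++)
            System.out.println("state " + s + " best action: "
                    + (q[s][RIGHT] > q[s][LEFT] ? "RIGHT" : "LEFT"));
    }
}
```

After training, the greedy policy moves right in every state, and the Q-values approach the discounted returns 0.9^k for k steps from the goal.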
Teams and Games
1. Nir Nahum and Eran Gewurtz: Four in a Row.
2. Uri Ravzin, Yuval Steinberg and Zion Zatlawi: Four in a Row.
3. Yael Kagan, Max Shifrin and Mark Sandler: Gomoku (generalized X-O).
4. Noam Kovacs, Oren Solomianik and Gennady Verdel: Breakthrough.
5. Negev Nosatzki, Moran Shekel and Oshrit Feder: Backgammon.
6. Daniel Rosenblatt and Benny Trachtenbrot: Mancala.
Sample Code
Basic Tic-Tac-Toe implemented in C++.
Basic Tic-Tac-Toe implemented in Java.
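For Week 1's min-max trees, a minimal minimax sketch for 3x3 tic-tac-toe (our own illustration, independent of the C++ and Java samples above) might look like:

```java
// Minimal minimax for 3x3 tic-tac-toe. Board cells: 0 = empty,
// 1 = X (maximizer), -1 = O (minimizer). Illustrative sketch only.
public class MiniMax {
    static int winner(int[] b) {
        int[][] lines = {{0,1,2},{3,4,5},{6,7,8},{0,3,6},
                         {1,4,7},{2,5,8},{0,4,8},{2,4,6}};
        for (int[] l : lines)
            if (b[l[0]] != 0 && b[l[0]] == b[l[1]] && b[l[1]] == b[l[2]])
                return b[l[0]];
        return 0;
    }

    static boolean full(int[] b) {
        for (int v : b) if (v == 0) return false;
        return true;
    }

    // Returns the game value under optimal play: +1 X wins, -1 O wins, 0 draw.
    static int minimax(int[] b, int player) {
        int w = winner(b);
        if (w != 0) return w;
        if (full(b)) return 0;
        int best = (player == 1) ? -2 : 2;
        for (int i = 0; i < 9; i++) {
            if (b[i] != 0) continue;
            b[i] = player;                 // try the move
            int v = minimax(b, -player);   // opponent replies optimally
            b[i] = 0;                      // undo
            best = (player == 1) ? Math.max(best, v) : Math.min(best, v);
        }
        return best;
    }

    public static void main(String[] args) {
        // From the empty board, optimal tic-tac-toe play is a draw.
        System.out.println(minimax(new int[9], 1)); // prints 0
    }
}
```

The same exhaustive search is infeasible for the workshop games above (Four in a Row, Gomoku, Breakthrough, Backgammon, Mancala), which is where depth cutoffs, evaluation functions, and the learning methods of Weeks 2-4 come in.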
Bibliography [for background]
- Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press.
- Bertsekas, D. P. and Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific, Belmont, MA.
- Gardner (1981). Samuel's checkers player. In Barr, A. and Feigenbaum, E. A., editors, The Handbook of Artificial Intelligence, I, pages 84--108. William Kaufmann, Los Altos, CA.
- Samuel, A. L. (1967). Some studies in machine learning using the game of checkers. II---Recent progress. IBM Journal of Research and Development, pages 601--617.
- Tesauro, G. J. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215--219.
- Tesauro, G. J. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38:58--68.
- Tsitsiklis, J. N. and Van Roy, B. (1996). Feature-based methods for large scale dynamic programming. Machine Learning, 22:59--94.
Previous Workshops: 1 2