Reinforcement Learning
Outline
Goal of Reinforcement Learning
Reinforcement Learning - origins
Typical Applications
Contrast with Supervised Learning
Mathematical Model - Motivation
Mathematical Model - MDP
MDP model - states and actions
MDP model - rewards
MDP model - trajectories
MDP model - Episodic tasks
Simple example: N-armed bandit
MDP - Return function
MDP model - return functions
MDP model - action selection
MDP model - summary
Planning - Basic Problems
Planning - Value Functions
Planning - Policy Evaluation
Algorithms - Policy Evaluation Example
Algorithms - Policy Evaluation Example
Algorithms - optimal control
Algorithms - Optimal control Example
Algorithms - Optimal control Example
Algorithms - computing optimal policy
Algorithms - Linear Programming
Algorithms - Value Iterations
Algorithms - Policy Iterations
Algorithms - Open problems
Learning Algorithms
Learning - Model Based
Learning - Model Free: Monte Carlo - Policy Evaluation
Monte Carlo - Optimal Control
Learning - Model Free: Temporal Differences - Policy Evaluation
TD(0) - Optimal Control
Comparing TD and MC
Learning - Open Problems
Planning versus Learning
Example - Elevator Control
Current Research Efforts
Large Scale MDP
Large scale MDP
Large scale MDP - Restricted Value Function
TD-gammon
Function Approximation - basics
Function Approximation - Linear
Linear Function Approximation - Policy Evaluation
Linear Function Approximation - Optimal Control
Function Approximation - Conclusion
Large scale MDP - Restricted policies
Large scale MDP - Restricted models
Large Scale MDP - Generative Model
Large Scale MDPs - Research
Partially Observable MDP
POMDP - Belief State Algorithm
POMDP - Hard Problems
Summary
Resources
PPT slides
Email: mansour@cs.tau.ac.il
Homepage: http://www.math.tau.ac.il/~mansour
More information: Reinforcement Learning