Introduction to Markov Decision Processes
I provide a brief introduction to MDPs.
Simon Li
General Markov Decision Process
A Markov Decision Process (MDP) can be defined as a tuple $(\mathcal{S}, \mathcal{A}, P, r, \rho_0, \gamma)$, where
- $\mathcal{S}$ is the state space
- $\mathcal{A}$ is the action space
- $P(s' \mid s, a)$ is the transition probability over the next state given the current state and current action
- $r(s, a)$ is the reward function
- $\rho_0$ is the initial state distribution
- $\gamma \in [0, 1)$ is the discount factor for future rewards
The policy of an agent can be represented as $\pi(a \mid s)$, which is a mapping from a state to a probability distribution over actions.
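In the standard formulation, the agent's objective is to maximize the expected discounted return $J(\pi) = \mathbb{E}_{\tau \sim \pi}\big[\sum_{t=0}^{\infty} \gamma^t r(s_t, a_t)\big]$, where the trajectory $\tau$ is generated by $\rho_0$, $P$, and $\pi$. As a concrete sketch (my own illustration; the two-state dynamics, rewards, and uniform-random policy are all invented for the example), the following samples trajectories from such an MDP and forms a Monte Carlo estimate of $J(\pi)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2

# P[s, a] is the distribution over next states, i.e. P(s' | s, a);
# each inner row sums to 1.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
# r[s, a] is the reward for taking action a in state s.
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
rho0 = np.array([1.0, 0.0])  # initial state distribution
gamma = 0.99                 # discount factor

def policy(s):
    """A uniform-random policy: pi(a | s) = 1 / |A| for every state."""
    return rng.integers(n_actions)

def rollout(horizon=200):
    """Sample one trajectory and return its discounted return."""
    s = rng.choice(n_states, p=rho0)
    ret = 0.0
    for t in range(horizon):
        a = policy(s)
        ret += gamma ** t * r[s, a]
        s = rng.choice(n_states, p=P[s, a])
    return ret

# Averaging many rollouts gives a Monte Carlo estimate of J(pi).
print(np.mean([rollout() for _ in range(1000)]))
```

Truncating the rollout at a finite horizon is harmless here because $\gamma^t$ makes the tail contribution negligible.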
Constrained Markov Decision Process
A *constrained MDP* inherits the general MDP structure with the addition of a constraint function $c(s, a)$ and an episodic constraint threshold $d$.
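With these pieces, the constrained problem is commonly written as maximizing return subject to the expected cumulative constraint cost staying below the threshold (the symbols $c$ and $d$, and the use of a discounted cost sum rather than a plain per-episode sum, are notational choices on my part, since the original symbols did not survive):

$$
\max_{\pi}\; \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{subject to} \quad
\mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le d.
$$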