Introduction to Markov Decision Processes

I provide a brief introduction to MDPs.

Simon Li

General Markov Decision Process

A Markov Decision Process (MDP) can be defined as a tuple

$(\mathcal{S}, \mathcal{A}, p, \mu, r, \gamma)$

where

- $\mathcal{S}$ is the state space,
- $\mathcal{A}$ is the action space,
- $p : \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to [0,1]$ is the transition kernel,
- $\mu$ is the initial state distribution,
- $r : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$ is the reward function, and
- $\gamma \in [0,1)$ is the discount factor.
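A small tabular example can make the tuple concrete. The sketch below is a hypothetical two-state, two-action MDP (the specific transition probabilities and rewards are illustrative, not from the text), with a rollout helper that samples an episode and accumulates the discounted return.

```python
import random

# Hypothetical two-state, two-action MDP (numbers are illustrative).
S = [0, 1]
A = [0, 1]
mu = {0: 1.0, 1: 0.0}   # initial state distribution: always start in state 0
gamma = 0.9             # discount factor

def p(s, a):
    """Transition kernel p(. | s, a), returned as a dict over next states."""
    stay = 0.8 if a == 0 else 0.2
    return {s: stay, 1 - s: 1.0 - stay}

def r(s, a):
    """Reward for taking action a in state s."""
    return 1.0 if (s == 1 and a == 1) else 0.0

def rollout(policy, horizon=50, rng=random.Random(0)):
    """Sample one episode and return its discounted return."""
    s = rng.choices(S, weights=[mu[x] for x in S])[0]
    ret = 0.0
    for t in range(horizon):
        a = policy(s, rng)
        ret += (gamma ** t) * r(s, a)
        nxt = p(s, a)
        s = rng.choices(list(nxt), weights=list(nxt.values()))[0]
    return ret
```

For instance, `rollout(lambda s, rng: 1)` evaluates the policy that always plays action 1; the discounted return is bounded above by $1/(1-\gamma)$.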

The policy of an agent can be represented as $\pi : \mathcal{S} \times \mathcal{A} \to [0,1]$, which maps each state to a probability distribution over actions.
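For finite state and action spaces, such a policy can be stored as a table of probabilities. The sketch below assumes two states and two actions; the table entries are illustrative.

```python
import random

# Tabular stochastic policy pi(a | s) for a hypothetical 2-state, 2-action MDP.
pi = {
    0: {0: 0.7, 1: 0.3},   # in state 0: action 0 w.p. 0.7, action 1 w.p. 0.3
    1: {0: 0.1, 1: 0.9},   # in state 1: action 1 is much more likely
}

def sample_action(s, rng=random.Random(0)):
    """Draw an action from the distribution pi(. | s)."""
    actions, probs = zip(*pi[s].items())
    return rng.choices(actions, weights=probs)[0]
```

Note that for each state, the action probabilities must sum to one, exactly the condition that makes each row of the table a valid distribution over actions.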

Constrained Markov Decision Process

A *constrained MDP* inherits the general MDP structure, with the addition of a constraint function $c : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$ and an episodic constraint threshold $\beta$.
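A minimal sketch of checking the episodic constraint, assuming a per-step cost $c(s, a)$ summed over a state-action trajectory and compared against the threshold $\beta$ (the specific cost function and threshold value below are illustrative):

```python
beta = 2.0  # hypothetical episodic constraint threshold

def c(s, a):
    """Per-step constraint cost, e.g. penalizing action 1 (illustrative)."""
    return 1.0 if a == 1 else 0.0

def episode_cost(trajectory):
    """Total constraint cost of one episode, given (state, action) pairs."""
    return sum(c(s, a) for (s, a) in trajectory)

def satisfies_constraint(trajectory):
    """True if the episode's total cost respects the threshold beta."""
    return episode_cost(trajectory) <= beta
```

The constrained objective is then to maximize expected return over policies whose episodes keep this cost within $\beta$.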