Introduction to Markov Decision Processes
I provide a brief introduction to MDPs.
Simon Li
General Markov Decision Process
A Markov Decision Process (MDP) can be defined as a tuple $(\mathcal{S}, \mathcal{A}, P, r, \rho_0, \gamma)$, where
- $\mathcal{S}$ is the state space
- $\mathcal{A}$ is the action space
- $P(s' \mid s, a)$ is the transition probability over the next state given the current state and current action
- $r(s, a)$ is the reward function
- $\rho_0$ is the initial state distribution
- $\gamma \in [0, 1)$ is the discount factor for future rewards
The policy of an agent can be represented as $\pi(a \mid s)$, which is a mapping from a state to a probability distribution over actions.
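In the standard formulation, the agent's objective is to maximize the expected discounted return $J(\pi) = \mathbb{E}_{\tau \sim \pi}\big[\sum_{t=0}^{\infty} \gamma^t r(s_t, a_t)\big]$, where the trajectory $\tau$ is generated by $\rho_0$, $P$, and $\pi$. As a concrete sketch (my own illustration; the two-state dynamics, rewards, and uniform-random policy are all invented for the example), the following samples trajectories from such an MDP and forms a Monte Carlo estimate of $J(\pi)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2

# P[s, a] is the distribution over next states, i.e. P(s' | s, a);
# each inner row sums to 1.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
# r[s, a] is the reward for taking action a in state s.
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
rho0 = np.array([1.0, 0.0])  # initial state distribution
gamma = 0.99                 # discount factor

def policy(s):
    """A uniform-random policy: pi(a | s) = 1 / |A| for every state."""
    return rng.integers(n_actions)

def rollout(horizon=200):
    """Sample one trajectory and return its discounted return."""
    s = rng.choice(n_states, p=rho0)
    ret = 0.0
    for t in range(horizon):
        a = policy(s)
        ret += gamma ** t * r[s, a]
        s = rng.choice(n_states, p=P[s, a])
    return ret

# Averaging many rollouts gives a Monte Carlo estimate of J(pi).
print(np.mean([rollout() for _ in range(1000)]))
```

Truncating the rollout at a finite horizon is harmless here because $\gamma^t$ makes the tail contribution negligible.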
Constrained Markov Decision Process
A *constrained MDP* inherits the general MDP structure with the addition of a constraint function $c(s, a)$ and an episodic constraint threshold $d$.
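With these pieces, the constrained problem is commonly written as maximizing return subject to the expected cumulative constraint cost staying below the threshold (the symbols $c$ and $d$, and the use of a discounted cost sum rather than a plain per-episode sum, are notational choices on my part, since the original symbols did not survive):

$$
\max_{\pi}\; \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{subject to} \quad
\mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le d.
$$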