Epsilon-Greedy Strategy

I explain Exploration-Exploitation trade-off and Epsilon-Greedy Strategy.

Simon Li

Published

25 February 2024

Epsilon-Greedy Strategy

The $\epsilon$ -greedy strategy is a simple strategy that balances the Exploration-Exploitation Tradeoff.

Exploration-Exploitation Tradeoff

The goal of the algorithm is to minimize the cumulative regret. In order to do so, we need a strategy so that we can balance the need for exploration with the desire for exploitation what was already known.

Exploration involves trying out new options that may lead to better outcomes in the future at the expense of an exploitation opportunity. It helps the model to reduce the future regret as it improves the model’s current knowledge about each action.

Exploitation involves greedily choosing the best option based on current knowledge of the system, which could be incomplete or misleading. However, it uses the model to reduce the current regret.

$\epsilon$-greedy strategy

The strategy chooses the best action with the probability $1-\epsilon$ and choose a random action otherwise.

As $t\to\infty$ , every action is tried an infinite number of times. Therefore, Q-learning will converge to the true table so that the average regret when selecting the optimal action will approach 0.