In section 1 on TD learning we saw a new class of algorithms that can learn online, after every step. In other words, TD can learn before the final outcome arrives, and even without one, using bootstrapping: the idea of updating a guess towards a guess. In particular, we looked at TD(0), an algorithm that… Continue reading Part 5.4 Model-Free Prediction: Temporal-Difference Learning, section 2 TD(λ).
Our first algorithm of a totally different class. Temporal-Difference (TD) learning, just like the Monte-Carlo method, learns directly from experience of interacting with an environment. TD is model-free: it does not require any knowledge of MDP transitions or rewards, so there is no need to worry about how the environment's dynamics affect our state values. There is also no more need… Continue reading Part 5.3 Model-Free Prediction: Temporal-Difference Learning, section 1 TD(0)
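To make the teaser concrete, here is a minimal sketch of the tabular TD(0) update the post describes: after every single step of experience, move the value of a state towards a bootstrapped target built from the next state's current value estimate. The states, reward, and step below are hypothetical illustrations, not from the post itself; only the update rule is the technique being previewed.

```python
from collections import defaultdict

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: move V(s) towards the target r + gamma * V(s')."""
    target = r + gamma * V[s_next]     # a guess built from another guess
    V[s] += alpha * (target - V[s])    # online update, no model needed
    return V

V = defaultdict(float)  # value estimates start at 0 for every state
# One hypothetical step of experience: from state 'A', reward 1, to state 'B'
td0_update(V, "A", 1.0, "B")
print(round(V["A"], 3))  # 0.1 * (1 + 0.9 * 0 - 0) = 0.1
```

Note that the update uses only a single observed transition; no transition probabilities or reward function of the MDP appear anywhere, which is exactly what "model-free" means here.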
If you've been following along with the series, you might start to wonder: "What do we do if we want to solve a Markov Decision Process (MDP) but don't know how the environment operates?" In other words, we don't have a model of our environment, but our agent still wants to predict the best way to act.… Continue reading Part 5.1. Model-Free prediction: Monte-Carlo method.