Part 5.3 Model-Free Prediction: Temporal-Difference Learning, section 1.

Our first algorithms of a totally different class. Temporal-Difference (TD) learning, just like the Monte Carlo method, learns directly from the experience of interacting with an environment. TD is model-free: it does not require any knowledge of MDP transitions or rewards, so there is no need to worry about how the dynamics shape our state values. There is also no more need…
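As a rough sketch of the idea (not code from the post itself), here is a minimal tabular TD(0) prediction loop in Python. The env.reset()/env.step() interface, the policy callable, and the step size alpha are all my assumptions for illustration:

```python
from collections import defaultdict

def td0_prediction(env, policy, num_episodes, alpha=0.1, gamma=1.0):
    """Tabular TD(0): update V(s) after every step, bootstrapping on V(s')."""
    # NOTE: env.reset() -> state and env.step(a) -> (next_state, reward, done)
    # is an assumed interface, not a specific library's API.
    V = defaultdict(float)  # state-value estimates, default 0
    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            # TD target bootstraps on the current estimate of the next state
            target = reward + (0.0 if done else gamma * V[next_state])
            V[state] += alpha * (target - V[state])  # nudge V toward the target
            state = next_state
    return V
```

Unlike Monte Carlo, the update happens after every step rather than at the end of an episode, which is what lets TD learn online from incomplete episodes.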

Part 5.2 Model-Free Prediction: fundamentals of online algorithms.

Ever wondered how online algorithms became a thing? There is a clear path from the Monte Carlo method to the non-stationary online algorithms that rock our world nowadays. It seems trivial, but it does help with understanding how the different pieces fit together.
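One way to see that path, sketched below in Python (the function names are mine, not the post's): the sample mean can be computed incrementally, and swapping the shrinking 1/n step size for a constant alpha turns it into an online estimator that tracks non-stationary data.

```python
def incremental_mean(xs):
    """Exact running mean: mu_k = mu_{k-1} + (1/k) * (x_k - mu_{k-1})."""
    mu, n = 0.0, 0
    for x in xs:
        n += 1
        mu += (x - mu) / n  # step size shrinks as 1/n; all samples weigh equally
    return mu

def exponential_average(xs, alpha=0.1):
    """Constant step size: recent samples weigh more, old ones fade geometrically."""
    mu = 0.0
    for x in xs:
        mu += alpha * (x - mu)  # suits non-stationary targets
    return mu
```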

Part 5.1. Model-Free prediction: Monte-Carlo method.

If you've been following along with the series, you might start to wonder: "What do we do if we want to solve a Markov Decision Process (MDP) but don't know how the environment operates?" In other words, we don't have a model of our environment, but our agent still wants to predict the best way to act…
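For a concrete flavor of that answer, here is a minimal first-visit Monte Carlo prediction sketch in Python, under the same assumed env/policy interface as above (a hypothetical setup, not the post's own code): sample whole episodes by interacting with the environment, then average the observed returns per state.

```python
from collections import defaultdict

def mc_prediction(env, policy, num_episodes, gamma=1.0):
    """First-visit Monte Carlo: V(s) = average return after the first
    visit to s in each sampled episode. No model of the MDP is needed."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    V = defaultdict(float)
    for _ in range(num_episodes):
        # Generate a full episode by following the policy.
        episode, state, done = [], env.reset(), False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            episode.append((state, reward))
            state = next_state
        # Compute the return G_t for every step by walking backwards.
        G, returns_at_t = 0.0, [0.0] * len(episode)
        for t in range(len(episode) - 1, -1, -1):
            G = episode[t][1] + gamma * G
            returns_at_t[t] = G
        # First-visit: only each state's earliest occurrence contributes.
        seen = set()
        for t, (s, _) in enumerate(episode):
            if s not in seen:
                seen.add(s)
                returns_sum[s] += returns_at_t[t]
                returns_count[s] += 1
                V[s] = returns_sum[s] / returns_count[s]
    return V
```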