Part 5.4 Model-Free Prediction: Temporal-Difference Learning, section 2 TD(λ).

In section 1 on TD learning we saw a new class of algorithms that can learn online, after every step. In other words, TD can learn before, and even without, the final outcome by bootstrapping: the idea of updating a guess towards a guess. In particular, we looked at TD(0), an algorithm that…
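
As a quick reminder of what "updating a guess towards a guess" can look like, here is a minimal sketch of a tabular TD(0) update. The function name `td0_update`, the dictionary-based value table, and the step size `alpha` and discount `gamma` values are illustrative assumptions, not code from section 1.

```python
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=1.0, terminal=False):
    """Apply one tabular TD(0) update to the value table V, in place.

    V is a dict mapping states to value estimates (an assumed layout).
    """
    # TD target: observed reward plus the discounted *current estimate*
    # of the next state. This is the bootstrapped "guess".
    bootstrap = 0.0 if terminal else gamma * V[next_state]
    td_target = reward + bootstrap

    # TD error: how far the current estimate is from the target.
    td_error = td_target - V[state]

    # Move the estimate a fraction alpha of the way towards the target.
    V[state] += alpha * td_error
    return V


# Hypothetical two-state example: one observed transition A -> B with reward 1.
V = {"A": 0.0, "B": 0.5}
td0_update(V, "A", reward=1.0, next_state="B", alpha=0.1, gamma=0.9)
print(V["A"])
```

Note that the update uses only a single observed transition, so it can be applied after every step, without waiting for the episode to finish.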