Part 5.2 Model-Free Prediction: fundamentals of online algorithms.

Ever wondered how online algorithms became such a big thing? There is a clear path from the Monte-Carlo method to the non-stationary online algorithms that rock our world nowadays. It may seem trivial, but it does help with understanding how the different pieces fit together.
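A minimal sketch of that path, assuming the usual incremental-mean formulation (the post itself is not shown here, so this is not necessarily the derivation it follows). A Monte-Carlo value estimate is a running average of observed returns G. The average can be updated one sample at a time as V <- V + (1/N)(G - V). Replacing 1/N with a constant step size alpha turns it into an exponentially weighted average, which keeps tracking a target that drifts over time. The function names and sample returns below are hypothetical.

```python
def incremental_mean(returns):
    """Running average of returns, updated one sample at a time (step size 1/N)."""
    v = 0.0
    for n, g in enumerate(returns, start=1):
        v += (g - v) / n  # V <- V + (1/N)(G - V)
    return v


def constant_step(returns, alpha=0.1):
    """Exponentially weighted average with a constant step size alpha.

    Recent returns weigh more than old ones, which suits non-stationary targets.
    """
    v = 0.0
    for g in returns:
        v += alpha * (g - v)  # V <- V + alpha (G - V)
    return v


if __name__ == "__main__":
    # Stationary case: the incremental update reproduces the ordinary mean.
    print(incremental_mean([1.0, 3.0, 5.0, 7.0]))  # 4.0

    # Non-stationary case: the target jumps from 0 to 10 halfway through.
    drifting = [0.0] * 50 + [10.0] * 50
    print(incremental_mean(drifting))    # about 5: still averaging over the old regime
    print(constant_step(drifting, 0.1))  # close to 10: tracks the new regime
```

The only change between the two functions is the step size. That is the whole point: 1/N gives an exact average of everything seen so far, while a fixed alpha deliberately forgets old data.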

Part 5.1. Model-Free prediction: Monte-Carlo method.

If you've been following along with the series, you might start to wonder: "What do we do if we want to solve a Markov Decision Process (MDP) but don't know how the environment operates?" In other words, we don't have a model of our environment, but our agent still wants to predict the best way to act.…

RL Part 4.2 Policy Iteration.

This is a continuation of my attempt to learn the basics of Reinforcement Learning. I took a short, well-deserved break, and now I'm ready to continue. In the previous post, as I remember, we went over dynamic programming and discovered our first algorithm for evaluating a given fixed policy: Iterative Policy Evaluation. That is a good start, but it does…

Why you should work on large projects while you are in college?

I went through a lot of interviews right after finishing university: phone screens, screen-sharing coding exercises, whiteboard problem solving (my favorite), day-long tasks, 4-hour challenges; you name it, I've done it. Eventually I got the job, but what followed really struck me as strange. I had solved all the problems on HackerRank. I'm surely…

RL Part 4.1 Dynamic Programming. Iterative Policy Evaluation.

So far in the series we've built an intuitive idea of what RL is and described the system using the Markov Reward Process and the Markov Decision Process. We know what a policy is and what the optimal state and action value functions are. We've seen the Bellman Optimality Equation, which helped us define the optimal action value…
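Here is a minimal sketch of Iterative Policy Evaluation, included as an illustration rather than the post's own code. It assumes a small finite MDP given as explicit tables. The layout of `P` as lists of (probability, next_state, reward) tuples, the tiny three-state example, and the names `policy_evaluation`, `P_EXAMPLE`, `POLICY_EXAMPLE`, `gamma` and `theta` are all hypothetical. The function sweeps over the states, repeatedly applying the Bellman expectation backup V(s) <- sum_a pi(a|s) sum_s' P(s'|s,a) [r + gamma V(s')]. It stops once the largest change in a sweep falls below `theta`.

```python
def policy_evaluation(P, policy, gamma=0.9, theta=1e-8):
    """Iterative Policy Evaluation for a small finite MDP (assumed table layout).

    P[s][a]      -> list of (probability, next_state, reward) tuples
    policy[s][a] -> probability of taking action a in state s
    """
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman expectation backup for state s under the fixed policy.
            new_v = sum(
                pi_a * sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a, pi_a in policy[s].items()
            )
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v  # in-place update: later states in this sweep see it
        if delta < theta:
            return V


# Hypothetical MDP: from A, "right" reaches B and "stay" loops back to A.
# From B, "right" reaches the terminal state T with reward 1.
P_EXAMPLE = {
    "A": {"right": [(1.0, "B", 0.0)], "stay": [(1.0, "A", 0.0)]},
    "B": {"right": [(1.0, "T", 1.0)]},
    "T": {"noop": [(1.0, "T", 0.0)]},
}
POLICY_EXAMPLE = {
    "A": {"right": 0.5, "stay": 0.5},
    "B": {"right": 1.0},
    "T": {"noop": 1.0},
}

if __name__ == "__main__":
    V = policy_evaluation(P_EXAMPLE, POLICY_EXAMPLE, gamma=0.9)
    # Analytically, V(B) = 1, V(T) = 0 and V(A) = 0.45 / 0.55, about 0.818.
    print(V)
```

For a state like A that can loop back on itself, a single sweep is not enough. The estimate converges geometrically, which is why the loop runs until the change drops below `theta` rather than for a fixed number of sweeps.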

RL part 3. Markov Decision Process, policy, Bellman Optimality Equation.

Recall that in part 2 we introduced the notion of a Markov Reward Process. It is really just a building block, since our agent was not able to take actions: it simply transitioned from one state to another along with the environment. That's not very helpful, since we want our agent to not only take…

RL. part 2. Markov Reward Process.

In Part 1 we found out what Reinforcement Learning is and covered its basic aspects. Probably the most important among them is the notion of an environment. The environment is the part of an RL system that our RL agent interacts with. The agent takes an action, the environment reacts, and the agent observes feedback from…

RL. part 0. Absolute minimum amount of math necessary for studying Reinforcement Learning.

Math is an absolute must-have for anyone trying to learn Reinforcement Learning techniques. Writing any kind of RL program requires a precise understanding of the algorithms and the underlying math. It will make your life easier; otherwise things will not work, and the agent will not learn the way you expect it to and…

RL. part 1. What is Reinforcement Learning? Intuition.

The idea behind Reinforcement Learning (RL from here on) is fairly simple and intuitive: learn by interacting with what we are trying to master. One analogy that, in my opinion, explains the term well is any kind of puzzle (a box puzzle in particular) that does not let you see its…

How to port legacy functionality to newer code; workflow.

If you have ever needed to make legacy code work with newer code, you might have noticed that it is a bit of a challenge. This is especially true if you are the author of neither version and the original creator is no longer available to be constantly harassed and followed up with questions about why things the…