RL Part 4.1 Dynamic Programming. Iterative Policy Evaluation.

So far in the series we've built an intuitive idea of what RL is, and we've described the system using a Markov Reward Process and a Markov Decision Process. We know what a policy is and what the optimal state and action value functions are. We've seen the Bellman Optimality Equation, which helped us define the optimal action value function recursively. BUT we haven't done anything to actually solve a Markov Decision Process – find an optimal action value function that would tell us the best action in a given state. In this post we will explore a Dynamic Programming approach to do just that. TLDR: just show me the code.
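As a preview, iterative policy evaluation really does fit in a few lines. The sketch below is illustrative only: the toy 3-state MDP, its transition table, and the function name are all made up for this example, not taken from the post.

```python
import numpy as np

# Hypothetical toy MDP: 3 states (state 2 is terminal), 2 actions.
# P[s][a] is a list of (probability, next_state, reward) transitions.
P = {
    0: {0: [(1.0, 1, -1.0)], 1: [(1.0, 2, 0.0)]},
    1: {0: [(1.0, 0, -1.0)], 1: [(1.0, 2, 0.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}

def policy_evaluation(P, policy, gamma=0.9, theta=1e-8):
    """Repeatedly apply the Bellman expectation backup until the
    value function changes by less than theta in every state."""
    V = np.zeros(len(P))
    while True:
        delta = 0.0
        for s in P:
            v = sum(policy[s][a] * sum(p * (r + gamma * V[s2])
                                       for p, s2, r in P[s][a])
                    for a in P[s])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            break
    return V

# Evaluate a uniform random policy over both actions in every state.
uniform = {s: {0: 0.5, 1: 0.5} for s in P}
V = policy_evaluation(P, uniform)
print(V)  # V[2] stays 0: state 2 is terminal and gives no reward
```

Note that this only *evaluates* a fixed policy; turning it into a way to find the optimal policy is what the rest of the Dynamic Programming machinery is for.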

Continue Reading…

RL part 3. Markov Decision Process, policy, Bellman Optimality Equation.

Recall that in Part 2 we introduced the notion of a Markov Reward Process, which is really just a building block, since our agent was not able to take actions. It simply transitioned from one state to another along with our environment. That's not very helpful, since we want our agent not only to take actions but also to be able to pick them. To do so, we will add a set of actions to the MRP, promoting it to a Markov Decision Process – the term actually used in Reinforcement Learning.
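The promotion from MRP to MDP is easy to see in code: the transition distribution gains one extra level of indexing, by action. The two-state "weather" example below is entirely made up to illustrate the shape of the data, not taken from the post.

```python
import random

# In an MRP the environment just drifts on its own:
# one transition distribution per state (hypothetical numbers).
mrp = {
    "sunny": [("sunny", 0.8), ("rainy", 0.2)],
    "rainy": [("sunny", 0.4), ("rainy", 0.6)],
}

# Promoting it to an MDP adds a set of actions: the transition
# distribution now depends on what the agent picks in each state.
mdp = {
    "sunny": {"walk":  [("sunny", 0.8), ("rainy", 0.2)],
              "drive": [("sunny", 0.9), ("rainy", 0.1)]},
    "rainy": {"walk":  [("sunny", 0.4), ("rainy", 0.6)],
              "drive": [("sunny", 0.5), ("rainy", 0.5)]},
}

def step_mrp(state):
    # The agent has no say: sample the next state directly.
    nexts, probs = zip(*mrp[state])
    return random.choices(nexts, probs)[0]

def step_mdp(state, action):
    # The agent's chosen action selects which distribution to sample.
    nexts, probs = zip(*mdp[state][action])
    return random.choices(nexts, probs)[0]
```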

Continue Reading…

RL. part 2. Markov Reward Process.

In Part 1 we found out what Reinforcement Learning is and covered its basic aspects. Probably the most important among them is the notion of an environment. The environment is the part of an RL system that our RL agent interacts with. The agent makes an action, the environment reacts, and the agent observes the feedback from that action. This cycle of events creates a process. In this post we'll try to mathematically formalize (using the Markov property) and describe an environment and a process in simple terms.
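That action-reaction-feedback cycle can be sketched as a plain loop. The coin-guessing environment and the random agent below are hypothetical stand-ins, just to make the shape of the loop concrete:

```python
import random

class CoinFlipEnv:
    """Toy environment: guess the coin, get +1 for a correct guess."""
    def step(self, action):
        outcome = random.choice(["heads", "tails"])
        reward = 1.0 if action == outcome else 0.0
        return outcome, reward  # observation and reward (the feedback)

def random_agent(observation):
    # A trivial agent that ignores what it observes.
    return random.choice(["heads", "tails"])

env = CoinFlipEnv()
obs = "heads"  # arbitrary initial observation
total = 0.0
for _ in range(100):
    action = random_agent(obs)      # the agent makes an action
    obs, reward = env.step(action)  # the environment reacts
    total += reward                 # the agent observes the feedback
```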

Continue Reading…

RL. part 0. Absolute minimum amount of math necessary for studying Reinforcement Learning.

     Math is an absolute must-have for anyone trying to learn Reinforcement Learning techniques. Writing any kind of RL program requires a precise understanding of the algorithms and the underlying math. It will make your life easier; otherwise things will not work, the agent will not learn the way you expect it to, and there will be a lot of hair pulling and head-clearing walks just to realize that you should have read that paper more closely and tried to see why the authors decided to put so many equations in.

Continue Reading…

RL. part 1. What is Reinforcement Learning? Intuition.

    The idea behind Reinforcement Learning (RL later) is fairly simple and intuitive: let's learn by interacting with what we are trying to master. One analogy that in my opinion explains the term well is any kind of puzzle (a puzzle box in particular) that does not let you see its internal mechanism. There is no way to find out how it opens (without breaking it) other than to TRY to interact with it and see what happens.

Continue Reading…

How to port legacy functionality to newer code; workflow.

  If you have ever needed to make legacy code work with newer code, you might have noticed that it is a bit of a challenge; especially if you are the author of neither version and the original creator is no longer available to be constantly harassed and followed with questions about why things are the way they are.

Continue Reading…

Few reasons why Emacs. Part 1.


When I say to people that I use Emacs, I get “the look”.  

Continue reading “Few reasons why Emacs. Part 1.”

What to do when you want to learn so much but there is so little time to do it?


    It is not easy to be a geek. There is all that knowledge to be obtained, and all those interesting projects to do. So what do you do when you have only a few hours a day to do what you want?

Keep on Reading!