The idea behind Reinforcement Learning (RL from here on) is fairly simple and intuitive: we learn by interacting with the thing we are trying to master. One analogy that, in my opinion, explains the term well is a puzzle (a puzzle box in particular) that does not let you see its internal mechanism. There is no way to find out how it opens (without breaking it) other than to TRY interacting with it and see what happens.
To drive the point home, let’s watch a short video that we’ll use to explain a number of terms:
This example illustrates an RL system well. Such a system has a few main elements:
- The ENVIRONMENT is the space in which an agent operates. Here, the puzzle is the environment.
- An RL AGENT (the puzzle solver in this example) performs actions and observes “what happens”. In other words, the agent receives a reward signal.
- The REWARD SIGNAL defines the agent’s goal. We want to solve the puzzle. After each move the agent performs, the environment reacts. The agent perceives this reaction; its goal is to maximize the total reward and, hopefully, solve the puzzle. The reward signal tells the agent which moves are good and which are bad. Reward signals are perceived through the agent’s sensors (eyes and hands in our example).
- A POLICY is a way to choose an action for a particular state of the environment. In other words, when you see the puzzle in a particular position, the policy tells you that the next thing to do is, say, to shake it. There will be a lot of discussion of policies later on. For now, it is enough to say that there are two types of actions. An exploratory action is one where the agent tries something it hasn’t tried before (you see the puzzle in a particular state and, instead of repeating a move you’ve made before, you try a random move to see if it yields a better reward). A greedy action is one where the agent acts according to its policy.
- A MODEL of the environment is the final component of an RL system. In our puzzle example, the model would be complete knowledge of the puzzle’s internal mechanism. Since we don’t have that knowledge, the current RL system is called MODEL-FREE. Having a model of the environment lets an agent take a state-action pair and predict the resulting state (MODEL-BASED systems). In other words, it lets the agent predict the next state and reward for a given state-action pair, which leads nicely to planning problems, where you have to plan a course of action in advance. In real-world problems it is fairly rare to have a model of the environment.
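To make the elements above a bit more concrete, here is a minimal sketch of how they might fit together in code. Everything in it is an assumption made up for illustration: the `PuzzleBox` class, the three moves, the reward values (0.1 for a correct move, 1.0 for opening the box, -0.1 for a wrong move) and the 20% exploration rate. The agent simply keeps an average of the reward it has seen for each state-action pair, which is enough for this toy box because every correct move is rewarded right away; it is not a full RL algorithm. Note that the agent never looks at the hidden mechanism, so the sketch is model-free.

```python
import random
from collections import defaultdict

ACTIONS = ["push", "slide", "shake"]  # hypothetical moves, invented for this sketch


class PuzzleBox:
    """ENVIRONMENT: a toy box that opens after a hidden sequence of moves."""

    def __init__(self):
        self._secret = ["slide", "push", "shake"]  # internal mechanism; the agent never reads it
        self.state = 0  # how many correct moves have been made in a row

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        """Apply one move and return (next_state, reward, done)."""
        if action == self._secret[self.state]:
            self.state += 1
            done = self.state == len(self._secret)
            return self.state, (1.0 if done else 0.1), done
        self.state = 0  # a wrong move resets the mechanism
        return self.state, -0.1, False


def greedy_action(estimates, state):
    """POLICY (greedy part): pick the move with the best estimated reward."""
    return max(ACTIONS, key=lambda a: estimates[(state, a)])


def train(env, episodes=200, epsilon=0.2, seed=0, max_steps=100):
    """AGENT: interact with the box and learn average rewards per (state, action)."""
    rng = random.Random(seed)
    estimates = defaultdict(float)  # untried pairs start at 0.0
    counts = defaultdict(int)
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # exploratory move
            else:
                action = greedy_action(estimates, state)  # greedy move
            next_state, reward, done = env.step(action)
            # REWARD SIGNAL: fold the observed reward into the running average
            key = (state, action)
            counts[key] += 1
            estimates[key] += (reward - estimates[key]) / counts[key]
            state = next_state
            if done:
                break
    return estimates


def solve_greedily(env, estimates, max_steps=10):
    """Replay the learned policy without exploration and record the moves."""
    state, moves = env.reset(), []
    for _ in range(max_steps):
        action = greedy_action(estimates, state)
        state, _, done = env.step(action)
        moves.append(action)
        if done:
            break
    return moves


if __name__ == "__main__":
    box = PuzzleBox()
    learned = train(box)
    print(solve_greedily(box, learned))
```

After training, replaying the greedy policy should reproduce the hidden sequence of moves, even though the agent only ever saw states and rewards. Setting `epsilon` to 0 makes the agent purely greedy; raising it makes it try random moves more often.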
I hope the previous example gave you an intuition for Reinforcement Learning and its components. I believe that building intuition at a fundamental level is important. Most of us don’t have a PhD in mathematics, so taking things slowly and letting them sink in is the way to go if you want to survive in the long run, as concepts tend to compound. With all that said, in the next part we will look at some basic math needed to go further into the field.
References: Richard S. Sutton and Andrew G. Barto, “Reinforcement Learning: An Introduction”, second ed., 2017.