To train your machine, reward it


Reinforcement learning is a form of machine learning in which a computer learns to complete a task through repeated interaction with a dynamic environment. Using an iterative trial-and-error approach, the machine explores the environment. This exploration generates data, which the machine uses to determine the best course of action for its task. Once the environment and the reward have been defined, this happens without human intervention and without having to explicitly program the machine to perform a specific task.

Reinforcement learning differs from supervised machine learning, in which algorithms are trained on data sets that contain the correct answer to a given problem. In reinforcement learning no correct answer is provided: the machine has to find one by trying different courses of action and eventually selecting the one that yields the greatest cumulative reward.

In the absence of answers, the machine effectively learns from its own experience. The component that decides which action to take is known as the ‘agent’.
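In the standard textbook formulation, the reward the agent tries to maximise is cumulative. The discounted return from time step t adds up all future rewards r, weighting each one by a discount factor γ so that rewards received sooner count for more (γ must be strictly below 1 when the task never ends). The agent looks for the behaviour that maximises the expected value of this return:

```latex
G_t = r_{t+1} + \gamma\, r_{t+2} + \gamma^{2} r_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma \le 1
```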


How it works

Imagine that a dog in a garden is given a tennis ball. The dog, which represents the agent, will first observe the garden and build its own representation of the environment. It will then wonder: what can I do with this ball? What happens if I throw it? Can I hide it? If so, where?

It will choose a course of action, such as hiding the ball, and observe how the owner responds. If the owner simply stares at the dog and doesn’t interact, the dog will find this dull: a negative reward.

The dog will repeat the process until it realises that bringing the ball back to the owner results in a smile and a treat, that is, a positive reward. It will then learn that this is the action that maximises its reward.
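The dog’s trial and error can be sketched as a multi-armed bandit, the simplest reinforcement learning setting, in which there is a single situation and the agent only has to learn which action pays best. The sketch below is a minimal illustration, not a model of real dog behaviour: the three actions and their average rewards are hypothetical numbers chosen for the example, and the agent uses an epsilon-greedy strategy, trying a random action 10% of the time and otherwise picking the action with the highest estimated reward.

```python
import random

# Hypothetical actions and average rewards for the dog example.
# These values are illustrative assumptions, not measurements.
ACTIONS = ["throw", "hide", "bring_back"]
TRUE_MEAN_REWARD = {"throw": 0.0, "hide": -1.0, "bring_back": 1.0}


def owner_response(action: str, rng: random.Random) -> float:
    """Toy environment: the owner's reaction to an action, with some noise."""
    return TRUE_MEAN_REWARD[action] + rng.gauss(0.0, 0.5)


def train_dog(episodes: int = 2000, epsilon: float = 0.1, seed: int = 0) -> dict[str, float]:
    """Learn an estimated average reward for each action by trial and error."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}  # estimated average reward per action
    count = {a: 0 for a in ACTIONS}    # how often each action has been tried
    for _ in range(episodes):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=value.get)
        reward = owner_response(action, rng)
        count[action] += 1
        # Incremental mean: nudge the estimate towards the observed reward.
        value[action] += (reward - value[action]) / count[action]
    return value


if __name__ == "__main__":
    estimates = train_dog()
    print(max(estimates, key=estimates.get))
```

With these assumed rewards, the highest estimate should end up on `bring_back`. The `epsilon` parameter sets the balance between exploring new actions and exploiting the best one found so far.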

Reinforcement learning algorithms encourage a machine to act in a similar way, interacting with a dynamic environment (for example, a factory floor with several production lines) until it finds the course of action that maximises its reward.


Applications in manufacturing

In industrial manufacturing, reinforcement learning is used in processes that require complex decision-making, especially where machines need to cope with changes in a dynamic environment.

For example, a cobot (collaborative robot) can be trained to find the best path around obstacles, such as objects or the limbs of human workers, while continuing to perform its task. This is simple for a human, but for a machine it is an extremely complex process that requires careful analysis of an unpredictable environment. If training is successful, the cobot will be more productive, because it won’t need to stop to avoid a collision.
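Path-finding of this kind is often illustrated with tabular Q-learning on a grid. The sketch below is a toy example, not a cobot controller: it assumes a hypothetical 5x5 workspace in which `#` cells stand for obstacles, a reward of -1 per move, a penalty of -10 for bumping into an obstacle and +10 for reaching the goal. The names `GRID`, `step`, `q_learning` and `greedy_path`, and all parameter values, are illustrative choices.

```python
import random

# Hypothetical 5x5 workspace: S = start, G = goal, # = obstacle
# (for example, a fixture or a worker's arm). Purely illustrative.
GRID = [
    "S....",
    ".##..",
    ".#...",
    ".#.#.",
    "...#G",
]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def locate(symbol):
    """Return the (row, col) of the first cell containing symbol."""
    for r, row in enumerate(GRID):
        c = row.find(symbol)
        if c != -1:
            return (r, c)
    raise ValueError(symbol)


START, GOAL = locate("S"), locate("G")


def step(state, action):
    """Apply one move and return (next_state, reward, done)."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < len(GRID) and 0 <= c < len(GRID[0])):
        return state, -1.0, False   # hit the edge: stay put
    if GRID[r][c] == "#":
        return state, -10.0, False  # collision: heavy penalty, stay put
    if (r, c) == GOAL:
        return (r, c), 10.0, True   # task completed
    return (r, c), -1.0, False      # an ordinary move costs a little


def best_action(q, state):
    """Index of the action with the highest Q-value (unseen pairs count as 0)."""
    return max(range(len(ACTIONS)), key=lambda i: q.get((state, i), 0.0))


def q_learning(episodes=500, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {}  # maps (state, action_index) -> estimated return
    for _ in range(episodes):
        state = START
        for _ in range(200):  # cap the length of each episode
            # Epsilon-greedy: mostly exploit, sometimes explore.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = best_action(q, state)
            nxt, reward, done = step(state, ACTIONS[a])
            # Q-learning target: reward plus discounted best value of the next state.
            if done:
                target = reward
            else:
                target = reward + gamma * q.get((nxt, best_action(q, nxt)), 0.0)
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + alpha * (target - old)
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the learned policy from START; stop at the goal or after max_steps."""
    state, path = START, [START]
    for _ in range(max_steps):
        state, _, done = step(state, ACTIONS[best_action(q, state)])
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    print(greedy_path(q_learning()))
```

Because collisions and detours are penalised, the learned greedy policy should head for the goal while steering around the `#` cells. A real cobot works in continuous space with sensor data, so practical systems typically replace the lookup table with a function approximator; the grid is only meant to show the learning rule.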

Reinforcement learning can also be used to streamline production, an approach taken by researchers at the Industrial AI Lab at Hitachi America. The researchers modelled a virtual shop floor as a two-dimensional matrix and used reinforcement learning algorithms to interact repeatedly with this virtual environment. In doing so, they were able to determine the best setup to increase productivity and reduce delays in serving their customers.

Applications of reinforcement learning in manufacturing are only just emerging, but early experiments are already showing promising results. Industrial machines work hard to increase your productivity. It’s time to reward them.

To learn more about other types of machine learning and their applications in manufacturing, visit