Simulation Environments for Reinforcement Learning on the Hadean Distributed Cloud Platform

Summary

Hadean has been working with Microsoft Bonsai to integrate their reinforcement learning (RL) system with our technology, by training Bonsai brains hosted on Microsoft Azure against multiple simulation environments running on the Hadean Distributed Cloud Platform.

Bonsai works with digital twins of real-world systems, learning to control them to drive towards a goal predefined by a human subject matter expert. The idea is that Bonsai learns an optimal control policy using the digital twin, and then can be deployed on the real-world version of that system as a controlling “Bonsai brain” agent.

Reinforcement learning systems learn in a way that’s quite familiar to humans: by trial and error. But whereas a human has some understanding of what they’re trying to do, and can often learn a task after a few attempts, RL agents begin with no fundamental understanding of the environment, and no past experience they can bring to bear. The outcome they’re driving towards must be mathematically encoded in some way, without relying on the qualitative understanding that a human learner would have.

Perhaps it’s not surprising, then, that an RL agent can take hundreds of thousands – or even millions – of attempts to learn optimal decision-making given the goals and a learning environment. Typically, this involves getting an agent to repeat a task over and over again in many episodes.

Clearly, then, the problem is that training can take a very long time. One way to mitigate this is to run many episodes in parallel and let Bonsai learn from all of them at once. Bonsai’s learning service knows how many parallel episodes are useful for learning at any given moment, and that number can vary over time.

Hadean has developed an integration with Bonsai, so that simulators written on the Hadean Distributed Cloud Platform can be used as the digital twins from which the RL agents learn how to drive towards their goal. Hadean’s platform scales the number of running simulators to match the current demand for parallel episodes, ensuring both speed and efficient use of computational resources. If individual simulators require large amounts of computational power, they can be placed on large cloud machines, or spread across multiple machines, as necessary.
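To give a flavour of the shape of that integration (this is an illustrative sketch, not the actual Hadean SDK or Bonsai API), each simulator essentially needs to expose an episode-oriented interface: start a new episode from a given configuration, advance one step under the action the brain has chosen, and report the resulting state and whether the episode has ended. In Rust, with all names hypothetical, that interface might look something like this:

```rust
/// Hypothetical episode-oriented interface a simulator could expose to an RL
/// training service such as Bonsai. All names and types here are illustrative,
/// not part of the Hadean SDK or the Bonsai API.
pub trait EpisodeSimulator {
    type Config;
    type Action;
    type State;

    /// Reset the simulator to the start of a new episode.
    fn episode_start(&mut self, config: &Self::Config);

    /// Advance the simulation by one step under the brain's chosen action,
    /// returning the new observable state and whether the episode has ended.
    fn episode_step(&mut self, action: &Self::Action) -> (Self::State, bool);
}
```

Many independent instances of a simulator shaped like this can then be spun up or torn down as the number of parallel episodes that Bonsai wants changes.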

For example, we used this system to show how a Bonsai brain can learn to operate a digital twin of a supply chain, so that stock-out delays through the supply chain are minimised. The simulation proceeds for a number of (simulated) days, and on each day, the Bonsai brain adjusts stock policies at locations in the chain, attempting to minimise undesirable outcomes, such as delays to customers. This may be balanced against other goals, such as reducing the total working capital employed in the chain, or remaining robust to low-probability, high-impact events. Many copies of this digital twin were spun up in Hadean Processes on the Platform, ensuring that the RL system had the scale it needed to explore strategies quickly, and thus the information it needed to complete its training.
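To make the supply chain example more concrete, here is a deliberately simplified, hypothetical sketch of what one simulated day might look like inside such a digital twin. The real simulator is a full Hadean Distributed Cloud Platform application; none of the names or the replenishment logic below are taken from it.

```rust
// Illustrative, heavily simplified supply-chain state: one stock level and one
// reorder threshold ("stock policy") per location. Not the real simulator.
struct SupplyChainSim {
    stock: Vec<f64>,         // current stock at each location
    reorder_point: Vec<f64>, // stock policy the brain adjusts each day
    stockout_delays: f64,    // running total the brain is trying to minimise
}

impl SupplyChainSim {
    /// One simulated day: apply the brain's new stock policies, serve demand,
    /// replenish, and accumulate any unmet demand as a stock-out penalty.
    fn step_day(&mut self, new_reorder_points: &[f64], demand: &[f64]) -> f64 {
        self.reorder_point.copy_from_slice(new_reorder_points);
        let mut delay_today = 0.0;
        for (i, &d) in demand.iter().enumerate() {
            self.stock[i] -= d;
            if self.stock[i] < 0.0 {
                // Unmet demand shows up as a stock-out delay for customers.
                delay_today += -self.stock[i];
                self.stock[i] = 0.0;
            }
            if self.stock[i] < self.reorder_point[i] {
                // Naive replenishment back up to the reorder point.
                self.stock[i] = self.reorder_point[i];
            }
        }
        self.stockout_delays += delay_today;
        // Reward signal returned to the brain: fewer delays is better.
        -delay_today
    }
}

fn main() {
    let mut sim = SupplyChainSim {
        stock: vec![10.0, 5.0],
        reorder_point: vec![8.0, 4.0],
        stockout_delays: 0.0,
    };
    let reward = sim.step_day(&[9.0, 6.0], &[12.0, 3.0]);
    println!("day reward: {reward}, total delays: {}", sim.stockout_delays);
}
```

In each episode, the brain would drive a step like this once per simulated day, proposing new reorder points and receiving a reward that penalises stock-out delays; richer reward functions could also penalise working capital or reward robustness to rare, high-impact events, as described above.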

The system learned to efficiently control the supply chain, exactly as we had hoped.

Whilst this is an exciting demonstration of reinforcement learning on the Hadean Distributed Cloud Platform, we have our eyes firmly on what comes next. Because of the computational power that Hadean can bring to a simulator, we believe there’s an opportunity to create a broader class of simulators that are sufficiently fast-running to be practical targets for RL.

The supply chain simulator was written as a Hadean Distributed Cloud Platform application, but the Hadean ecosystem enables further possibilities. For example, our spatial compute engine, Aether, can simulate virtual worlds with record-breaking numbers of entities. Placing reinforcement learning agents within Aether Engine simulations would enable them to learn in incredibly complex environments. These would mimic the complexity of the real world more closely, particularly in cases where emergent behaviour from many interacting individuals becomes important to capture. Worlds with many autonomous vehicles, or drones, for example, may be explored in this manner. Trained Bonsai brains could then be deployed to control entities within these virtual worlds, or transitioned to the real world to control a real object.

More generally, we’re very excited about the power of machine learning on the Hadean Distributed Cloud Platform, and there’s a whole world of machine learning libraries and applications out there that we’re exploring. We recently released our Hadean Distributed Cloud Platform SDK, which utilises Rust. You can read more about it here. We know that the Rust community is actively expanding its machine learning capabilities, and we believe that Rust-native ML on a Rust-native, scalable, distributed cloud computing platform is something both new and very powerful. For example, Rust-native ML inference can run 25x faster than the Python alternative. This makes it appealing for both experimentation and model deployment, in the cloud or even on edge devices.
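As a small flavour of what Rust-native inference means in practice, the sketch below runs a toy two-layer policy network with hard-coded weights. It deliberately uses no particular ML crate and the model is entirely made up, but it shows that deploying a trained model can be ordinary, dependency-free native code:

```rust
// Toy, self-contained example of Rust-native inference: a tiny two-layer
// policy network with hard-coded, made-up weights. Real deployments would
// load a trained model via an ML crate; this only sketches the idea.

fn relu(x: f64) -> f64 {
    x.max(0.0)
}

// Run the network: 3 inputs, a 4-unit hidden layer, and a linear output.
fn infer(input: &[f64; 3], w1: &[[f64; 3]; 4], w2: &[f64; 4]) -> f64 {
    let mut hidden = [0.0; 4];
    for (h, row) in hidden.iter_mut().zip(w1) {
        *h = relu(row.iter().zip(input).map(|(w, x)| w * x).sum());
    }
    w2.iter().zip(&hidden).map(|(w, h)| w * h).sum()
}

fn main() {
    // Hypothetical "trained" weights; in practice these come from training.
    let w1 = [[0.2, -0.1, 0.4]; 4];
    let w2 = [0.5, -0.3, 0.1, 0.7];
    let action = infer(&[1.0, 0.5, -0.2], &w1, &w2);
    println!("chosen control signal: {action:.3}");
}
```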

Sign up for the SDK to find out what else our scalable, distributed cloud computing platform can enable!