MMORPGs PART 2: Applying data-driven design to optimize computation

In our previous post  Hadean’s spatial demo outlined the overarching design goals of placing computation where it is needed and to avoid wasteful, expensive resources like compute. This is vital to achieve any kind of scale and resolution at the same time as allocating servers at a uniform density becomes intractable for any medium scale/resolution. In particular, this is a major limiting factor in large scale games.

To remove this limitation we need to rethink how we apply computation to spatial problems. An integral part of this is the spatial datastructure. Currently, octrees are used to some success on largely single-machine or even single-threaded problems. However, they are difficult or even cumbersome to use at any larger scale. The necessary consistency and atomicity is hard to maintain. Implementing these while keeping the low latency and high throughput required for many problems can become a nightmare. Sometimes it becomes easier to write special cases and throw huge amounts of compute at the problem to solve it instead of taking an elegant approach.

Octrees accelerate spatial problems that need performance at massive scale, while maintaining the ability for high resolution in certain areas. It is the only structure that can reach the necessary speed for many access patterns. And we believe it will be the foundation of any attempt to solve spatial computing.

To this end, we’ve extended our spatial demo to allocate compute resources where they are needed to provide a clean interface over the complexity of managing spatial compute.

Our current architecture is a single, lightweight master process, which orchestrates a pool of nodes. The master stores and handles the distributed octree, so each cell needs to know nothing about the octree and very little overall – only what entities it contains, what area it is authoritative over, and where its neighbours are. The master is written with efficiency first, to enable a single master to handle hundreds or thousands of Hadean processes (which map to cells). The majority of the communication is directly between neighbours to allow full use of local bandwidth. Only when a change to the structure is needed does the master step in with an efficient rearranging of cells. The interface we provide on the cell is simple: a cell does some computation in an area, communicating to neighbours, and reporting its current load to the master. The simulation interface is even simpler, only needing to know about local points, foreign points and aggregates.

Hadean’s Spatial demo has made enormous progress in the last few weeks. Previously, we had separate simulations in each cell, no local neighbour-to-neighbour interaction and static, hard limits on the number and position of cells. Now, three weeks later, we have filled all of these gaps and implemented features that will change the nature of spatial computation including:

  • Dynamic Extension. This is the ability to spawn neighbour cells only when necessary. If handover is needed and the correct neighbour doesn’t exist yet then the master spawns up and connects a cell. Supporting this is important for not pre-defining the limits and boundaries of the space. The potential is there for unbounded, infinite scale whenever it is needed by a problem.
  • Variably Sized Neighbours. This enables having cell scales at vastly different orders of magnitude, all communicating seamlessly with their neighbours. Important for when the computation distribution in space is not uniform (in most interesting cases).
  • Pooling. To mitigate the latency of spawning new boxes and make spawning new cells almost instantaneous, we ready the cell processes in an elastic pool before deploying them when needed, with only milliseconds of latency. This removes any need for pre-predicting where neighbour cells will be needed, and allows us to just “spin” them up as needed by the simulation. This approach combined with dynamic neighbour requests (which enables dynamic extension, and handover) makes for a simple, yet powerful design. It gives us the ability to start new cells in less than 10ms.

In the next few weeks, we plan to further extend this demo with features that leverage this new dynamic computation.

  • Subdivision. One of our most important design goals is to put computation where it is needed. In the next iteration we’ll be adding the ability for cells which are overloaded to quickly split and divide the computation between their child cells. This will maintain an almost constant timestep even for very non-uniform compute distributions.
  • Reclaimation. Again, maintaining our design goal of efficiency, we’ll be adding the ability to despawn cells without any compute and merge cells will very small amounts of compute.
  • Aggregation. This feature will enable more complex behaviours in the simulation to be implemented much more efficiently. For long range interaction that can be approximated, it is very inefficient to communicate directly. Instead the approximation should be pushed up the octree and broadcast to the impacted cells. A use case for this would be a behaviour that needed to know the average position of all entities in the simulation. Instead of directly communicating to each neighbour cell, the cells will receive the average after the aggregation step.

Overall the codebase now supports multiple simulations, each of them with different access patterns and characteristics, but handled dynamically by the generic, efficient primitives provided by the Hadean octree. 

In part 3 we’ll be combining all of these low level innovations to present a flocking demonstration.  Check back next week to hear about our progress!

Read Part 1: Rethinking game architectures

Read Part 3: Parallel flocking behaviour in large-scale simulations

Download The Cloud-Native Game Design Handbook  to discover what cloud-native computing could do for your game design.