Adding Intelligence to Service Discovery

by Hadean Platform

Service discovery identifies which IP address a client should connect to. In a typical distributed architecture, a key-value store such as etcd acts as a single source of truth: it stores the addresses and redirects requests accordingly. However, in highly dynamic environments, such as entity-based simulations, this approach is inefficient: a general-purpose, unsynchronised data structure can be slow to update and can even cause a client to connect to the wrong server.

Service Discovery and Entity-Based Simulations

Simulations in which millions or billions of entities interact with one another quickly run into O(N^2) complexity, because every entity may need to be checked against every other, and this consumes huge amounts of processing power. One way to simplify this is to split the simulation across a number of cells, grouping entities by physical locality and therefore by interaction probability.
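As a rough illustration of why grouping by locality helps, the sketch below buckets entities into a uniform 2D grid and counts how many pairs need checking. The grid layout, the CELL_SIZE value and the function names are assumptions made for this example, and it deliberately ignores interactions across cell borders, which a real simulation would have to handle.

```python
from collections import defaultdict

CELL_SIZE = 10.0  # hypothetical edge length of one cell


def cell_of(pos, cell_size=CELL_SIZE):
    """Map a 2D position to the integer coordinates of its grid cell."""
    x, y = pos
    return (int(x // cell_size), int(y // cell_size))


def group_by_cell(entities, cell_size=CELL_SIZE):
    """Group entity ids by the cell that contains their position."""
    cells = defaultdict(list)
    for entity_id, pos in entities.items():
        cells[cell_of(pos, cell_size)].append(entity_id)
    return dict(cells)


def pairs_within_cells(cells):
    """Count the pairs to check when only same-cell entities interact."""
    return sum(len(m) * (len(m) - 1) // 2 for m in cells.values())


def pairs_all(n):
    """Count the pairs to check when every entity is tested against every other."""
    return n * (n - 1) // 2


entities = {"a": (1.0, 1.0), "b": (2.0, 3.0), "c": (15.0, 1.0), "d": (18.0, 4.0)}
cells = group_by_cell(entities)
print(pairs_all(len(entities)), pairs_within_cells(cells))
```

With four entities split evenly across two cells, the number of pairs drops from six to two; the gap widens quickly as the entity count grows.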

However, a number of scenarios involve communication at longer range, across multiple cells. With a key-value store, every cell must keep etcd updated, and a client must then query all regions in order to determine which cell to engage. Problematically, as the number of cells and requests grows, so does the amount of data and work on both the client and the server side. Moreover, data might be out of date by the time you use it, which forces you to write very defensive code.
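The cost of this lookup pattern can be sketched as follows. A plain dictionary stands in for the key-value store (this is not the etcd API), and the key names, record layout and addresses are invented for illustration. The point is that the client has to read every record and do the spatial filtering itself.

```python
# A plain dict stands in for the key-value store; this is NOT the etcd API.
# Keys, record fields and addresses are hypothetical.
registry = {
    "cells/0": {"bounds": (0, 0, 10, 10), "addr": "10.0.0.1:7000"},
    "cells/1": {"bounds": (10, 0, 20, 10), "addr": "10.0.0.2:7000"},
    "cells/2": {"bounds": (0, 10, 10, 20), "addr": "10.0.0.3:7000"},
    "cells/3": {"bounds": (10, 10, 20, 20), "addr": "10.0.0.4:7000"},
}


def overlaps(a, b):
    """Return True if two axis-aligned boxes (x0, y0, x1, y1) overlap."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1


def find_cells_by_scan(store, region):
    """Client-side lookup: read every record, then filter by overlap."""
    scanned = 0
    hits = []
    for record in store.values():
        scanned += 1
        if overlaps(record["bounds"], region):
            hits.append(record["addr"])
    return hits, scanned


hits, scanned = find_cells_by_scan(registry, (12, 2, 14, 4))
print(hits, scanned)
```

Only one cell is relevant to the query, yet all four records are read. Nothing here stops a cell's bounds from changing between the read and the subsequent connection either, which is where the defensive code comes in.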

Unfortunately, it’s hard to get around this with a general key-value store. The problem can only be avoided if there is a way to query a region of space directly, as well as to synchronise with the rest of the simulation. The benefits of this approach are:

  • You can amortise requests: batch several regions into one request, potentially getting back information about just one cell
  • The serving side can have an efficient spatial data structure in place for mapping virtual space to CPU space
  • The serving side can build in an understanding of when it will be queried and when it’s OK to apply updates to cell regions
  • The consuming side can make nice (easy to think about) assumptions regarding the validity of the processes it communicates with or the data it receives
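The first two benefits can be sketched together. The example below is a minimal, hypothetical serving-side directory built on a uniform grid; the class name, method names, addresses and the choice of a grid rather than a tree are all assumptions for illustration. It accepts a batch of regions in one request and returns the distinct addresses that cover them.

```python
import math


class SpatialDirectory:
    """Hypothetical serving-side index: grid square -> address of the owning cell."""

    def __init__(self, square_size):
        self.square_size = square_size
        self._owner = {}

    def assign(self, square, addr):
        """Record which cell process is responsible for a grid square."""
        self._owner[square] = addr

    def _squares(self, region):
        """Yield every grid square touched by a box (x0, y0, x1, y1)."""
        x0, y0, x1, y1 = region
        s = self.square_size
        for gx in range(math.floor(x0 / s), math.floor(x1 / s) + 1):
            for gy in range(math.floor(y0 / s), math.floor(y1 / s) + 1):
                yield (gx, gy)

    def query_batch(self, regions):
        """Resolve several regions in one request, returning distinct addresses."""
        addrs = set()
        for region in regions:
            for square in self._squares(region):
                addr = self._owner.get(square)
                if addr is not None:
                    addrs.add(addr)
        return sorted(addrs)


directory = SpatialDirectory(square_size=10.0)
directory.assign((0, 0), "10.0.0.1:7000")
directory.assign((1, 0), "10.0.0.1:7000")  # one cell process may own several squares
directory.assign((0, 1), "10.0.0.2:7000")

answer = directory.query_batch([(1, 1, 4, 4), (12, 2, 14, 4)])
print(answer)
```

Two regions go in and a single address comes back, because both fall within space owned by the same cell. The lookup cost depends on the size of the queried regions rather than on the total number of cells.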

It’s also worth noting that, given how rapidly entity-based simulations can shift from one virtual space to another, having a lightweight reallocation mechanism in place is crucial to the efficiency of the simulation.

The Aether Engine Manager

At Hadean, we have addressed this issue with a manager that mediates between the different cells, handling hundreds or thousands of different processes. The manager is responsible for both service discovery and the structure of the cells within a simulation.

A cell carries out computation in a particular area, communicating with its neighbours and reporting its current load to the manager, which will then restructure the cells if required. Because the manager itself decides the structure, it is never out of date and will always return the right address. Furthermore, when the simulation undergoes a restructure, spatial allocation happens atomically, so service discovery is guaranteed to return correct data. This benefits the entire system, including entity transfer, messaging and communication.
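One way to picture atomic reallocation is a manager that builds a complete replacement layout off to the side and then publishes it in a single step, so a lookup sees either the old layout or the new one and never a mixture. The sketch below is a single-process illustration under that assumption; the class, its methods and the version counter are hypothetical and do not describe the Aether Engine's actual implementation.

```python
import threading


class Manager:
    """Hypothetical manager that owns the region -> address layout."""

    def __init__(self, layout):
        self._lock = threading.Lock()
        self._layout = dict(layout)
        self._version = 0

    def lookup(self, region):
        """Return (layout version, address) for a region from one consistent layout."""
        with self._lock:
            return self._version, self._layout[region]

    def restructure(self, new_layout):
        """Swap in a complete new layout in one step."""
        replacement = dict(new_layout)  # build fully before publishing
        with self._lock:
            self._layout = replacement
            self._version += 1


manager = Manager({"north": "10.0.0.1:7000", "south": "10.0.0.2:7000"})
before = manager.lookup("north")

# Suppose load reports suggest moving both regions onto different processes.
manager.restructure({"north": "10.0.0.3:7000", "south": "10.0.0.1:7000"})
after = manager.lookup("north")
print(before, after)
```

Returning a version alongside the address is one design choice that lets a consumer notice that a restructure has happened between two lookups.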

Ultimately, service discovery forms a core part of any distributed topology. Adding orchestrating intelligence and special-purpose tooling provides a fast, lightweight alternative to traditional approaches and helps you reduce the unnecessary architectural complexity often found in large-scale dynamic environments.