Knowledge Base
Aether Engine
Worker Communication
Data Synchronization scalability: Is there a limit on how many messages or data updates a worker can send and receive per iteration, and if so, what is it?
There is no defined limit; it depends on the size of the messages and on the requirements for delivery guarantees and latency.
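Because the practical ceiling depends on message size and tick rate rather than on a fixed cap, a rough per-worker budget can be estimated with simple arithmetic. The following is a minimal sketch: the function name, the link capacity, the tick rate, the message sizes and the `headroom` fraction are all illustrative assumptions, not Aether Engine figures.

```python
def max_messages_per_tick(link_bytes_per_sec: float, ticks_per_sec: float,
                          message_bytes: int, headroom: float = 0.5) -> int:
    """Estimate how many messages of a given size fit into one tick.

    `headroom` is the fraction of link capacity kept free for other traffic
    and latency spikes (an assumed value; tune it to your own guarantees).
    """
    if ticks_per_sec <= 0 or message_bytes <= 0:
        raise ValueError("tick rate and message size must be positive")
    usable = link_bytes_per_sec * (1.0 - headroom)  # bytes/s available for updates
    bytes_per_tick = usable / ticks_per_sec         # budget for a single tick
    return int(bytes_per_tick // message_bytes)


# Hypothetical: 1 Gbit/s link (125,000,000 bytes/s), 30 ticks/s, 64-byte updates.
print(max_messages_per_tick(125_000_000, 30, 64))  # → 32552
```

Multiplying the message size by ten divides the budget by ten, which is why the answer ties the limit to message size and latency requirements.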
Data Synchronization determinism: Is message order preserved, both for messages from the same worker and for messages from multiple workers?
The developer controls how to react to received messages, so they can reorder them into the desired order after receiving them.
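One way to restore a deterministic order on the receiving side is to stamp each message with a tick, a sender id and a per-sender sequence number, then sort on that key before processing. This is a minimal sketch assuming senders attach those three fields; the `Message` type and its field names are hypothetical and not part of the Aether API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class Message:
    # Ordering key: tick first, then sender, then the sender's sequence number.
    tick: int
    sender_id: int
    seq: int
    payload: str = field(compare=False)  # payload never affects ordering


def reorder(received: List[Message]) -> List[Message]:
    """Return messages in a deterministic order regardless of arrival order."""
    return sorted(received)


inbox = [
    Message(tick=7, sender_id=2, seq=0, payload="b0"),
    Message(tick=7, sender_id=1, seq=1, payload="a1"),
    Message(tick=7, sender_id=1, seq=0, payload="a0"),
]
print([m.payload for m in reorder(inbox)])  # ['a0', 'a1', 'b0']
```

Sorting on a total key makes the result independent of network arrival order, which is what lets every receiver process the same inputs in the same sequence.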
Is the reception order of messages and ECS changes from a single worker guaranteed?
See above.
Is the reception order of messages and ECS changes from multiple workers guaranteed?
See above.
Could a worker be interested in all entities and components (assuming we can load balance this worker)?
When you load libraries and capabilities, they are distributed to, and made available on, all workers. Workers are distributed automatically: each one shares the overall load and manages its portion based on its cell and the entities within that cell.
Dispatcher
Do you provide any mechanism of synchronization between the workers?
All workers run in ticks, and at the end of each phase of a tick there is a synchronization barrier across all workers.
In order to stay deterministic, can each iteration (tick) only start once all its inputs (the outputs of every worker from the previous tick) have been received?
Yes, this is also how Aether Engine works.
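The lockstep rule can be illustrated with a sequential simulation: a tick starts only when every worker's output from the previous tick is present, and every worker reads the same frozen snapshot of those outputs. This is a minimal sketch of double-buffered lockstep, not Aether's scheduler; `run_lockstep`, `example_step` and the step-function signature are hypothetical.

```python
from typing import Callable, Dict, List

# step(worker_id, tick, inputs) -> output, where `inputs` holds every worker's
# output from the previous tick. Hypothetical signature, not Aether's API.
StepFn = Callable[[int, int, Dict[int, int]], int]


def run_lockstep(step: StepFn, worker_ids: List[int], ticks: int) -> Dict[int, int]:
    """Run `ticks` ticks; a tick starts only when all previous-tick outputs exist."""
    outputs: Dict[int, int] = {w: 0 for w in worker_ids}  # state before tick 0
    for tick in range(ticks):
        # Barrier check: refuse to start the tick if any worker's output is missing.
        missing = set(worker_ids) - set(outputs)
        if missing:
            raise RuntimeError(f"tick {tick}: waiting on workers {sorted(missing)}")
        frozen = dict(outputs)  # every worker reads the same snapshot
        outputs = {w: step(w, tick, frozen) for w in worker_ids}
    return outputs


def example_step(worker_id: int, tick: int, inputs: Dict[int, int]) -> int:
    # Toy rule: sum of all previous-tick outputs plus this worker's id.
    return sum(inputs.values()) + worker_id


print(run_lockstep(example_step, [1, 2], ticks=2))  # {1: 4, 2: 5}
```

Because no worker sees another worker's output from the current tick, the result is the same whichever order the workers are evaluated in.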
Architecture
The multi-tenant nature of the cloud can lead to performance interference. This is a concern in high-performance computing. Does Hadean address this in any way?
Hadean removes much of the complexity/bloat of middleware and other 'cloud native' libraries required to build such apps today. However, can you state exactly which functions you abstract away from the typical development process? For example, I understand Docker / K8S is no longer needed but what else? Are these libraries actually used within Hadean and hidden from the dev?
- Provisioning – replacing hand-rolled Terraform scripts with application-reactive dynamic provisioning within the OS itself
- Containerisation – replacing Docker and Kubernetes with single binaries
- Service discovery – replacing manually maintained etcd/ZooKeeper/Consul clusters with first-class process IDs for built-in, globally unique addressing of system components
- Tracing – replacing manual use of tracing tools (like Jaeger) with a first-class integration offering immediately accessible visualisation and insights
- Logging – replacing custom provisioning of fluentd on every machine with default log handling (and configurability for advanced use-cases)
How does Hadean maintain performance at scale?
For multi-cluster applications, latency and bandwidth awareness become key. We have plans in this area to create latency-aware data structures (and have begun designing some that apply to Aether Engine), bringing the same data-structure-based thinking to multi-cluster operation. We’re still actively collecting and investigating use-cases with strong multi-cluster needs.
Today CPUs are workable with Hadean, but what work is required to make it possible to use GPUs, FPGAs, etc.? For example, FPGAs are widely used in high-frequency trading. For financial institutions, Monte Carlo simulations are particularly interesting. How do they achieve this today? Is this running on-prem with GPUs?
We are enthusiastic about a future where Hadean programs run in heterogeneous computing environments, and the scheduler can appropriately schedule work onto, and run it on, specialised computing devices. Once we concretely identify a commercial application that clearly demands GPU- or FPGA-based computing, our aim is to add GPU and FPGA support to the Hadean implementation.
In the case of Monte Carlo simulations, our implementation uses an optimised off-the-shelf C++ Monte Carlo library used in the capital markets industry. We have not encountered pushback on this, perhaps because the scale we are running at is enormous and our actual competitor in this space is Apache Spark running on Java rather than an optimised GPU-based solution. GPU-based solutions tend to compete in the small-data/big-compute space, whereas the specific examples we are currently pursuing are big-data problems, where GPU-based solutions tend not to be contenders due to a rather different set of problems. Compared to the Java-based Spark solution, the C++ Hadean approach has significant advantages, in particular the ability to use the vectorisation capabilities (AVX/AVX2 instructions) of modern x86-64 CPUs. If performance becomes a concern in our customer conversations, we would certainly be interested in providing a version of Mesh where the Monte Carlo solvers are executed on GPUs instead.
What are the requirements to run server-side code? (i.e. will all game engines run directly, or do we need to write unique server-side code)
How do you handle a worker becoming a hotspot? Do you split it, or something else?
When does it split?
Do I have control over the logic behind split-and-merge?
Does your multiplayer platform support physics? Do you support deterministic physics across servers?
Runtime Architecture
When a new worker is created at runtime, what happens to the rest of the simulation?
The simulation continues to run as expected, without disruption. The Aether architecture is designed for a dynamic number of workers: if the new worker takes over any area previously covered by other workers, the entities in that area are handed over to the new worker.
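The handover described above can be sketched as a pure function: given each entity's current owner and position, the entities inside the area claimed by the new worker are reassigned and everything else keeps its owner. This is a minimal 2D sketch under stated assumptions; `Area`, `add_worker` and the half-open rectangle model are hypothetical, not Aether's cell representation.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Area:
    """Axis-aligned region [x0, x1) x [y0, y1); half-open so neighbours never overlap."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, p: Point) -> bool:
        x, y = p
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def add_worker(owner: Dict[str, str], positions: Dict[str, Point],
               new_worker: str, area: Area) -> Dict[str, str]:
    """Return the new entity->worker map after `new_worker` takes over `area`.

    Entities inside the area are handed over; all others keep their current worker.
    """
    return {eid: new_worker if area.contains(positions[eid]) else worker
            for eid, worker in owner.items()}


owner = {"tank": "w1", "plane": "w1"}
positions = {"tank": (1.0, 1.0), "plane": (6.0, 6.0)}
print(add_worker(owner, positions, "w2", Area(5.0, 5.0, 10.0, 10.0)))
# {'tank': 'w1', 'plane': 'w2'}
```

Returning a new map instead of mutating the old one mirrors the idea that the rest of the simulation keeps running on the previous assignment until the handover is applied.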
How is latency handled while a worker is bootstrapping?
Bootstrapping happens asynchronously: Aether keeps a buffer pool of ‘ready’ workers for instant use, so bootstrapping doesn’t become a bottleneck.
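A warm pool of this kind can be sketched with `asyncio`: a fixed number of workers are bootstrapped in the background, `acquire` hands one out immediately, and a replacement starts bootstrapping without blocking the caller. This is a minimal sketch; `WarmPool`, `fake_bootstrap` and the pool size are hypothetical, and worker start-up is simulated with a short sleep rather than a real Aether call.

```python
import asyncio
from typing import Awaitable, Callable, Set


class WarmPool:
    """Keeps `target` pre-bootstrapped workers queued for instant hand-out.

    Must be constructed inside a running event loop.
    """

    def __init__(self, bootstrap: Callable[[], Awaitable[str]], target: int) -> None:
        self._bootstrap = bootstrap
        self._ready: "asyncio.Queue[str]" = asyncio.Queue()
        self._tasks: Set["asyncio.Task[None]"] = set()
        for _ in range(target):
            self._refill_in_background()

    def _refill_in_background(self) -> None:
        task = asyncio.create_task(self._spawn())
        self._tasks.add(task)  # hold a reference until the task finishes
        task.add_done_callback(self._tasks.discard)

    async def _spawn(self) -> None:
        worker = await self._bootstrap()  # slow start-up happens off the caller's path
        self._ready.put_nowait(worker)

    async def acquire(self) -> str:
        worker = await self._ready.get()  # returns immediately if a warm worker is queued
        self._refill_in_background()      # top the pool back up asynchronously
        return worker

    def ready_count(self) -> int:
        return self._ready.qsize()


async def demo() -> tuple:
    started = 0

    async def fake_bootstrap() -> str:  # stand-in for real worker start-up
        nonlocal started
        await asyncio.sleep(0.01)
        started += 1
        return f"worker-{started}"

    pool = WarmPool(fake_bootstrap, target=2)
    await asyncio.sleep(0.05)  # let the pool warm up
    first = await pool.acquire()
    await asyncio.sleep(0.05)  # let the replacement bootstrap finish
    return first, pool.ready_count(), started


print(asyncio.run(demo()))  # ('worker-1', 2, 3)
```

The caller only waits on the queue, so as long as the pool is sized for the expected spawn rate, bootstrap latency never sits on the tick path.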
Costs
When handling large volumes of data, the transportation costs involved in using the cloud become noteworthy. How does Hadean consider this?
Who pays for the cloud?
Interconnect
A major issue for HPC workloads today is ensuring the interconnect between machines is optimised. How does Hadean add to the interconnect, if at all, for workloads?
- A rigorously sound, general model for communication and computing on arbitrary hardware architectures and innovations
- Taking advantage of being close to the metal, without unnecessary layers of abstraction and indirection
We do nothing in particular to improve hardware interconnect, but where better interconnect (or indeed memory, processing, etc.) is available, Hadean intends to leverage it. A practical example is the world’s first global real-time game, which we ran during our public Eve Aether Wars tests. Hadean made use of the underlying cloud and edge data centre layouts and Azure’s “Accelerated Networking” capability to run a single game simulation distributed across 7 data centre regions and played by users connecting from 122 different countries.
Partnership
Are you open to co-developing IP as a partner?
Are you open to providing our team the code, existing demos, and support necessary to evaluate the platform and developer workflow?
- Alignment on problem area and technical/commercial outcomes
- Project scoping (timelines, resource requirements)
- Customer workshop (enabling your team to get up to speed for their specific project requirements)
Our early-access SDK will provide:
- Access to download our SDK and Engine
- An image to run locally in a Hyper-V environment
- The ability to deploy into a customer-owned cloud environment (at present, Azure)
- An example tutorial, with supporting documentation, on how to build a prototype game
Platform
Is it containerized? Can we do local IO? What type of user account/profile does the worker run under?
Hadean doesn’t use containers or orchestration tools such as Kubernetes, as the workers are deployed and managed as part of the Hadean Platform and its distributed process model. Local IO is possible, but may not be suitable depending on the use-case. Processes currently run under a non-admin user with sudo access; this will be locked down in time.
SDK
Based on the documentation, Aether Engine provides geo/space-based reference load balancing. Would that fit well if we have a mix of ground entities and space assets such as satellites?
Yes; this would use an octree for the 3D allocation of cells to workers.
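To illustrate how an octree partitions 3D space, here is a minimal sketch in which a cubic cell splits into eight children once it holds more than a threshold number of entities; each leaf would then be a candidate unit to assign to a worker. This is a generic point-octree, not Aether Engine's implementation; `OctreeCell`, `capacity` and `min_half` are hypothetical names and values.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


@dataclass
class OctreeCell:
    """A cubic cell that splits into eight children when it holds too many entities."""
    center: Point
    half: float                # half the edge length of the cube
    capacity: int = 4          # hypothetical split threshold (entities per cell)
    min_half: float = 1.0      # stop subdividing below this size
    points: List[Point] = field(default_factory=list)
    children: Optional[List["OctreeCell"]] = None

    def _octant(self, p: Point) -> int:
        # Bit 0 = +x side, bit 1 = +y side, bit 2 = +z side of the centre.
        cx, cy, cz = self.center
        return int(p[0] >= cx) | (int(p[1] >= cy) << 1) | (int(p[2] >= cz) << 2)

    def insert(self, p: Point) -> None:
        if self.children is not None:
            self.children[self._octant(p)].insert(p)
            return
        self.points.append(p)
        if len(self.points) > self.capacity and self.half > self.min_half:
            self._split()

    def _split(self) -> None:
        h = self.half / 2
        cx, cy, cz = self.center
        self.children = [
            OctreeCell((cx + (h if i & 1 else -h),
                        cy + (h if i & 2 else -h),
                        cz + (h if i & 4 else -h)),
                       h, self.capacity, self.min_half)
            for i in range(8)
        ]
        pending, self.points = self.points, []
        for q in pending:
            self.children[self._octant(q)].insert(q)

    def leaves(self) -> List["OctreeCell"]:
        # Each leaf is a candidate unit of work to assign to one worker.
        if self.children is None:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


root = OctreeCell(center=(0.0, 0.0, 0.0), half=100.0)
ground = [(-50.0, -50.0, -1.0), (50.0, -50.0, -1.0), (-50.0, 50.0, -1.0), (50.0, 50.0, -1.0)]
satellite = (0.0, 0.0, 90.0)
for p in ground + [satellite]:
    root.insert(p)
print(len(root.leaves()))  # 8
```

Because subdivision follows entity density, a sparse shell of satellites and a dense cluster of ground entities naturally end up in differently sized cells.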
Do you provide an API so we can implement custom load balancing?
Developers can write functions that decide, based on the state of the simulation, when to split or join Aether cells. The “coordinates” used for load balancing can also be separate from real entity coordinates, which opens up multiple ways to distribute entities other than by geography.
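The two ideas in this answer, user-defined split/join decisions and load-balancing coordinates that differ from an entity's real position, can be sketched as plain functions. This is a minimal illustration under stated assumptions: `should_split`, `should_join`, `balancing_coordinate`, `assign`, the `team` attribute and the thresholds are all hypothetical and do not reflect the actual Aether SDK hooks.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Entity:
    eid: int
    position: Tuple[float, float]  # real position in the world
    team: int                      # illustrative attribute used for balancing


# Hypothetical policy hooks; names and thresholds are illustrative only.
def should_split(cell_load: int, max_load: int = 100) -> bool:
    """Split a cell once it carries more entities than one worker should handle."""
    return cell_load > max_load


def should_join(load_a: int, load_b: int, min_load: int = 20) -> bool:
    """Join two sibling cells when their combined load is small enough for one worker."""
    return load_a + load_b < min_load


def balancing_coordinate(e: Entity) -> float:
    # Balance by team instead of geography: each team gets its own band.
    return float(e.team)


def assign(entities: List[Entity], cut: float) -> Dict[int, str]:
    """Split the balancing-coordinate space at `cut` into two cells."""
    return {e.eid: ("cell-A" if balancing_coordinate(e) < cut else "cell-B")
            for e in entities}


# Two entities at the same real position still land in different cells.
same_spot = [Entity(1, (10.0, 10.0), team=0), Entity(2, (10.0, 10.0), team=1)]
print(assign(same_spot, cut=0.5))            # {1: 'cell-A', 2: 'cell-B'}
print(should_split(150), should_join(5, 8))  # True True
```

Keeping the join threshold well below the split threshold gives some hysteresis, so a cell hovering near one limit does not repeatedly split and rejoin.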
What's the event flow when an entity migrates from one worker to another?
When an entity moves from one worker, or from one cell, to another, there is an authority handover between them. Entities in a neighboring cell can be ghosted; in other words, a cell can see what is running in an adjacent cell. This allows for deterministic simulations, and a radius can be defined for each entity type to hand over authority when an entity is on the border of a neighboring cell. The relevant section of the documentation goes into more detail.
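A simplified one-dimensional sketch of this event flow: two cells share a boundary, an entity within its type's radius of the boundary is ghosted into the neighbouring cell, and authority changes hands when the entity crosses the boundary. This is one possible interpretation, assuming the radius governs ghosting; Aether's real rules may use the radius differently. `GHOST_RADIUS`, `classify` and `handover_events` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical per-entity-type radii (world units); not Aether configuration keys.
GHOST_RADIUS: Dict[str, float] = {"ship": 5.0, "missile": 1.0}


def classify(x: float, entity_type: str, boundary: float) -> Tuple[str, bool]:
    """Return (authoritative cell, whether the neighbouring cell holds a ghost).

    Cell "A" owns x < boundary; cell "B" owns x >= boundary.
    """
    owner = "A" if x < boundary else "B"
    ghosted = abs(x - boundary) <= GHOST_RADIUS[entity_type]
    return owner, ghosted


def handover_events(path: List[float], entity_type: str, boundary: float = 0.0) -> List[str]:
    """Replay an entity's positions tick by tick and log ghosting and handovers."""
    events: List[str] = []
    prev_owner: Optional[str] = None
    for x in path:
        owner, ghosted = classify(x, entity_type, boundary)
        if prev_owner is not None and owner != prev_owner:
            events.append(f"handover {prev_owner}->{owner}")
        if ghosted:
            neighbour = "B" if owner == "A" else "A"
            events.append(f"ghost in {neighbour}")
        prev_owner = owner
    return events


print(handover_events([-10.0, -3.0, 2.0], "ship"))
# ['ghost in B', 'handover A->B', 'ghost in A']
```

Because cell B already holds a ghost of the ship before it crosses, B has up-to-date state at the moment authority transfers, which is what makes the handover seamless.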
What happens if an entity's position is not covered by any worker?
An entity's position is always covered by a worker, which corresponds to a cell within the simulation. If an entity moves to a position that’s not covered by any worker, Aether will either spawn a new worker covering that area or resize a neighbouring worker to cover it.
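The two fallbacks can be sketched in one dimension: if no worker covers a position, grow the nearest neighbouring worker when it has spare capacity, otherwise spawn a new worker covering the gap. This is a minimal sketch assuming disjoint intervals and a simple load threshold; `Worker`, `ensure_covered` and `max_load` are hypothetical and not how Aether chooses between resizing and spawning.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Worker:
    name: str
    lo: float   # covers the closed interval [lo, hi] in one dimension
    hi: float
    load: int


def ensure_covered(workers: List[Worker], x: float, max_load: int = 10) -> str:
    """Return the worker covering x, resizing or spawning one if needed (mutates `workers`)."""
    for w in workers:
        if w.lo <= x <= w.hi:
            return w.name
    # Uncovered: find the worker whose edge is closest to x.
    nearest = min(workers, key=lambda w: min(abs(x - w.lo), abs(x - w.hi)))
    if nearest.load < max_load:
        # Resize the neighbour; with disjoint intervals no other worker lies in the gap.
        if x < nearest.lo:
            nearest.lo = x
        else:
            nearest.hi = x
        return nearest.name
    # Neighbour is too busy: spawn a new worker covering the gap.
    lo, hi = (x, nearest.lo) if x < nearest.lo else (nearest.hi, x)
    spawned = Worker(f"worker-{len(workers) + 1}", lo, hi, load=0)
    workers.append(spawned)
    return spawned.name


workers = [Worker("A", 0.0, 10.0, load=3), Worker("B", 20.0, 30.0, load=10)]
print(ensure_covered(workers, 12.0))  # A  (A grows to [0, 12])
print(ensure_covered(workers, 35.0))  # worker-3  (B is at capacity)
```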
At what stage of the simulation are entities created?
This depends on the simulation. In most cases, entities are spawned when the simulation starts; however, because we use an ECS (Entity Component System), things like projectiles are also simulated as entities and can be created in response to an action. Entities can be created at will on any tick by the simulation code, or at the startup of the whole simulation.
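Both creation points can be shown with a toy ECS: one entity is spawned at start-up, and a "fire" action during a tick spawns a projectile entity that the movement system then updates in the same tick. This is a minimal sketch of the general ECS pattern; `World`, `spawn`, `tick` and `fire_requests` are hypothetical and are not the Aether ECS API.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterator, List, Tuple

Vec = Tuple[float, float]


@dataclass
class World:
    """A toy ECS: entities are integer ids, components live in per-type dicts."""
    positions: Dict[int, Vec] = field(default_factory=dict)
    velocities: Dict[int, Vec] = field(default_factory=dict)
    _next_id: Iterator[int] = field(default_factory=count)

    def spawn(self, pos: Vec, vel: Vec = (0.0, 0.0)) -> int:
        eid = next(self._next_id)
        self.positions[eid] = pos
        self.velocities[eid] = vel
        return eid


def tick(world: World, fire_requests: List[int]) -> None:
    # Entities can be created mid-simulation: each action spawns a projectile.
    for shooter in fire_requests:
        world.spawn(world.positions[shooter], vel=(1.0, 0.0))
    # Movement system: runs over every entity that has a velocity component.
    for eid, (vx, vy) in world.velocities.items():
        x, y = world.positions[eid]
        world.positions[eid] = (x + vx, y + vy)


world = World()
ship = world.spawn((0.0, 0.0))      # created at simulation start-up
tick(world, fire_requests=[ship])   # a "fire" action creates a projectile
print(world.positions)              # {0: (0.0, 0.0), 1: (1.0, 0.0)}
```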
Is the SDK single or multi-threaded?
The Hadean model is a process model, where each compute unit is a separate Hadean process. Hadean achieves concurrency via the Hadean Process Model rather than via POSIX threads, because the Hadean Process Model provides a concurrency model that is agnostic of scale and distribution, unlike thread-based models. Using POSIX threads would go outside this model and is therefore not encouraged: those threads would not be managed by HadeanOS scheduling and would not benefit from any of the advantages HadeanOS provides.
Security
Some businesses equate proximity with security which deters them from cloud usage. Are you able to explain any benefits of using Hadean in this regard?
Additionally, developers on private clouds may suffer from the lack of convenience services offered by major cloud providers (e.g. AWS SQS). As we build out the Hadean platform, there’s an opportunity to mitigate this by providing key libraries and applications as an ecosystem on top of our platform.