Since our last post we’ve continued to work on our public Hadean Platform SDK – in particular, bringing features that we use internally to the wider world. This includes a few API changes (particularly around more flexible ways of spawning new Hadean processes) as well as a number of new APIs around important operational concerns like metrics, tracing and debugging. We’re excited to share these with you when they’re ready, and we’ll have some follow-up blog posts that go into more detail on these features and how they may be a little different from what you’re used to.
Aside from this ‘feature parity’ work, we’ve also been looking at how the Platform scales under demanding, large-scale simulations on Aether. We’re still mostly benchmarking rather than making platform changes, but we wanted to share some of the work we’ve been doing on our local channels implementation.
For context – a lot of communication within Hadean applications happens between processes scheduled to run on the same machine, particularly when using larger instance sizes. Happily, transferring data over localhost is (usually!) a fair amount faster than pushing data to another machine – but we wanted to make sure we’re not leaving any performance on the table, since communication is almost always the limiting factor in distributed applications.
In 2019, Patrick Gordon came across this repository, which showed us that TCP over localhost is seriously slow! To be honest, I was pretty surprised that pipes and Unix domain sockets really aren’t that much of an improvement compared to what you can achieve via other means. During 2021, James Kay and Robert Bartlensky delivered a local channels implementation via shared memory in the platform. The large scale tests have been helping us put local channels through their paces, and we’ve discovered a couple of issues that we’re in the process of ironing out – so they won’t be in the next release of the Hadean Platform SDK. We’ll keep you updated!