The Hadean Platform SDK – What The Heck Did We Just Make?

Summary

After the release of the free trial of the Hadean Platform SDK, Alexei takes us through what we just made and how it works.



After the launch of the free trial SDK for the Rust conference, I wanted to share a more technical view of what we have created over the last few months and how the SDK builds on top of the existing Hadean platform. I’m really excited about the SDK, but more than that I’m really excited about the direction we’ve gone in creating it. I wanted to share that excitement and this also acts as a good opportunity to spread some knowledge about the SDK.

Without further ado: on with the vague architectural diagrams!

High Level Architecture

Our SDK is a bag of components that make building and deploying Hadean-based applications easier. Technically, the only components you really need are the launcher, the backend, the SDK crates, the portal, and Auth0. However, without everything else it would be a lot harder to use.

Let’s run through the purpose of each component.

The CLI

We’ve built a new CLI, the Hadean CLI. This CLI is the focus of our documentation and it’s the way we expect people to run and deploy applications built with the SDK. It has some neat features, like running Terraform automatically, but mostly it’s just a wrapper around our existing and new HTTP APIs.

SDK Crates

The SDK has a crates directory, which contains the crates you need to build Hadean applications. We’ve built these crates in a way that hides our intellectual property, but they are 100% Rust-based under the hood. Linking against the hadean crate gives you access to our API, allowing you to scale your application as easily as you would run new threads.
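
To make that thread analogy concrete, here’s a minimal sketch of the fan-out-and-join pattern that the spawn interface mirrors, written against std::thread only. The hadean crate’s actual function names, signatures, and return types aren’t reproduced here, so treat this as the mental model rather than the real interface.

```rust
use std::thread;

// The familiar std::thread pattern: fan work out, then join the results.
// With the hadean crate the structure is analogous, except each spawned
// process can land on another VM in the cluster rather than just another
// core on this machine. (This sketch deliberately uses only the standard
// library; see the SDK docs for the real spawn interface.)
fn main() {
    let handles: Vec<_> = (0..8u64)
        .map(|task_id| {
            thread::spawn(move || {
                // Stand-in for the CPU-heavy work we want to scale out.
                (0..1_000_000u64).map(|i| i ^ task_id).sum::<u64>()
            })
        })
        .collect();

    for handle in handles {
        println!("partial result: {}", handle.join().unwrap());
    }
}
```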

The Gateway

When you deploy your application to the cloud, a remote cluster is created. For the CLI to communicate with the cluster, we need a server on a machine that can control the cluster. To make our interface as open as possible, we’ve built this as an HTTP/REST-based interface, defined via OpenAPI and using Auth0 for authentication.

Like the rest of the SDK, the Gateway is 100% Rust. It uses actix-web to provide the REST interface, and it sits behind an nginx reverse proxy. The cool thing here is that the API is entirely open and secure. In theory you could throw away the CLI entirely and use Postman or curl to deploy and run your apps. Building it this way made the SDK easier to build and kept it componentized, but it also means that the system is more hackable for the experienced user who needs that.
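
As a rough illustration of skipping the CLI, here’s a sketch of talking to the Gateway directly from Rust using the reqwest crate. The base URL, the token source, and the request body are placeholder assumptions made for this example; only the /platform-clusters path comes from the provisioning flow described later in this post.

```rust
// Hypothetical direct call to the Gateway's REST API, with no CLI involved.
// Requires the reqwest crate (blocking + json features) and serde_json.
use reqwest::blocking::Client;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Assumed: an Auth0-issued token made available via an environment variable.
    let token = std::env::var("HADEAN_TOKEN")?;
    // Placeholder host; the real gateway URL is set during cluster creation.
    let base_url = "https://gateway.example.com";

    let response = Client::new()
        .post(format!("{base_url}/platform-clusters"))
        .bearer_auth(token)
        // The body shape here is assumed purely for illustration.
        .json(&serde_json::json!({ "name": "my-cluster" }))
        .send()?
        .error_for_status()?;

    println!("created cluster: {}", response.text()?);
    Ok(())
}
```

The same pattern works from Postman or plain curl; the point is that everything the CLI does goes through these documented, authenticated endpoints.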

Hadean Portal

We use the portal for creating and finding platform-clusters and personal-accounts. Right now, only users with a personal-account can download and use the SDK. Every time a remote deployment is created, we create a platform-cluster. For users who are familiar with Hadean Simulate, note that platform-clusters are separate from Hadean Simulate clusters, a distinction that is visible in the REST API object model.

Restricted Backend

For the free version available to everyone, we’ve restricted the resources our full backend allows access to:

1. You cannot specify the machine type; only 8-core Azure VMs are allowed right now.

2. You cannot use the spawn command to create processes across more than 4 VMs.

3. You cannot use any cloud other than Azure, so AWS and GCP are not available.

It follows that you are limited to 32 cores (4 VMs × 8 cores each) with the restricted backend. If you were to move an application created with the free version of the SDK to the full backend, everything else would stay the same. This gives you an opportunity to see scaling at work and then buy in if you want to go big or use other clouds! Our platform backend is fundamentally the same software behind Hadean Simulate and our other products.

The Launcher

The launcher is what takes your application and runs it in the right environment for a Hadean application. It’s used both to run local applications when testing (using the demo backend) and to run on remote clusters (via the restricted backend). The CLI uses the launcher for local runs, and the Gateway uses it for remote runs.

You could run the launcher directly (it’s installed along with the rest of the SDK); the CLI just presents a simpler interface for using it. Between this and the open HTTP endpoints, you could actually remove the CLI from the SDK and still do everything you need to do.

The SDK in Practice

So what? We’ve got all these components, but why? And how does this make my life easier? Ironically, the easiest way to answer this question is to just go use the SDK! If you haven’t yet then you can sign up to use it, and we have documentation available to get you started!

For those who haven’t tried it out yet, here’s a rundown of using the SDK. It starts with downloading and installing the SDK using a single command from the downloads page. By the way, these download links expire after 10 minutes, so refresh the page if the download fails.

Installation takes just a few seconds. Once installed, I source my profile and log in.

Now I can build an example application.

Once built, running it locally via the SDK is a piece of cake.

To run remotely, I must create a cluster, wait for it to provision, and then deploy my application to it.

This time around, deployment and provisioning took just 6 minutes. This is typical for Azure (basically we’re waiting for Virtual Network and Virtual Machine resources to deploy); other providers can be much faster. Once deployed, I can run my application, passing a config file that specifies how to scale it.

While running, logs from the remote application (currently only from the first machine) are streamed back to you via the CLI. Initially, the dynamic backend must create a new machine to run the application in, so there is a delay while scaling up. We’ve added a “standby_machines” configuration option so that you can keep a machine around for a given timeframe and make successive runs faster.

And we’ve run remotely! Because the dynamic backend is used and you have access to the spawn interface, you can scale your application as needed from within your code. The first run here took just over 5 minutes, and a subsequent run I triggered after it took 20 seconds, owing to the machine reuse we just mentioned!

Provisioning

OK, we’ve just seen how the SDK runs, but we had a long wait halfway through the remote run for the machine to provision. What is that, and how does it work? To use the Hadean platform on a machine, the platform needs to be installed onto that machine in the first place. We create a provisioning bundle to make that happen. Here’s the series of events that takes place when a cluster is created:

1. In the CLI, we call POST /personal-accounts/token, which creates a personal access token specific to your personal account. This token has both limited access and a limited lifetime.

2. We also call POST /platform-clusters to create the cluster resource itself.

3. The CLI uses Terraform to deploy the resources into Azure, including a custom-data script with the cluster key and personal access token templated into it.

4. The CLI calls PUT /platform-clusters to set the gateway URL of the cluster.

5. The VM runs cloud-init on startup, which runs our custom-data script, which in turn attempts to download the provisioning bundle with a call to POST /platform-clusters/{key}/provision. The personal access token created by the CLI is used to authenticate this request.

6. The downloaded applications are installed onto the VM.

7. Scripts are run that install all the other software we need and configure everything.

The public APIs and authentication are used at every step, just like in our other processes. Open APIs and security are two of the ideals that make me, as a developer, excited by this platform.
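
To tie that together, here’s a hedged sketch of the two authenticated calls at the heart of the flow: minting the limited-lifetime personal access token (step 1) and then using it to fetch the provisioning bundle (step 5). The hostname, environment variable, and response handling are illustrative assumptions; only the two endpoint paths come from the list above.

```rust
// Illustrative sketch of the provisioning handshake; not the real client code.
// Requires the reqwest crate with the blocking feature enabled.
use reqwest::blocking::Client;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let base_url = "https://gateway.example.com"; // placeholder host
    let client = Client::new();

    // Step 1: the CLI creates a personal access token with limited access
    // and a limited lifetime. (Bearer-token auth is assumed here.)
    let personal_token = client
        .post(format!("{base_url}/personal-accounts/token"))
        .bearer_auth(std::env::var("HADEAN_TOKEN")?)
        .send()?
        .error_for_status()?
        .text()?;

    // Step 5: the VM's cloud-init script presents that token when it
    // downloads the provisioning bundle for its cluster.
    let cluster_key = "example-cluster-key"; // templated into custom data in reality
    let bundle = client
        .post(format!("{base_url}/platform-clusters/{cluster_key}/provision"))
        .bearer_auth(personal_token.trim())
        .send()?
        .error_for_status()?
        .bytes()?;

    println!("downloaded provisioning bundle: {} bytes", bundle.len());
    Ok(())
}
```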

In Summary

The SDK is something that I really enjoyed building at Hadean. The Hadean platform has massive potential, and now that it’s available and easy to use I hope that people will start discovering that potential. The platform is tech that we’ve been developing and using internally for a while now, including using it to build our other products. This release changes the game by giving everyone access to the in-code scaling ability that the platform provides (as well as several of its other features).

I stressed a couple of times that the CLI is effectively redundant, just something that makes our APIs easier to use. This really suits the way that I personally like to develop applications. This kind of design means that if you decide to automate heavily on top of the platform, you won’t have to go through the CLI for every operation, something that could otherwise become a bottleneck. Instead, you can build over the API and get all the same functionality while optimizing your use of the API for your own purposes. This is an ethos you can see in all the major cloud providers as well, and that’s no coincidence.

Anyway, I hope everyone reading this who’s interested checks out the SDK and starts thinking about what kinds of applications they can build with this kind of scale!