Novel usage of BPF on the Hadean platform
The Berkeley Packet Filter (BPF) is an emerging Linux technology that was originally designed to analyze network traffic. It is a safe and fast mechanism for inserting small pieces of code into the kernel or other processes at runtime, and it enables new kinds of tooling. It is widely used by well-known companies such as Netflix, Google, and Facebook on enormous numbers of production machines, but it has not yet been applied to simulation software.
The problem with large scale processes
Any large-scale distributed application has hundreds or thousands of processes, with connections between them. Working at this scale is a real challenge: performance problems in these programs are difficult for developers to fix and almost impossible for non-technical users. This matters to Hadean because our platform makes it easy to write large-scale distributed applications. Applying BPF on the Hadean platform is a breakthrough that will be crucial for fixing distributed performance problems such as the tick spikes in Aether Engine.
Some standard Linux tricks and tools work well on one process or one machine but struggle, or are simply difficult to use, at huge scale. A good example is perf, a sampling profiler for Linux that can track a variety of events. It shows in detail what is using the most processor time, at the level of either the function call stack or individual instructions. Brendan Gregg, a senior kernel and performance engineer at Netflix, created FlameGraph, a tool that helps visualise the output from perf, and there are a variety of spin-offs such as FlameScope and SpeedScope. Each of these tools only presents the standard output from perf in a nicer way; none of them expands on what is possible beyond the sampling profiler. These tools are extremely useful if the user’s issue appears:
- In the time average, in which case plain flame graphs help.
- At regular time intervals, in which case FlameScope can help.
- In one-off events, in which case SpeedScope can help.
The limitation of perf as a sampling profiler is that it only records what the processor is doing once every fixed number of CPU cycles, so anything that happens between two samples goes unseen. It also means that application-level information can’t be gleaned from perf samples, which contain only the stack trace and registers.
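To make this concrete, here is a minimal Python sketch of what a sampling profile boils down to once it has been collected. It is not perf itself: the sample data and the function name fold_stacks are made up for illustration. It collapses stack samples into the "frame;frame;frame count" lines that flame graph tools take as input, and shows that nothing except the stack survives in a sample.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse stack samples into 'frame;frame;frame count' lines.

    Each sample is a list of function names, outermost frame first.
    """
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


# Hypothetical samples, as a sampling profiler might have captured them.
samples = [
    ["main", "simulate", "update_entities"],
    ["main", "simulate", "update_entities"],
    ["main", "simulate", "send_state"],
    ["main", "idle"],
]

folded = fold_stacks(samples)
for line in folded:
    print(line)
```

The counts show where time went on average, but there is no field in a sample that says which tick, request, or entity the program was working on when it was taken.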
The alternative is a tracing profiler, which instruments your executable (either statically at compile time or dynamically at run time) and outputs every event. Some tracers can discard traces, or store only a fraction of them, to reduce the potentially large volume of data. One of the main drawbacks of tracing profilers is that deciding what to instrument requires some hypothesis about what the problem might be, and such a hypothesis is not always available.
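The idea of a tracer that keeps only a fraction of its events can be sketched in a few lines of Python. This is an illustration under simple assumptions, not how any particular tracer works: the Tracer class, its keep_every parameter, and the step function are all hypothetical, and the instrumentation is a decorator rather than code inserted into a compiled binary.

```python
import time
from functools import wraps


class Tracer:
    """Toy tracer: sees every call, stores only every keep_every-th event."""

    def __init__(self, keep_every=1):
        self.keep_every = keep_every
        self.seen = 0      # number of calls observed
        self.events = []   # (function name, duration in seconds) actually stored

    def trace(self, func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                self.seen += 1
                if self.seen % self.keep_every == 0:
                    elapsed = time.perf_counter() - start
                    self.events.append((func.__name__, elapsed))
        return wrapper


tracer = Tracer(keep_every=10)


@tracer.trace
def step(x):
    # Hypothetical unit of work; we had to choose to instrument it up front.
    return x * 2


results = [step(i) for i in range(100)]
print(f"observed {tracer.seen} calls, stored {len(tracer.events)} events")
```

Note that someone had to decide in advance that step was worth instrumenting, which is exactly the hypothesis a tracing profiler depends on.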
How is Hadean using BPF?
Our solution to the limitations of existing sampling and tracing profilers is a BPF-based performance profiling tool. The tool is innovative because it combines sampling and tracing, and it is fully dynamic thanks to the integration of BPF and DWARF. DWARF is a widely used, standardised debugging data format, and it provides a mechanism for identifying the right positions at which to insert code. As an example of this working in practice, we took stack trace samples and combined them with application-level information (the tick value, in this case) obtained from instrumentation inserted dynamically into user code.
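A rough sketch of why tagging samples with the tick value is useful is given below, in plain Python. It does not show the BPF or DWARF machinery, only the analysis that becomes possible once each stack sample carries a tick. The data and the function names stacks_by_tick and busiest_tick are invented for illustration, and the sketch assumes a fixed sampling rate, so that the number of samples in a tick is proportional to the CPU time spent in it.

```python
from collections import Counter, defaultdict


def stacks_by_tick(samples):
    """Group folded stack counts by the tick each sample was taken in."""
    per_tick = defaultdict(Counter)
    for tick, stack in samples:
        per_tick[tick][";".join(stack)] += 1
    return per_tick


def busiest_tick(per_tick):
    """Return the tick with the most samples (most CPU time at a fixed rate)."""
    return max(per_tick, key=lambda tick: sum(per_tick[tick].values()))


# Hypothetical samples: (tick value read by inserted instrumentation, stack).
samples = [
    (41, ["main", "tick", "update_entities"]),
    (42, ["main", "tick", "update_entities"]),
    (42, ["main", "tick", "resolve_collisions"]),
    (42, ["main", "tick", "resolve_collisions"]),
    (42, ["main", "tick", "resolve_collisions"]),
    (43, ["main", "tick", "update_entities"]),
]

per_tick = stacks_by_tick(samples)
spike = busiest_tick(per_tick)
hot_stack, hot_count = per_tick[spike].most_common(1)[0]
print(f"tick {spike} is the spike; hottest stack: {hot_stack} ({hot_count} samples)")
```

Averaged over all ticks, update_entities and resolve_collisions would each account for half of the samples here; only the per-tick view shows that the spike comes from one tick dominated by a single code path.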
The combination of BPF and DWARF can be used to create a performance investigation tool that helps diagnose performance issues in both Aether and other Hadean programs, and gives technical customers the confidence that they can solve their own technical problems. Thanks to BPF, it becomes possible to go beyond the limits of perf: it is even possible to reimplement perf with lower overhead and extra capabilities. BPF programs can be attached to kprobes since Linux 4.1, and later kernel releases extended this to the other event sources that perf uses, such as tracepoints and timed sampling.
Hadean is the first company to apply BPF in the simulation domain, and this work could represent a major change in how operating system facilities are used to understand the performance of simulations.