Oxidised eBPF I: Building a toolchain

This is the first of two posts about ingraind and the Rust BPF library that powers it, RedBPF. In this post I’m going to give you an overview of what ingraind is, and how it led to the development of RedBPF. Then in the next post, I’m going to get a little more technical and show how we compile Rust code to BPF binary code.

In case you missed it, at the end of last year I already blogged about RedBPF, explaining what its main components are and giving a simple example of how it can be used.

Why ingraind

ingraind is the open-source security monitoring agent developed by RedSift. It comes with pre-built probes to monitor file and network activity, including in-depth analysis of DNS and TLS data, syscalls, process execution and more. It can consume and produce StatsD metrics, and it integrates with osquery. In addition to producing StatsD metrics, it can provide output using its own custom data format via HTTP or on Amazon S3.

The agent can be used to monitor traditional servers, Docker containers or entire Kubernetes clusters. Finally — thanks to RedBPF — it is dead easy to extend ingraind with your own observability modules.

Traditional security monitoring agents use a combination of techniques to detect potentially malicious activity: periodically scanning the file system, continuously parsing log files, and periodically running commands and checking their output (for example, listing the running processes and looking for well-known malicious programs).

ingraind instead uses the Linux BPF API. BPF provides hundreds of observability hook points inside the kernel, giving visibility into pretty much anything that happens on a Linux system. Thanks to BPF, ingraind is very fast and has near-zero overhead: it implements a push-based model where processing only happens in response to events produced by BPF code running in the kernel.
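To make the push-based idea concrete, here is a minimal user-space sketch. It is not ingraind's code: the `Event` type and `run_agent` function are hypothetical, and a standard library channel stands in for the kernel-to-user-space event stream. The point is only that the consumer sleeps until an event is pushed to it, rather than polling on a timer.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical event type; real probes would define their own.
#[derive(Debug, PartialEq)]
pub enum Event {
    Exec { pid: u32 },
    Connect { port: u16 },
}

/// Consumes events as they are pushed and returns one log line per event.
pub fn run_agent(events: Vec<Event>) -> Vec<String> {
    let (tx, rx) = mpsc::channel();

    // Stand-in for BPF code in the kernel: it pushes events when they happen.
    let producer = thread::spawn(move || {
        for event in events {
            tx.send(event).expect("receiver should still be alive");
        }
        // `tx` is dropped here, which closes the channel.
    });

    let mut log = Vec::new();
    // Iterating the receiver blocks until an event arrives, so no work is
    // done while nothing happens. The loop ends once the sender is dropped.
    for event in rx {
        log.push(match event {
            Event::Exec { pid } => format!("exec pid={}", pid),
            Event::Connect { port } => format!("connect port={}", port),
        });
    }

    producer.join().expect("producer thread panicked");
    log
}
```

Calling `run_agent` with a list of events returns the log lines in the order the events were sent.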

Why RedBPF

ingraind deployments can range from a single server to clusters of hundreds or thousands of nodes. High performance, low overhead and ease of deployment have always been primary goals. Therefore when the project started, Peter chose Rust as the programming language to develop the agent with.

At the time, BCC was the de-facto standard toolkit to work with BPF (and in many respects, it still is). BCC includes a clang frontend that allows you to write BPF code in a restricted subset of C, a user-space C API to load and interact with BPF code, along with first- and third-party bindings for virtually every popular language to the user-space API.

Here’s an example of a BPF program written with BCC. This particular example uses the Python user-space bindings to load BPF code written in C. When executed, the program compiles the BPF C code on the fly, then loads it. This on-the-fly compilation is at the same time very convenient during development and a burden during deployment. Having to install LLVM, libclang, kernel headers and all the other required dependencies on each node of a potentially large cluster was deemed unacceptable for ingraind.

Therefore, instead of developing Rust bindings for BCC, RedSift decided to develop RedBPF: a Rust library and toolchain providing a build-load-run workflow, so that Rust programs can build BPF tools and run them without having to compile anything on target machines.

RedBPF – the early days

In the early days, the API consisted of two main parts: a build API and a load API, both written in Rust. The build API allowed you to build BPF code written in C and to save the compiled output as an ELF object file. The load API then allowed you to load those ELF object files into the kernel.

The layout of the ELF object files produced by RedBPF was inspired by gobpf, and compatibility with gobpf is still maintained today.

In addition to compiling the BPF code, the build API also allowed you to generate Rust bindings for C data structures (using bindgen), so that the BPF C code and the user-space Rust code in ingraind could exchange data through BPF maps and perf events.

The BPF C code for the Files probe looked very different at this stage. If you skim quickly through the code, you’ll see that it’s a mix of regular C, plus some somewhat obscure stuff like that SEC macro (used to specify ELF sections) and strange bpf_probe_read calls.

RedBPF – today

The “Rust with C for the BPF code” version of RedBPF worked pretty well. There were however a couple of pain points.

Generating the Rust bindings for the data to be shared between the C code and the Rust code was cumbersome. The resulting bindings were necessarily not idiomatic, so we often ended up having to convert between the generated bindings and something less painful to work with from Rust.

Most importantly, having to write this kind of BPF code wasn’t fun:

struct path path;
struct inode *inode;
umode_t mode;
int check = 0;

check |= bpf_probe_read(&path, sizeof(path), (void *)&file->f_path);
check |= bpf_probe_read(&inode, sizeof(inode), (void *)&file->f_inode);
check |= bpf_probe_read(&mode, sizeof(mode), (void *)&inode->i_mode);
if (check != 0) {
    return 0;
}

This kind of error checking was error-prone, and having to use bpf_probe_read to access struct fields was annoying. To mitigate this and other quirks of the BPF platform, BCC comes with a custom clang plugin that, among other things, is often (but not always) able to automatically insert bpf_probe_read calls.

The network parsing code wasn’t much better either:

dns = buffer + sizeof(struct ethhdr)
    + sizeof(struct udphdr)
    + (ip->ihl * 4);
if (dns + 12 > data_end) {
    return -5;
}

query->id = *(u16 *) dns;
dns += 2;

if (*((u8*) dns) >> 3 != 0x10) {
    return -4;
}

There’s nothing particularly wrong with the code above; it’s how parsing in C works: you have a buffer and do pointer math as you scan it. Being used to Rust though, we were never really satisfied with it.

The idea of writing eBPF programs in Rust has been floating around for a while, and we had just the right toolchain as a stepping stone to get there. The question remained: how would idiomatic Rust code for eBPF actually work?

At first we thought we’d create some kind of Rust based DSL, using a number of macros or even writing a custom parser. We took two weeks to experiment with ideas, at the end of which we concluded that yes, we could feasibly rewrite the BPF kernel code from C to Rust.

On September 5 last year I committed the first version of ingraind-probes. At that point the code was pretty rough, but it did show promise. And the problem of generating C to Rust bindings was gone: we could just pass #[repr(C)] structs around.
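As a rough illustration of why sharing `#[repr(C)]` structs removes the need for generated bindings, here is a user-space sketch. The `FileEvent` layout and the `to_bytes`/`from_bytes` helpers are hypothetical and not part of RedBPF. The same struct definition is used to produce the raw bytes (as the BPF side would) and to decode them (as the agent would).

```rust
use std::mem::size_of;

/// Hypothetical event layout shared by both sides.
/// `#[repr(C)]` fixes field order and padding to what C would use.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct FileEvent {
    pub pid: u32,
    pub mode: u16,
    pub flags: u16,
    pub inode: u64,
}

/// Producer side: copy the struct out as the raw bytes that would travel
/// through a perf event or a map value.
pub fn to_bytes(event: &FileEvent) -> Vec<u8> {
    let ptr = event as *const FileEvent as *const u8;
    // SAFETY: FileEvent is plain data with no padding bytes
    // (4 + 2 + 2 + 8 = 16), so every byte in the range is initialised.
    unsafe { std::slice::from_raw_parts(ptr, size_of::<FileEvent>()) }.to_vec()
}

/// Consumer side: rebuild the struct from raw bytes, rejecting short buffers.
pub fn from_bytes(bytes: &[u8]) -> Option<FileEvent> {
    if bytes.len() < size_of::<FileEvent>() {
        return None;
    }
    // SAFETY: the length was checked above, `read_unaligned` tolerates any
    // alignment, and every bit pattern is a valid FileEvent.
    Some(unsafe { std::ptr::read_unaligned(bytes.as_ptr() as *const FileEvent) })
}
```

Because both sides compile the same definition, there is nothing to generate and nothing to convert: a round trip through `to_bytes` and `from_bytes` gives back an equal value.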

As we kept iterating on both ingraind and RedBPF, the code got better and better, until we realized that not only could we make it work, but we would eventually end up with perfectly idiomatic Rust code.

Here’s today’s Rust equivalent of the file parsing code above:

let path = file.f_path()?;
let inode = file.f_inode()?;
let mode = inode.i_mode()?;
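One way accessors like these can fold the manual `check |= ...` bookkeeping into the `?` operator is sketched below. This is an assumption about the general pattern, not RedBPF's actual implementation: `probe_read`, `RawFile`, `RawInode` and `file_mode` are hypothetical names, and the read runs in user space with a null check standing in for the kernel helper's failure path.

```rust
#[derive(Debug, PartialEq)]
pub struct ReadError;

/// Stand-in for `bpf_probe_read`: copies a value out of "kernel memory",
/// failing on a null pointer. Real BPF code would call the kernel helper.
fn probe_read<T: Copy>(ptr: *const T) -> Result<T, ReadError> {
    if ptr.is_null() {
        return Err(ReadError);
    }
    // SAFETY: in this sketch, non-null pointers always come from live references.
    Ok(unsafe { std::ptr::read_unaligned(ptr) })
}

/// Hypothetical, heavily simplified mirrors of the kernel structs.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct RawInode {
    pub i_mode: u16,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct RawFile {
    pub f_inode: *const RawInode,
}

/// Thin wrappers that only hold a pointer; every field access is a checked read.
pub struct File(pub *const RawFile);
pub struct Inode(pub *const RawInode);

impl File {
    pub fn f_inode(&self) -> Result<Inode, ReadError> {
        Ok(Inode(probe_read(self.0)?.f_inode))
    }
}

impl Inode {
    pub fn i_mode(&self) -> Result<u16, ReadError> {
        Ok(probe_read(self.0)?.i_mode)
    }
}

/// Each `?` returns early on a failed read, so no error can be silently dropped.
pub fn file_mode(file: &File) -> Result<u16, ReadError> {
    let inode = file.f_inode()?;
    let mode = inode.i_mode()?;
    Ok(mode)
}
```

With this shape, forgetting to check a read is a compile-time concern rather than a runtime one: the value simply isn't available until the `Result` has been handled.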

And here’s the DNS code:

let transport = ctx.transport()?;
let data = ctx.data()?;
// DNS is at least 12 bytes
let header = data.slice(12)?;
if header[2] >> 3 & 0xF != 0u8 {
    return Ok(XdpAction::Pass);
}
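The `slice(12)?` call replaces the explicit `dns + 12 > data_end` comparison from the C version. A minimal user-space sketch of how such a bounds-checked accessor could look is given below. The `Data` type, `ParseError` and `dns_id_and_opcode` are hypothetical and are not RedBPF's API, and real XDP code additionally has to satisfy the kernel verifier.

```rust
#[derive(Debug, PartialEq)]
pub enum ParseError {
    OutOfBounds,
}

/// Hypothetical view over the packet payload.
pub struct Data<'a> {
    bytes: &'a [u8],
}

impl<'a> Data<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        Data { bytes }
    }

    /// Returns the first `len` bytes, or an error if the payload is shorter.
    pub fn slice(&self, len: usize) -> Result<&'a [u8], ParseError> {
        self.bytes.get(..len).ok_or(ParseError::OutOfBounds)
    }
}

/// Extracts the transaction id and the 4-bit opcode from a DNS header.
pub fn dns_id_and_opcode(data: &Data<'_>) -> Result<(u16, u8), ParseError> {
    // A DNS header is 12 bytes long; shorter payloads are rejected here.
    let header = data.slice(12)?;
    // The id is the first two bytes, in network (big-endian) byte order.
    let id = u16::from_be_bytes([header[0], header[1]]);
    // Byte 2 holds QR (1 bit), opcode (4 bits), then AA, TC and RD.
    let opcode = (header[2] >> 3) & 0xF;
    Ok((id, opcode))
}
```

The length check happens once, in `slice`, and the indexing that follows works on a slice whose length is already known, so there is no pointer to advance past the end of the buffer.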

A few weeks before we released ingraind 1.0, I removed the last bits of C code from the repository. Everything is written in Rust now, and extending ingraind by writing new BPF components is easier than ever. RedBPF went from being a somewhat crazy idea, to something that we now believe has the potential to become a central component of the whole BPF ecosystem.

Conclusion

Rust was the perfect language to write ingraind with. Earlier versions of the agent used Rust for the user space components, and C for the BPF kernel code. The BPF code was hard to develop and maintain, so we developed a toolchain to write BPF code in Rust. This allowed us to ship ingraind 1.0 containing only idiomatic Rust code that is easy to maintain and extend. In the next post I’ll show you some cool hacks we do to compile the Rust code to BPF binary code.

PUBLISHED BY

alessandro

21 May. 2020
