If you do system programming you’ve probably heard BPF mentioned a lot lately. It’s a hot new Linux technology that allows running user supplied programs in the kernel. It’s being used by Netflix, Facebook, Google, Cloudflare and a host of other companies to implement things like blazing fast load balancing, DDoS mitigation and performance monitoring.
In the past few months I’ve been working with Red Sift on RedBPF, a BPF toolkit for Rust. Red Sift uses RedBPF to power the security monitoring agent InGRAINd. Peter recently blogged about RedBPF and InGRAINd, and ran a workshop at RustFest Barcelona. We’ve continued to improve RedBPF since, fixing bugs, improving and adding new APIs, adding support for Google Kubernetes Engine kernels and more. We’ve also completed the relicensing of the project to Apache2/MIT – the licensing scheme used by many of the most prominent crates in the Rust ecosystem – which will hopefully make it even easier to adopt RedBPF.
In this post I’m going to go into some detail about what RedBPF is, what its main components are, and what the full process of writing a BPF program looks like.
You don’t really need to be a BPF expert to read this post: in the next section I’m going to give a quick, super high level overview of the main concepts you need to know to understand the rest. If you do want to dig deeper, Brendan Gregg’s book BPF Performance Tools just came out and it’s rather excellent. And if you’re not much of a book person, I’m also going to provide some other useful links at the end.
The quickest BPF crash course
BPF is a virtual machine that allows running user defined programs in the kernel when certain events happen on a Linux system. Say for example you want to monitor suspicious file activity, log network response latency or even trace user space apps – you can write small BPF programs, request that they get attached to the right place in the kernel, and implement the necessary instrumentation.
The BPF VM uses its own instruction set. You can write the bytecode directly, but people typically use bpftrace or write C code and compile it with the BPF Compiler Collection (BCC).
bpftrace is an amazing tool that lets you write BPF programs using an ad-hoc, high level language. It is excellent for short scripts and manual instrumentation, whereas BCC is better suited for more complex tools, or when integrating with other applications and systems.
BCC leverages LLVM’s BPF target support. It lets you write C code that is then compiled with clang to BPF bytecode that can be executed by the BPF VM in the kernel. There are some restrictions on the kind of C code you can write (most notably you can’t use loops), but otherwise writing a BPF program in C doesn’t feel too different from targeting any other (somewhat quirky) embedded platform.
Because BPF programs are executed in the Linux kernel, BPF bytecode can’t just be run like a regular binary: it needs to be loaded into the kernel first. Most applications that use BPF are therefore split into two parts: the BPF code running in the kernel, and a user space process that is in charge of loading the code into the kernel and interacting with it.
Schematically, the process of developing a BPF program can be summarized with the following steps:
- Write the BPF code in C
- Compile the code for the BPF VM
- Write a user space component that loads the output of step 2 into the BPF VM
- Use the BPF API to exchange data between the user space component and the BPF code
RedBPF includes APIs and tools to implement all the steps above except for step 1. With RedBPF, step 1 becomes:
- Write the BPF code in Rust
I’ve glossed over many details and oversimplified some things, but if you didn’t know anything about BPF you should now understand enough to follow along. Next I’m going to show how exactly RedBPF can be used to implement the steps above.
So what is RedBPF?
RedBPF is a collection of Rust crates. It includes:
- `redbpf-macros` and `redbpf-probes`: provide the kernel space BPF API (step 1)
- `redbpf`: provides the user space BPF API, notably the API to load BPF bytecode (step 3)
- `cargo-bpf`: a cargo plugin that simplifies creating, building and debugging BPF programs (step 2)
A simple HTTP tracer (kernel side)
I’m now going to show a very simple BPF program written in Rust using RedBPF, which can be run in the kernel. It traces all inbound HTTP requests on a given network interface using the eXpress Data Path (XDP) APIs. XDP programs hook directly into the NIC driver (but can fall back to running at a higher level), providing fast, low overhead access to inbound data before it enters the rest of the networking stack.
```rust
#![no_std]
#![no_main]
use redbpf_macros::{map, program, xdp};
use redbpf_probes::bindings::*;
use redbpf_probes::xdp::{PerfMap, Transport, XdpAction, XdpContext, MapData};
use bpf_examples::trace_http::RequestInfo;

program!(0xFFFFFFFE, "GPL");

#[map("requests")]
static mut requests: PerfMap<RequestInfo> = PerfMap::with_max_entries(1024);

#[xdp]
pub extern "C" fn trace_http(ctx: XdpContext) -> XdpAction {
    let (ip, transport, data) = match (ctx.ip(), ctx.transport(), ctx.data()) {
        (Some(ip), Some(t @ Transport::TCP(_)), Some(data)) => (unsafe { *ip }, t, data),
        _ => return XdpAction::Pass,
    };
    let buff: [u8; 6] = match data.read() {
        Some(b) => b,
        None => return XdpAction::Pass,
    };
    if &buff[..4] != b"GET "
        && &buff[..4] != b"HEAD"
        && &buff[..4] != b"PUT "
        && &buff[..4] != b"POST"
        && &buff[..6] != b"DELETE"
    {
        return XdpAction::Pass;
    }

    let info = RequestInfo {
        saddr: ip.saddr,
        daddr: ip.daddr,
        sport: transport.source(),
        dport: transport.dest(),
    };
    unsafe {
        requests.insert(
            &ctx,
            MapData::with_payload(info, data.offset() as u32, data.len() as u32),
        )
    };

    XdpAction::Pass
}
```
File: bpf_examples/src/trace_http/main.rs
The first thing to notice is that the program is a `#![no_std]` `#![no_main]` binary:

```rust
#![no_std]
#![no_main]
...
```
The BPF VM doesn’t support many of the features required by the stdlib, and BPF programs are executed in response to designated events, so they don’t have a conventional `main` entry point. The binaries produced are not executed directly, but are loaded into the kernel using the APIs provided by `redbpf`, as we’ll see later.
The next thing to notice is the `program!()` macro call:

```rust
program!(0xFFFFFFFE, "GPL");
```
BPF programs need to specify which kernel version they’re compatible with and what license they’re distributed under. `0xFFFFFFFE` is a special value meaning any kernel version. The license needs to be declared because the VM makes some APIs available or not depending on it: if the program is not GPL, it won’t be able to use GPL-only functionality in the kernel. In addition to letting you specify version and license, `program!()` also generates some global boilerplate that is needed for the program to compile and load correctly.
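To give an idea of what that boilerplate does, here’s a rough sketch, not the actual macro expansion (which lives in `redbpf-macros`): ELF-based BPF loaders conventionally look for the license and kernel version in dedicated ELF sections, and a `#![no_std]` binary also needs its own panic handler:

```rust
// Rough sketch of the kind of items program!() might generate; the real
// expansion is in redbpf-macros. ELF-based BPF loaders conventionally read
// the license and kernel version from sections named "license" and "version".
#[no_mangle]
#[link_section = "license"]
pub static _license: [u8; 4] = *b"GPL\0";

#[no_mangle]
#[link_section = "version"]
pub static _version: u32 = 0xFFFFFFFE;

// #![no_std] binaries must provide a panic handler themselves.
#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}
```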
`trace_http()` is the function that analyzes network data looking for HTTP requests:

```rust
#[xdp]
pub extern "C" fn trace_http(ctx: XdpContext) -> XdpAction {
    ...
}
```
As you can see it’s annotated with the `#[xdp]` attribute macro, which is part of the `redbpf-macros` crate. `#[xdp]` does a few things, but as we’ll see later, it’s used mainly to signal to the bytecode loader in `redbpf` that the function is an XDP program.
The function takes an `XdpContext` and returns an `XdpAction`. `XdpContext` provides a higher level abstraction over the underlying `xdp_md` pointer provided by the BPF VM to XDP programs; `#[xdp]` transparently maps between the two types. The return value, `XdpAction`, indicates what should be done with the data currently being inspected: whether it should be passed down the network stack, dropped, redirected to another interface, etc.
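The possible actions mirror the kernel’s `XDP_*` return codes. As a rough sketch (see `redbpf-probes` for the actual definition), the enum looks something like this:

```rust
// Sketch mirroring the kernel's xdp_action return codes; see redbpf-probes
// for the actual definition.
#[repr(u32)]
pub enum XdpAction {
    Aborted = 0, // XDP_ABORTED: signal an error and drop the packet
    Drop,        // XDP_DROP: silently drop the packet
    Pass,        // XDP_PASS: let the packet continue up the network stack
    Tx,          // XDP_TX: bounce the packet back out the same interface
    Redirect,    // XDP_REDIRECT: forward to another interface, CPU or socket
}
```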
The actual parsing logic is pretty simple: if the transport protocol is TCP and the payload looks like an HTTP request, the request is sent to user space where it can be analyzed. The BPF API provides several data structures, called maps, that can be used to store and aggregate data across program invocations, and to exchange data with user space. The map used by our program, a `PerfMap`, allows BPF programs to store data in `mmap()`ed shared memory accessible by user space.
#[map("requests")] static mut requests: PerfMap<RequestInfo> = PerfMap::with_max_entries(1024); #[xdp] pub extern "C" fn trace_http(ctx: XdpContext) -> XdpAction { ... let info = RequestInfo { saddr: ip.saddr, daddr: ip.daddr, sport: transport.source(), dport: transport.dest(), }; unsafe { requests.insert( &ctx, MapData::with_payload(info, data.offset() as u32, data.len() as u32), ) }; XdpAction::Pass }
The `requests` global is our `PerfMap`, which as you can see is annotated with the `#[map]` attribute. The attribute names the map and places it in a special ELF section called `maps/<name>` (in our case `maps/requests`) of the resulting binary, so that the user space loader can find it and initialize it.
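To make that a bit more concrete, here’s a hypothetical sketch of the kind of descriptor `#[map("requests")]` could emit into that section, following the conventional `bpf_map_def` layout used by ELF-based BPF loaders (the actual expansion lives in `redbpf-macros`):

```rust
// Hypothetical sketch; the actual code generated by #[map("requests")] lives
// in redbpf-macros. The loader scans the "maps/*" ELF sections, creates the
// corresponding kernel maps, and patches their file descriptors into the code.
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct bpf_map_def {
    pub type_: u32,
    pub key_size: u32,
    pub value_size: u32,
    pub max_entries: u32,
    pub map_flags: u32,
}

#[no_mangle]
#[link_section = "maps/requests"]
pub static REQUESTS_DEF: bpf_map_def = bpf_map_def {
    type_: 4,      // BPF_MAP_TYPE_PERF_EVENT_ARRAY
    key_size: 4,   // per-CPU index
    value_size: 4, // perf event file descriptor
    max_entries: 1024,
    map_flags: 0,
};
```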
For every HTTP request we create a `RequestInfo` struct, which holds the source address, destination address, source port and destination port of the request. We then insert it into the `PerfMap`, wrapped in a `MapData` value. `MapData::with_payload()` is used to indicate to the driver that we want `data.len()` bytes from the current packet inserted in the map immediately following the `RequestInfo` data. `data.offset()` is used to indicate to user space the offset at which the HTTP data starts (following the Ethernet, IP and TCP headers).
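I haven’t shown the definition of `RequestInfo`. Since it’s shared between the kernel and user space sides, it presumably looks something like the sketch below, with the field types assumed from how the struct is used in this post:

```rust
// Hypothetical sketch of the shared type defined in the bpf_examples crate.
// #[repr(C)] gives the struct the same layout when compiled for the BPF
// target and for the host, so the user space side can cast bytes back to it.
#[repr(C)]
pub struct RequestInfo {
    pub saddr: u32, // source IPv4 address, network byte order
    pub daddr: u32, // destination IPv4 address, network byte order
    pub sport: u16, // TCP source port
    pub dport: u16, // TCP destination port
}
```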
Building and debugging the HTTP tracer
If you want to build the code shown in the previous section, and load it as shown in the following section, you can clone the repo containing the code above from http://github.com/alessandrod/bpf_examples.
To build the code, `cd` into the cloned folder and run:

```
$ cargo install cargo-bpf
$ cargo bpf build
```
If the build succeeds, it will place the compiled BPF program under `target/release/bpf-programs/trace_http/trace_http.elf`. To make sure that the program loads and functions correctly, you can use `cargo bpf load` (needs to be run as root):

```
# cargo bpf load -i en0 target/release/bpf-programs/trace_http/trace_http.elf
Loaded: trace_http, XDP
```
Replace `en0` with the network interface you want to trace. Then if you run an HTTP server on that interface and send an HTTP request, you should see something like:

```
Loaded: trace_http, XDP
-- Event: requests --
|3cf2510a ac1f0bed c3fd401f 42000000| <.Q.......@.B... 00000000
|51000000 0293e251 265202ad 2f8f62b2| Q......Q&R../.b. 00000010
|08004500 00850000 40002f06 056b3cf2| ..E.....@./..k<. 00000020
|510aac1f 0bedfdc3 1f40aaf5 7507bbd8| Q........@..u... 00000030
|9a7a8018 0816fa89 00000101 080a82b1| .z.............. 00000040
|1819b4d3 62134745 54202f66 6f6f2048| ....b.GET /foo H 00000050
|5454502f 31000000 00000000|          TTP/1.......     00000060
                                                       0000006c
```
For every request, `cargo bpf` will output `-- Event: [MAP NAME] --` followed by a hex dump of the data sent by the kernel code.
Writing a custom user space loader
`cargo bpf load` is pretty handy during development. Once you get the BPF code working though, you’ll probably want to do something more useful with the data it collects. For our simple example, say that for each request we want to output a line in the format:
```
1.2.3.4 - GET /foo HTTP/1
```
where the left side is the IP address of the client, and the right side is the HTTP request line. Here’s a simple program that uses `redbpf::load::Loader` to do that:
```rust
use std::env;
use std::io;
use std::net::IpAddr;

use futures::stream::StreamExt;
use tokio::signal;

use redbpf::load::Loader;
use redbpf::XdpFlags;
use bpf_examples::trace_http::{MapData, RequestInfo};

#[tokio::main]
async fn main() -> Result<(), io::Error> {
    let args: Vec<String> = env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: bpf_example_program [NETWORK_INTERFACE] [FILENAME]");
        return Err(io::Error::new(io::ErrorKind::Other, "invalid arguments"));
    }
    let interface = args[1].clone();
    let file = args[2].clone();

    let mut loader = Loader::new()
        .xdp(Some(interface), XdpFlags::default())
        .load_file(&file.into())
        .await
        .expect("error loading file");

    tokio::spawn(async move {
        while let Some((_, events)) = loader.events.next().await {
            for event in events {
                let event = unsafe { &*(event.as_ptr() as *const MapData<RequestInfo>) };
                let info = &event.data;
                let payload = String::from_utf8_lossy(event.payload());
                let req_line = payload.split("\r\n").next().unwrap();
                let ip = IpAddr::from(info.saddr.to_ne_bytes());
                println!("{} - {}", ip, req_line);
            }
        }
    });

    signal::ctrl_c().await
}
```
File: bpf_example_loader/src/main.rs
The first notable thing is that the program is async and uses async/await. To load the BPF code, it uses the `redbpf::load::Loader` API:
```rust
let mut loader = Loader::new()
    .xdp(Some(interface), XdpFlags::default())
    .load_file(&file.into())
    .await
```
`Loader` is a high level API that uses the lower level `Module` API and exposes events from maps as a unified stream of `Vec<Box<[u8]>>`. The `Loader` API is available when compiling `redbpf` with the `load` cargo feature enabled.
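For example, the dependency in your `Cargo.toml` would be declared along these lines (the version is whatever release you’re targeting):

```toml
[dependencies]
redbpf = { version = "...", features = ["load"] }
```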
The events are then processed with:
```rust
tokio::spawn(async move {
    while let Some((_, events)) = loader.events.next().await {
        for event in events {
            let event = unsafe { &*(event.as_ptr() as *const MapData<RequestInfo>) };
            let info = &event.data;
            let payload = String::from_utf8_lossy(event.payload());
            let req_line = payload.split("\r\n").next().unwrap();
            let ip = IpAddr::from(info.saddr.to_ne_bytes());
            println!("{} - {}", ip, req_line);
        }
    }
});
```
A new task is spawned to process the `loader.events` stream. Each stream item is a `Vec` of byte slices (the events are retrieved in batches, hence the `Vec`). Each byte slice is the byte representation of one `MapData<RequestInfo>` we inserted in the map from kernel space. Each slice is cast back to a `MapData<RequestInfo>` so that the `RequestInfo` and the `payload()` can be extracted. The remaining code then extracts the HTTP request line, converts the IP address into something easier to work with, and prints the request log.
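The unsafe cast works because `MapData<T>` has a stable, C-compatible layout: a fixed header followed by the packet bytes copied in by `with_payload()`. Purely as an illustration (the real definition is in the RedBPF crates), the layout implied by the usage above is something like:

```rust
// Illustrative sketch of the layout implied by the code above; see the RedBPF
// source for the actual definition. with_payload() copies the packet bytes
// right after this header, and payload() hands back the interesting slice.
#[repr(C)]
pub struct MapData<T> {
    pub data: T,      // the RequestInfo inserted from kernel space
    offset: u32,      // where the HTTP data starts within the copied bytes
    size: u32,        // how many packet bytes follow the header
    payload: [u8; 0], // marker for the trailing variable-length bytes
}
```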
Finally, the very last line is:

```rust
signal::ctrl_c().await
```
XDP programs need to be unloaded when you’re done with them, which is something that `Loader` does in its `Drop` implementation. We therefore intercept CTRL-C (SIGINT) and let the process exit cleanly.
I’ve uploaded the loader package at https://github.com/alessandrod/bpf_example_loader. If you clone it next to `bpf_examples`, you can build and run it as root with:

```
# cargo run -- en0 ../bpf_examples/target/release/bpf-programs/trace_http/trace_http.elf
```
Replace `en0` with your network interface, run some HTTP requests, and you should see something like:
```
Loaded: trace_http, XDP
60.242.81.10 - GET /where-has-the-time-gone? HTTP/1
60.242.81.10 - GET /you-will-be-missed-but-you-are HTTP/1
60.242.81.10 - GET /off-to-do-great-things HTTP/1
```
Parting notes and links
This post ended up being denser and longer than I expected, so thank you for sticking with me! I was going to show a second kind of program (a kprobe), but I’m assuming you have a life to go back to now. That’s OK, I’ll cover kprobes in another post. In the meantime if you want to reach out to me, feel free to drop me an email or message me on Twitter @alessandrod.
If I managed to get you interested in playing with RedBPF, I highly recommend that you familiarize yourself with `cargo bpf`, as it greatly simplifies things. Please keep in mind that RedBPF is evolving at a pretty fast pace, so expect some rough edges; as always, bug reports and patches are more than welcome!
And now those links I promised at the beginning:
- https://docs.cilium.io/en/latest/bpf/ – if I got you really interested in BPF, this is an absolute must read. Major props to the Cilium guys.
- http://www.brendangregg.com/ebpf.html – I already linked to Brendan’s book but this page points to some more amazing BPF resources.
- https://github.com/iovisor/bcc – BCC includes some amazing tools, feel free to port some to Rust!
- https://www.iovisor.org/technology/xdp – XDP documentation by the IO Visor Project.
- https://www.kernel.org/doc/Documentation/kprobes.txt – official kernel documentation about kprobes.