Zooming in on Observability

Zoom has been under scrutiny lately, for a lot of good reasons. Since their product has quickly become critical public infrastructure, we decided to play around with our observability stack to see how the Zoom Linux client actually works, and how the different pieces help us with analysis. This write-up is about using eBPF for research and black-box testing, and provides hands-on examples using ingraind and RedBPF. Our intent was to see how far we can push eBPF in this domain, while also demonstrating some of the amazing engineering behind Zoom.

ingraind and RedBPF

At Red Sift, we developed ingraind, our security observability agent for our cloud, based on Rust and eBPF using RedBPF. This means we can gather run-time information about arbitrary system processes and containers to check whether they are doing anything nefarious in our cloud, who they talk to, and which files they access. However, the cloud is just a fancy word for someone else’s computer, and ingraind runs perfectly fine on my laptop, too.

So we fired up ingraind to monitor our daily Zoom “standup” meeting, and decided to analyse the results in this blog post. We then deviated a little, and ended up writing some Rust code that helps us decrypt TLS traffic by instrumenting the binary using eBPF uprobes.

Let’s dig in: first a look at the binary, then the results. At the end of the post, I share the configuration file that I used.

First look

To get a basic idea of what to expect, I looked at the zoom binary using strings, and quickly found some pinned certificates for *.zoomgov.com, *.zoom.us, and mentions of xmpp.zoomgov.com. This is great!

The binary is stripped. However, some debug symbols remain for the dependency libraries, though not for the proprietary code. The directory also contains the Qt library, and a few other standard open source bits and pieces.

Interestingly, the package comes with a wrapper shell script zoom.sh that manages core dumps.

Network traffic analysis

The most interesting thing I wanted to see was how Zoom handles network traffic. I expected at least a partially peer-to-peer setup for calls with many participants. Instead, I found that only a handful of IP ranges were used during any call. Most interestingly, one call about two weeks ago showed some amount of TCP traffic hitting what looks like a broadcast IP address, ending in .255. My more recent tests showed a larger number of individual IPs within a few different IP blocks, even though the client version was the same.

This is how I found out that Zoom actually operates their own ISP, which strikes me as an ingenious way of managing the amount of traffic they have to deal with.
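Grouping the destination addresses into blocks, as described above, is easy to do offline. The following is a minimal sketch, not part of ingraind: the function names are hypothetical, the /24 block size is an assumption picked for illustration, and the addresses come from the reserved documentation ranges rather than from the capture.

```rust
use std::collections::BTreeMap;
use std::net::Ipv4Addr;

/// Returns the /24 block an address belongs to, e.g. 192.0.2.10 -> 192.0.2.0.
fn block_24(ip: Ipv4Addr) -> Ipv4Addr {
    let [a, b, c, _] = ip.octets();
    Ipv4Addr::new(a, b, c, 0)
}

/// Counts how many of the given addresses fall into each /24 block.
fn count_by_block(ips: &[Ipv4Addr]) -> BTreeMap<Ipv4Addr, usize> {
    let mut counts = BTreeMap::new();
    for ip in ips {
        *counts.entry(block_24(*ip)).or_insert(0) += 1;
    }
    counts
}

/// Flags addresses whose last octet is 255. This is only a heuristic:
/// whether such an address is really a broadcast address depends on the netmask.
fn ends_in_255(ip: Ipv4Addr) -> bool {
    ip.octets()[3] == 255
}

fn main() {
    // Documentation-range addresses, made up for illustration.
    let seen: Vec<Ipv4Addr> = ["192.0.2.10", "192.0.2.255", "198.51.100.7"]
        .iter()
        .map(|s| s.parse().expect("valid IPv4 literal"))
        .collect();

    for (block, count) in count_by_block(&seen) {
        println!("{}/24: {} address(es)", block, count);
    }
    for ip in seen.iter().filter(|ip| ends_in_255(**ip)) {
        println!("{} ends in .255", ip);
    }
}
```

A `BTreeMap` keeps the blocks sorted, which makes the output easy to compare between calls.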

Zoom reaches out to Google’s CDN to fetch the profile images of users who are logged in with Google’s SSO. I suspect the same is true for participants who use Facebook logins, but I didn’t see any activity towards Facebook’s ranges.

Another thing that’s interesting to see is that there is a dedicated thread for networking. I can only hope that this means there is at least some sort of privilege separation, but I did not do syscall analysis this time.

File access patterns

The list of files Zoom touched during my call was nothing surprising. Apart from the usual X libraries and the dependencies they ship, they maintain a local configuration file, and access PulseAudio-related resources on the file system. Stripping out the boring bits leaves us with this:

$ rg '"process_str":"zoom.*"' zoom_file.log | jq '.tags.path_str' | sort | uniq

...
"etc/ca-certificates/extracted/tls-ca-bundle.pem"
...
"p2501/.config/zoomus.conf"
"p2501/.local/share/fonts/.uuid"
"p2501/.Xauthority"
"p2501/.zoom/data/conf_avatar_045a3a19053421428ff"
"p2501/.zoom/data/conf_avatar_763f5840c57564bca16"
"p2501/.zoom/data/conf_avatar_7cdb80036953ea86e83"
"p2501/.zoom/data/conf_avatar_9426e77c9128d50079d"
"p2501/.zoom/data/conf_avatar_aa2a71a3e0a424e451b"
"p2501/.zoom/data/conf_avatar_b46ff5ad22374cdd56d"
"p2501/.zoom/data/conf_avatar_b61879be31e3fce14ee"
"p2501/.zoom/data/conf_avatar_c29cc2bde8058cb0093"
"p2501/.zoom/data/conf_avatar_e3ef3d0218f29d518dd"
"p2501/.zoom/data/conf_avatar_e8dc9d76cae1c2a5f3e"
"p2501/.zoom/data/conf_avatar_ef7b17310c83f908b39"
"p2501/.zoom/data/conf_avatar_f26b44b634fceb21d7e"
"p2501/.zoom/data/conf_avatar_f28df485f9132b47c75"
"p2501/.zoom/data/zoommeeting.db"
"p2501/.zoom/data/zoomus.db"
"p2501/.zoom/data/zoomus.tmp.db"
...

The local cache of avatars is certainly a good call. As we’ve seen above, Zoom downloads the avatars from Google if the user is logged in through the single sign-on service, and it makes sense to maintain a local cache of these. The files are PNGs and JPEGs, and I have found pictures of people I do not recognise, so it looks like there’s no automatic cleanup.
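The file names in the listing show no extension, so the image type has to come from the content. A quick way to check this kind of thing is to look at the first few bytes of each file. The sketch below is a generic illustration with hypothetical names (`ImageKind`, `sniff_image`). It relies only on the well-known PNG and JPEG signatures, and is not necessarily how the files were inspected for this post.

```rust
#[derive(Debug, PartialEq)]
enum ImageKind {
    Png,
    Jpeg,
    Unknown,
}

/// Classifies a file by its leading bytes.
fn sniff_image(header: &[u8]) -> ImageKind {
    // PNG files start with this fixed 8-byte signature.
    const PNG: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];
    // JPEG files start with the SOI marker followed by another marker byte.
    const JPEG: [u8; 3] = [0xFF, 0xD8, 0xFF];

    if header.starts_with(&PNG) {
        ImageKind::Png
    } else if header.starts_with(&JPEG) {
        ImageKind::Jpeg
    } else {
        ImageKind::Unknown
    }
}

fn main() {
    // Synthetic headers; in practice these would be the first bytes of each cached file.
    let png = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A, 0x00];
    let jpeg = [0xFF, 0xD8, 0xFF, 0xE0];
    println!("{:?} {:?} {:?}", sniff_image(&png), sniff_image(&jpeg), sniff_image(b"SQLite"));
}
```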

The local databases are more interesting. They are SQLite databases that seem to contain very little information. There is no local copy of conversations or chats.

However, the access to the TLS CA bundle is a bit baffling given the pinned certificates in the binary. I suspect one of the linked libraries might be auto-loading this store.

Accessing the unencrypted data

For this, we had to bring out the big guns, uprobes, so I’ll hand it over to Alessandro.

As discovered by Peter looking at the network connections logged by ingraind, Zoom uses Transport Layer Security (TLS) to secure some of its connections.

There are several libraries that applications can use to implement TLS, including the popular OpenSSL, LibreSSL, BoringSSL and NSS. Having recently implemented uprobes support for RedBPF, I thought it could be fun to try and use it to hook into whatever library Zoom uses to implement TLS and intercept the unencrypted data.

Uprobes allow you to instrument user space code by attaching to arbitrary addresses inside a running program. I’m planning to talk about uprobes in a separate post. For the moment, it suffices to know that the API to attach custom code to a running program is the following:

pub fn attach_uprobe(
    &mut self,
    fn_name: Option<&str>,
    offset: u64,
    target: &str,
    pid: Option<pid_t>,
) -> Result<()>;

attach_uprobe() parses the target binary or library, finds the function fn_name, and injects the BPF code at its address. If offset is not zero, its value is added to the address of fn_name. If fn_name is None, offset is interpreted as an offset from the start of the target’s .text section. Finally if a pid is given, the custom code will only run for the target loaded by the program with the given pid.
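The address rule described above can be sketched in plain Rust. This is not RedBPF’s implementation: `resolve_probe_address`, the `SymbolTable` alias and the `text_start` parameter are hypothetical stand-ins for whatever the library extracts from the ELF file, and the addresses are made up.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the symbols parsed from the target binary:
/// function name -> address of the function within the target.
type SymbolTable = HashMap<String, u64>;

/// With a function name, the offset is relative to that function.
/// Without one, it is relative to the start of the `.text` section.
fn resolve_probe_address(
    symbols: &SymbolTable,
    text_start: u64,
    fn_name: Option<&str>,
    offset: u64,
) -> Option<u64> {
    let base = match fn_name {
        // Unknown symbols yield None instead of a bogus address.
        Some(name) => *symbols.get(name)?,
        None => text_start,
    };
    base.checked_add(offset)
}

fn main() {
    let mut symbols = SymbolTable::new();
    // Made-up address for an OpenSSL-style function name, for illustration only.
    symbols.insert("SSL_read".to_string(), 0x4_2000);

    println!("{:x?}", resolve_probe_address(&symbols, 0x1000, Some("SSL_read"), 0));
    println!("{:x?}", resolve_probe_address(&symbols, 0x1000, None, 0x20));
    println!("{:x?}", resolve_probe_address(&symbols, 0x1000, Some("missing"), 0));
}
```

Returning an `Option` makes the two failure modes explicit: a symbol that does not exist in the target, and an offset that would overflow the address.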

While doing this is certainly possible on a running process, it is a bit of a rabbit hole. Instrumenting Zoom to access decrypted data turned out to be a challenge, but ad-hoc attaching to existing processes within your control should be straightforward using this infrastructure.

Findings

Based on the data we’ve been able to collect with ingraind, we found no glaring issues. It is certainly good to see that the Zoom app uses pinned certificates, and that it does not keep logs of the messaging history, even as a side effect.

Using a Qt-based app is a great way to balance performance and security for a cross-platform audience. On top of that, it was really interesting to see how the infrastructure works in action: during a single call, connections went to over 100 target IPs in a few blocks, all routed through Zoom’s ISP.

I highlighted that they keep a cache of profile pictures from Google accounts. This seems futile, as Zoom hits Google’s CDN whenever somebody with a Google account joins.

Test setup

To make sure we were looking at the right traffic and only picked up on Zoom’s DNS queries, I made sure that nothing else was running during the call but the Zoom Linux client. We used the following ingraind config to monitor the system during a Zoom call.

[[probe]]
pipelines = ["console"]
[probe.config]
type = "Network"

[[probe]]
pipelines = ["console"]
[probe.config]
type = "DNS"
interface = "wlp61s0"

[[probe]]
pipelines = ["console"]
[probe.config]
type = "Files"
monitor_dirs = ["/"]

[[probe]]
pipelines = ["console"]
[probe.config]
type = "TLS"
interface = "wlp61s0"

[[pipeline.console.steps]]
type = "Container"

[[pipeline.console.steps]]
type = "Buffer"
interval_s = 1
enable_histograms = false

[pipeline.console.config]
backend = "Console"

Using this config, we can redirect all the output into a file and use command line tools like ripgrep and jq to process the results for a superficial look. Let’s take a look at the config file piece by piece.

[probe.config]
type = "Network"

The Network probe enables network traffic analysis. This means we get low-level information on every read and write on a network socket, whether TCP or UDP, IPv4 or IPv6.

[probe.config]
type = "DNS"
interface = "wlp61s0"

The DNS probe collects incoming DNS traffic data. Incoming DNS traffic includes DNS answers, which echo the original question, so we will know exactly what our queries are and what they resolve to, even if an application chooses to craft the packets itself and bypass the gethostbyname(3) libc call. Due to an implementation detail, we need to specify the network interface, whose name, under systemd’s naming scheme, looks a bit ugly.
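To make the query/answer distinction concrete, here is a minimal sketch of reading the fixed 12-byte DNS header, where the top bit of the flags field separates queries from responses. It is not the code of ingraind’s DNS probe; `DnsHeader` and `parse_dns_header` are hypothetical names, and the packet bytes are synthetic.

```rust
#[derive(Debug, PartialEq)]
struct DnsHeader {
    id: u16,
    is_response: bool,
    questions: u16,
    answers: u16,
}

/// Parses the fixed-size header at the start of a DNS message.
/// All fields are big-endian 16-bit integers.
fn parse_dns_header(packet: &[u8]) -> Option<DnsHeader> {
    if packet.len() < 12 {
        return None;
    }
    let u16_at = |i: usize| u16::from_be_bytes([packet[i], packet[i + 1]]);
    let flags = u16_at(2);
    Some(DnsHeader {
        id: u16_at(0),
        // QR is the most significant bit of the flags: 0 = query, 1 = response.
        is_response: (flags & 0x8000) != 0,
        questions: u16_at(4),
        answers: u16_at(6),
    })
}

fn main() {
    // Synthetic response header: ID 0x1234, one question, two answers.
    let packet: [u8; 12] = [0x12, 0x34, 0x81, 0x80, 0, 1, 0, 2, 0, 0, 0, 0];
    if let Some(h) = parse_dns_header(&packet) {
        println!(
            "id={:#06x} response={} questions={} answers={}",
            h.id, h.is_response, h.questions, h.answers
        );
    }
}
```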

[probe.config]
type = "Files"
monitor_dirs = ["/"]

The Files probe gives us information about file read/write events that happen in a directory. For monitoring Zoom, I decided I wanted to see all filesystem activity, so anything that happens under / will show up in my logs. Most applications tend to show fairly conservative access patterns after the loader caters for all the dynamic dependencies, and this is what I expect here, too.
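The effect of the monitor_dirs setting can be pictured as a prefix check on each path. How ingraind does the matching internally is not covered in this post, so treat the following as a sketch under that assumption, with a hypothetical `is_monitored` helper.

```rust
use std::path::Path;

/// Returns true if `path` lies under any of the monitored directories.
/// `Path::starts_with` compares whole path components, so "/usr/bin"
/// does not match "/usr/binary".
fn is_monitored(path: &str, monitor_dirs: &[&str]) -> bool {
    monitor_dirs
        .iter()
        .any(|dir| Path::new(path).starts_with(dir))
}

fn main() {
    // Example paths, for illustration only.
    for path in ["/usr/bin/zoom", "/usr/binary/tool", "/home/user/.zoom/data/zoomus.db"] {
        println!(
            "{}: under /usr/bin = {}, under / = {}",
            path,
            is_monitored(path, &["/usr/bin"]),
            is_monitored(path, &["/"])
        );
    }
}
```

Monitoring / matches every absolute path, which is why it produces a lot of data but misses nothing.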

[probe.config]
type = "TLS"
interface = "wlp61s0"

I want to see the details of TLS connections. Since we know Zoom uses certificate pinning, it will be interesting to see the properties and cipher suites the connections actually use.
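The negotiated cipher suite travels in cleartext in the ServerHello message, which is what makes this kind of passive inspection possible. Below is a minimal sketch of pulling it out of a raw record. It is not the TLS probe’s code: `server_hello_cipher_suite` is a hypothetical helper, it assumes the ServerHello sits at the start of a single unfragmented record, and it does not validate any length field.

```rust
/// Extracts the selected cipher suite from a TLS record containing a ServerHello.
///
/// Layout: record header (5 bytes), handshake header (4 bytes),
/// legacy_version (2), random (32), session id length (1), session id,
/// then the 2-byte cipher suite.
fn server_hello_cipher_suite(record: &[u8]) -> Option<u16> {
    // Content type 0x16 = handshake.
    if *record.get(0)? != 0x16 {
        return None;
    }
    // Handshake type 0x02 = ServerHello.
    if *record.get(5)? != 0x02 {
        return None;
    }
    // 5 + 4 + 2 + 32 = 43 bytes precede the session id length.
    let session_id_len = *record.get(43)? as usize;
    let start = 44 + session_id_len;
    let bytes = record.get(start..start + 2)?;
    Some(u16::from_be_bytes([bytes[0], bytes[1]]))
}

fn main() {
    // Synthetic ServerHello, for illustration only. The length fields are
    // left at zero because the sketch does not check them.
    let mut record: Vec<u8> = vec![0x16, 0x03, 0x03, 0x00, 0x00]; // record header
    record.extend_from_slice(&[0x02, 0x00, 0x00, 0x00]); // handshake header
    record.extend_from_slice(&[0x03, 0x03]); // legacy_version
    record.extend_from_slice(&[0u8; 32]); // random
    record.push(0x00); // empty session id
    record.extend_from_slice(&[0xC0, 0x2F]); // TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256

    match server_hello_cipher_suite(&record) {
        Some(suite) => println!("cipher suite: 0x{:04X}", suite),
        None => println!("not a ServerHello"),
    }
}
```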

[[pipeline.console.steps]]
type = "Container"

This is a long shot, but the Container post-processor would let us pick up cgroup information. I didn’t pick up any, though, so there doesn’t appear to be a network-facing sandbox.
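For context, cgroup membership on Linux is visible in /proc/<pid>/cgroup, one `hierarchy-ID:controller-list:cgroup-path` entry per line. The sketch below parses that standard format. It says nothing about how the Container step works internally, `CgroupEntry` and `parse_cgroup_line` are hypothetical names, and the sample lines are made up.

```rust
#[derive(Debug, PartialEq)]
struct CgroupEntry<'a> {
    hierarchy_id: u32,
    controllers: Vec<&'a str>,
    path: &'a str,
}

/// Parses one line of /proc/<pid>/cgroup.
fn parse_cgroup_line(line: &str) -> Option<CgroupEntry<'_>> {
    // Split into at most three fields so that colons in the path survive.
    let mut parts = line.splitn(3, ':');
    let hierarchy_id = parts.next()?.parse().ok()?;
    // The controller list is empty for the cgroup v2 unified hierarchy.
    let controllers = parts.next()?.split(',').filter(|c| !c.is_empty()).collect();
    let path = parts.next()?;
    Some(CgroupEntry { hierarchy_id, controllers, path })
}

fn main() {
    // Made-up sample lines in the documented format.
    for line in ["0::/user.slice/session.scope", "4:cpu,cpuacct:/docker/abc"] {
        match parse_cgroup_line(line) {
            Some(e) => println!("{} {:?} {}", e.hierarchy_id, e.controllers, e.path),
            None => println!("unparseable: {}", line),
        }
    }
}
```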

[[pipeline.console.steps]]
type = "Buffer"
interval_s = 1
enable_histograms = false

     
[pipeline.console.config]
backend = "Console"

And finally, we aggregate events every second to reduce the amount of data we have to analyse, then print the result to the console.

Enabling aggregation is a good idea if you want to load the results into your preferred analytics stack as I did, because a raw event dump gets very large very quickly.
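To illustrate why aggregation shrinks the output, here is a minimal sketch of time-bucketed aggregation. It is not the implementation of ingraind’s Buffer step: the `Event` shape and the `aggregate` function are hypothetical, and summing values per bucket is an assumption about what a sensible aggregate looks like.

```rust
use std::collections::BTreeMap;

/// Hypothetical event shape: a timestamp, a measurement name and a value.
struct Event {
    timestamp_ms: u64,
    name: String,
    value: u64,
}

/// Sums event values per (time bucket, name), where each bucket spans `interval_s` seconds.
fn aggregate(events: &[Event], interval_s: u64) -> BTreeMap<(u64, String), u64> {
    // Guard against a zero interval, which would otherwise divide by zero.
    let width_ms = interval_s.max(1) * 1000;
    let mut buckets = BTreeMap::new();
    for e in events {
        let bucket = e.timestamp_ms / width_ms;
        *buckets.entry((bucket, e.name.clone())).or_insert(0) += e.value;
    }
    buckets
}

fn main() {
    // Made-up events: three writes, two of which fall into the same second.
    let events = vec![
        Event { timestamp_ms: 100, name: "write".to_string(), value: 10 },
        Event { timestamp_ms: 900, name: "write".to_string(), value: 5 },
        Event { timestamp_ms: 1200, name: "write".to_string(), value: 1 },
    ];
    let totals = aggregate(&events, 1);
    for ((bucket, name), total) in &totals {
        println!("second {} {}: {}", bucket, name, total);
    }
}
```

However many events arrive within an interval, each (bucket, name) pair produces a single output row.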

Conclusion

This was a quick-ish and fun exercise to see how far we can push our tools in security research. It’s great seeing the amount of effort the community puts into securing critical infrastructure, and in these unprecedented times, the privacy and security of video conferencing are definitely at the top of the list as companies are still figuring out how to transition to more sustainable remote environments.

More importantly, it shows that programs are just programs. It doesn’t matter whether we call them containers or desktop applications: the same methodology applies to monitoring them.

A large benefit of deploying an observability layer that doesn’t require opt-in from the applications is the immediate increase in coverage, whether it’s for tracing, or security-related work. The layers of data can be aggregated using ingraind’s powerful tagging system, which allows in-depth analysis across the different abstraction layers that make up the environment.

If you’d like to find out more information about ingraind, visit our information page below. Happy hacking in your sandboxes!

PUBLISHED BY

Peter Parkanyi

1 May. 2020
