34C3 Tool Release: Cachegrab
Today, NCC Group is releasing Cachegrab, a tool designed to help perform and visualize trace-driven cache attacks against software in the secure world of TrustZone-enabled ARMv8 cores. These cache attacks, as well as other microarchitectural attacks on secure computing environments, were presented at the 34th Chaos Communication Congress.
There are two key properties of many TrustZone implementations that make the attacks within Cachegrab feasible. First, the secure world and non-secure world often share the caches within a processor. This means that when software executes in the secure world, it affects the presence or absence of non-secure world entries within the shared cache. Second, privileged users in the non-secure world are able to use privileged instructions to interleave attacker and victim processes, as well as determine what non-secure data has been evicted from the cache.
The result is a utility that provides a high degree of visibility into the behavior of software in the secure world. By targeting the L1I and L1D caches, Cachegrab is able to distinguish accesses to secure world code and data with 64-byte granularity, giving a detailed picture of what the victim process in the secure world is doing over the course of a single execution. Although such cache attacks do not directly disclose the contents of victim memory, the pattern of cache usage is often enough to infer the secret values the victim is meant to protect.
In addition to the L1I and L1D caches, Cachegrab also implements an attack against the Branch Target Buffer (BTB), a cache-like structure used by the processor to help predict the outcome of conditional branches. Although the details vary per microarchitecture, on the Cortex-A57 this translates to detecting secure world branch execution with a granularity of 16 bytes, quadrupling the spatial resolution of existing control flow attacks. Due to the attacker/victim interleaving, all cache attacks have high temporal resolution as well, and by using a core’s own performance monitor counters, the attacks are almost completely noise free.
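To make the granularity figures concrete, the following sketch maps two addresses to 64-byte cache lines and to 16-byte BTB granules. The function names and the two branch addresses are illustrative assumptions and not part of Cachegrab; only the 64-byte and 16-byte figures come from the text above.

```python
CACHE_LINE_BYTES = 64   # L1I/L1D granularity quoted above
BTB_GRANULE_BYTES = 16  # BTB granularity quoted above for the Cortex-A57


def cache_line_index(addr: int) -> int:
    """Return the index of the 64-byte line that contains addr."""
    return addr // CACHE_LINE_BYTES


def btb_granule_index(addr: int) -> int:
    """Return the index of the 16-byte granule that contains addr."""
    return addr // BTB_GRANULE_BYTES


# Two hypothetical victim branches located 32 bytes apart.
branch_a = 0x1000
branch_b = 0x1020

# An instruction cache attack cannot tell them apart: same 64-byte line.
same_line = cache_line_index(branch_a) == cache_line_index(branch_b)

# A BTB attack can: they fall in different 16-byte granules.
same_granule = btb_granule_index(branch_a) == btb_granule_index(branch_b)

resolution_gain = CACHE_LINE_BYTES // BTB_GRANULE_BYTES
```

Here `resolution_gain` evaluates to 4, which is the "quadrupling" mentioned above.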
Cachegrab reinforces the necessity of being extremely careful when writing code that runs in the secure world. If the software exhibits any sort of secret-dependent memory access or control flow, it is potentially susceptible to these attacks. The same level of rigor used when writing side channel resistant cryptographic code must then be applied to all secure world code that handles sensitive information. Microarchitectural attacks on TrustZone are simple, powerful, and, with the release of Cachegrab, easier to realize than ever.
How It Works
In order to be as portable as possible, Cachegrab makes some assumptions about the environment it is run in:
It assumes the rich operating system in the non-secure world is based on Linux.
It assumes the attacker can load arbitrary kernel modules. Requiring a Linux-based rich OS is not very restrictive, since such systems are extremely common, and other attacks already focus on escalating from no privileges to kernel code execution. As such, it makes sense to assume the attacker has full control of the non-secure world.
The attacker must have some way of invoking the victim code in the secure world. In practice, this may look like calling into a shared Linux library which uses a device driver to make SMC calls to TrustZone.
In order to use some of the higher-level features, the device should have a network connection accessible to the attacker.
These assumptions often hold, and they greatly reduce the customization needed to port Cachegrab to a new device.
Cachegrab consists of three components: the kernel module, the server, and the client.
Cachegrab Kernel Module
The Cachegrab kernel module is responsible for directly performing the measurements in the cache attacks on the Trusted Execution Environment. It handles all the attack functionality that must be run with privileges at exception level 1 (EL1), including configuring and reading performance counters and sending symmetric multiprocessing (SMP) calls from one core to another.
The module is designed to be analogous to an oscilloscope. It is capable of collecting trace data on multiple cache structures per execution, similar to the probes on an oscilloscope being connected to multiple contacts on a target board. There are three different probes within Cachegrab, splitting the responsibility for the microarchitectural attacks on the L1 Data (L1D) cache, L1 Instruction (L1I) cache, and the branch target buffer (BTB). The process of “attaching” a probe to a target core involves specifying the shape of the cache and results in the allocation of any memory or performance counters needed for the attack. After the probe has been attached, it can undergo limited configuration such as limiting the range of addresses targeted by the attack.
Each probe contains its own measurement function that will be called during the attack. The L1D, L1I, and BTB attacks are all “Prime+Probe” style. This involves “priming” the cache by filling it with attacker data, and later “probing” the cache by seeing what attacker data has been evicted by the victim code.
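The Prime+Probe idea can be illustrated with a toy simulation. This is a minimal sketch that assumes a tiny set-associative cache with LRU replacement; the `TinyCache` class, its dimensions, and the victim address are all hypothetical, and real hardware replacement policies differ.

```python
from collections import OrderedDict


class TinyCache:
    """Toy set-associative cache with LRU replacement (hypothetical model)."""

    def __init__(self, num_sets=4, ways=2, line_size=64):
        self.num_sets, self.ways, self.line_size = num_sets, ways, line_size
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr):
        """Touch addr; return True on a hit, False on a miss (a refill)."""
        line = addr // self.line_size
        entries = self.sets[line % self.num_sets]
        hit = line in entries
        if hit:
            entries.move_to_end(line)
        else:
            if len(entries) >= self.ways:
                entries.popitem(last=False)  # evict the least recently used line
            entries[line] = True
        return hit


def prime(cache, attacker_addrs):
    """Fill the cache with attacker data."""
    for addr in attacker_addrs:
        cache.access(addr)


def probe(cache, attacker_addrs):
    """Return, per set, how many attacker lines had to be refilled."""
    misses = [0] * cache.num_sets
    for addr in attacker_addrs:
        if not cache.access(addr):
            misses[(addr // cache.line_size) % cache.num_sets] += 1
    return misses


cache = TinyCache()
attacker = [i * 64 for i in range(4 * 2)]  # a buffer exactly the size of the cache
prime(cache, attacker)
cache.access(0x10000 + 2 * 64)             # victim touches a line that maps to set 2
evictions = probe(cache, attacker)
```

Only set 2 reports refills, which tells the attacker which set the victim touched without revealing the data itself. In this LRU model the probe also evicts one of the attacker's own lines in that set, so the set shows two refills rather than one.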
In the L1D attack, the cache is primed by reading several addresses within the kernel’s _text section. This is done because those addresses are likely to be both physically and virtually contiguous, making it easier to know how to fill the cache completely. Once the cache has been primed and the victim has executed some code, the attacker probes the cache by alternating between reading the same addresses and the L1D cache refill performance counter. If the victim code caused attacker data to be evicted, the performance counter will increment as the attacker data is loaded back into the cache. The priming and probing repeats until the victim code finishes executing or a maximum number of samples has been collected.
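The reason contiguity helps can be shown with a little arithmetic: a line-aligned contiguous region as large as the cache places exactly as many lines in each set as the cache has ways. The sketch below assumes a cache shape of 256 sets, 2 ways, and 64-byte lines (32 KiB in total) and a made-up base address; both are illustrative assumptions, and the real shape is whatever is specified when the probe is attached.

```python
from collections import Counter


def prime_addresses(base, num_sets, ways, line_size):
    """One address per line of a contiguous region as large as the cache."""
    return [base + i * line_size for i in range(num_sets * ways)]


def set_index(addr, num_sets, line_size):
    """Set selected by addr in a simple modulo-indexed cache."""
    return (addr // line_size) % num_sets


NUM_SETS, WAYS, LINE = 256, 2, 64  # assumed shape: 32 KiB, 2-way, 64-byte lines
BASE = 0x80000                     # hypothetical line-aligned base address

addrs = prime_addresses(BASE, NUM_SETS, WAYS, LINE)

# Count how many of the primed lines land in each set.
lines_per_set = Counter(set_index(a, NUM_SETS, LINE) for a in addrs)

region_bytes = addrs[-1] - addrs[0] + LINE
```

Every one of the 256 sets receives exactly two lines, so reading the region once fills the modeled cache completely.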
The L1I attack is similar. To begin, the L1I cache is filled by executing a function the size of the L1I cache. The function itself reads the L1I cache refill performance counter at intervals of the cache line size. If a particular part of the function was not previously in the cache, it will be detected by the next performance counter read. The same function is called during the probing step to recover the L1I entries evicted by the victim process.
Finally, in the BTB attack, the module allocates another large function used for both priming and probing. This function consists of several conditional jumps to the following instruction and branch misprediction performance counter reads. During the priming step, the branches are always taken and an entry is placed within the BTB to indicate that future executions of that branch will be taken as well. During the probing step, if that BTB entry was evicted by the victim, the core will predict that the branch will not be taken, causing a branch misprediction. Again, the priming and probing repeats until the victim finishes executing on the target core.
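A toy model makes the BTB logic easier to follow. The sketch below assumes a direct-mapped BTB with eight entries and 16-byte granules, in which a branch is predicted taken only if it currently owns an entry. The `TinyBTB` class and all addresses are hypothetical; real branch predictors are considerably more complex.

```python
class TinyBTB:
    """Toy direct-mapped BTB (hypothetical model, not a real predictor)."""

    def __init__(self, entries=8, granule=16):
        self.entries, self.granule = entries, granule
        self.table = [None] * entries

    def run_taken_branch(self, addr):
        """Execute an always-taken branch at addr; return True if mispredicted."""
        slot = addr // self.granule
        index = slot % self.entries
        mispredicted = self.table[index] != slot  # no entry: predicted not taken
        self.table[index] = slot                  # the taken branch installs an entry
        return mispredicted


def run_branch_function(btb, branch_addrs):
    """Mimic the attacker's function: one branch per granule, one reading each."""
    return [btb.run_taken_branch(addr) for addr in branch_addrs]


btb = TinyBTB()
attacker_branches = [0x4000 + i * 16 for i in range(8)]

first_pass = run_branch_function(btb, attacker_branches)      # prime
btb.run_taken_branch(0x9000 + 3 * 16)                         # victim branch aliasing entry 3
mispredictions = run_branch_function(btb, attacker_branches)  # probe
```

During priming every branch is mispredicted once and installs its entry. After the victim runs, only the attacker branch whose entry was displaced mispredicts again, which pinpoints the 16-byte granule of the victim branch.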
To execute the attacker’s priming and probing functions on the same core as the victim code, Cachegrab uses symmetric multiprocessing (SMP) functions. A secondary “scope core” runs in a loop as the victim runs on the target core. The scope core issues an SMP call, which triggers an interrupt on the target core. The secure world monitor passes this interrupt to Linux, which then probes and primes the caches on the target core before returning control to the victim. The scope core collects the results, gives the victim process a chance to continue executing, then repeats the process.
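The control flow of the scope core's loop can be sketched as follows. This is a pure simulation under stated assumptions: the victim is modeled as a sequence of time slices, and `measure` is a hypothetical stand-in for the probe-then-prime handler that runs on the target core. None of these names come from Cachegrab.

```python
def collect_trace(victim_slices, measure, max_samples):
    """Interleave measurements with short slices of victim execution.

    victim_slices: iterable; each item represents the victim running briefly.
    measure: callable standing in for the probe-then-prime interrupt handler.
    """
    trace = []
    slices = iter(victim_slices)
    while len(trace) < max_samples:
        try:
            touched = next(slices)      # the victim runs until the next interrupt
        except StopIteration:
            break                       # the victim finished executing
        trace.append(measure(touched))  # probe, then re-prime, on the target core
    return trace


NUM_SETS = 4


def measure(touched_set):
    """Hypothetical sample: one flag per cache set, 1 where a refill was seen."""
    return [int(i == touched_set) for i in range(NUM_SETS)]


# Hypothetical victim that touches one cache set per time slice.
victim = [1, 3, 3, 0]

full_trace = collect_trace(victim, measure, max_samples=100)
capped_trace = collect_trace(victim, measure, max_samples=2)
```

Each row of the trace is one sample in time, so the shorter the victim's slice between interrupts, the finer the temporal resolution of the resulting picture.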
Using these methods, the Cachegrab kernel module carries out easily configured, low-noise microarchitectural attacks with high temporal precision against the L1D cache, the L1I cache, and the BTB.
Cachegrab Server
Not all of the attack operations need to happen in the kernel; those that do not are handled by a userland binary called the server. The Cachegrab server is responsible for invoking the victim code, scheduling the needed code on the target and scope cores, and providing a simple HTTP server the attacker can use to interface with the module and control the full attack workflow.
Scheduling the attacker and victim code is simple on Linux. When a sample capture begins, the server creates two new threads and uses sched_setaffinity to lock each to a particular core. The priority of each thread is also increased to reduce the chance of the attacker or victim code being interrupted by an unrelated process.
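The pinning step can be sketched with Python's `os.sched_setaffinity`, which wraps the same Linux system call. On Linux, passing 0 as the first argument applies the mask to the calling thread. The helper name `run_pinned`, the choice of cores, and the placeholder workloads are assumptions made for illustration, and the sketch omits the priority increase, which typically requires elevated privileges.

```python
import os
import threading


def run_pinned(core, work, results, key):
    """Pin the calling thread to one core, then run work()."""
    if hasattr(os, "sched_setaffinity"):  # only available on Linux
        os.sched_setaffinity(0, {core})   # 0 refers to the calling thread
        results[key + "_affinity"] = os.sched_getaffinity(0)
    results[key] = work()


# Choose cores from those this process is allowed to use.
if hasattr(os, "sched_getaffinity"):
    available = sorted(os.sched_getaffinity(0))
else:
    available = [0]
target_core = available[0]
scope_core = available[-1]  # equals target_core on a single-core machine

results = {}
threads = [
    threading.Thread(target=run_pinned,
                     args=(target_core, lambda: "victim ran", results, "target")),
    threading.Thread(target=run_pinned,
                     args=(scope_core, lambda: "scope ran", results, "scope")),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Because each thread sets its own mask, the main thread keeps its original affinity while the two workers stay on their assigned cores.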
The server also contains logic which isolates the measurement to the parts of the victim code running within the secure world. This is done by using a shared library to hook calls from the victim process to the TrustZone driver. The shared library shim can then start the measurement before calling the original function, and stop once it has completed. This reduces the number of measurements the attacker needs to collect, aligns the collected traces to a common reference point, and even allows the attacker to only collect calls to the TrustZone driver made with certain parameters.
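The shim's control flow can be mirrored in a few lines. The real shim is a native shared library; this Python wrapper only illustrates the start, call, stop pattern and the parameter filter. `Recorder`, `hook`, `tz_driver_call`, and the command names are all hypothetical.

```python
import functools


class Recorder:
    """Stand-in for the measurement controls (hypothetical interface)."""

    def __init__(self):
        self.events = []

    def start(self):
        self.events.append("start")

    def stop(self):
        self.events.append("stop")


def hook(original, recorder, should_measure=lambda *args, **kwargs: True):
    """Wrap original so that matching calls are bracketed by start and stop."""

    @functools.wraps(original)
    def shim(*args, **kwargs):
        if not should_measure(*args, **kwargs):
            return original(*args, **kwargs)  # pass through unmeasured
        recorder.start()
        try:
            return original(*args, **kwargs)
        finally:
            recorder.stop()                   # stop even if the call fails

    return shim


def tz_driver_call(command, payload):
    """Hypothetical stand-in for the call into the TrustZone driver."""
    return f"{command}:{len(payload)}"


recorder = Recorder()
hooked = hook(tz_driver_call, recorder,
              should_measure=lambda command, payload: command == "sign")

unmeasured = hooked("hash", b"abc")  # filtered out, no measurement
measured = hooked("sign", b"abcd")   # bracketed by start and stop
```

Only the "sign" call is measured, so every collected trace begins at the same point in the victim's execution.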
Apart from these features, the Cachegrab server is relatively simple and functions as a proxy between the attacker and the kernel module.
Cachegrab Client
The final component of Cachegrab is the client and associated GUI. On top of offering a simple way to interact with the Cachegrab server, the client provides functionality to ease sample management, processing, and analysis. It is written in Python, and it can be used in GUI form or as a simple Python module. The client is written with customizability in mind, so that no matter the side channel being attacked, it is easy to add new analysis code to go from raw samples to the extracted secret.
Cachegrab aims to be straightforward and flexible, allowing for the rapid development of cache attacks against flawed software implementations. By making it easier to perform these attacks, our goal is to demonstrate the risk these attacks pose, promote safer coding practices for secure world code, and provide the tools to evaluate susceptibility of sensitive information to the complex dangers of microarchitectural attacks.
Cachegrab is open source and licensed under the GPLv2 license. Check it out on NCC Group’s GitHub account.
Published date:  27 December 2017
Written by:  Keegan Ryan