Xen SMEP (and SMAP) bypass
In a previous blog post I talked about my experience exploiting the SYSRET bug on Xen. I noted that I was able to bypass SMEP, but was leaving the information for a future blog post because I wanted to do some additional research -- I thought the technique I found might be something that also affects Linux, which it does.
While I was unaware of it at the time, the technique was published in mid-2014, and called ret2dir (as in return-to-direct-mapped-memory). While the ret2dir technique is publicly known, what follows is a walkthrough of how Xen’s direct mapped memory can be abused to bypass SMEP and SMAP, and how I used it in my exploit both to bypass SMEP and to simplify the data tunnel between dom0 and domU.
To my knowledge, leveraging ret2dir for exploitation against Xen has not previously been discussed publicly.
SMEP and SMAP overview
SMEP stands for Supervisor Mode Execution Prevention. On modern Intel processors it is a mitigation that, when enabled in the CR4 control register, will prevent a privileged thread of execution (as in code running in ring0) from executing memory in a page table entry that does not have the supervisor bit set.
(For any ARM people, SMEP is analogous to the privileged execute never (PXN) flag.) This prevents the common exploit technique in which an attacker redirects some function pointer in kernel space to point into controlled memory in userland. The page table entry corresponding to the virtual address in userland won’t have the supervisor bit set, and so an exception will be thrown.
SMAP, standing for Supervisor Mode Access Prevention, is effectively the same mitigation, but used to prevent read/write accesses rather than execution accesses, so any code running in supervisor mode (ring0) is unable to access non-privileged memory. (Again, for ARM people, SMAP is analogous to the privileged access never (PAN) flag.) If you’re wondering how SMAP doesn’t break everything, the privileged code is updated so that explicit locations designed to read from and write to userland are surrounded by logic that temporarily disables SMAP.
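As a small illustration of "enabled in the CR4 control register", the sketch below decodes a CR4 value that has been obtained by some other means (a debugger, a crash dump, and so on). The bit positions (20 for SMEP, 21 for SMAP) are the architectural ones; the macro and function names are just chosen for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR4 bit positions for the two mitigations. */
#define CR4_SMEP_BIT (UINT64_C(1) << 20)
#define CR4_SMAP_BIT (UINT64_C(1) << 21)

/* True if the given CR4 value has SMEP switched on. */
bool cr4_smep_enabled(uint64_t cr4)
{
    return (cr4 & CR4_SMEP_BIT) != 0;
}

/* True if the given CR4 value has SMAP switched on. */
bool cr4_smap_enabled(uint64_t cr4)
{
    return (cr4 & CR4_SMAP_BIT) != 0;
}
```

Reading CR4 itself requires ring0, so a check like this would only be useful on a value captured from privileged context.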
SMEP and SMAP bypasses
There are two general approaches to SMEP bypasses:
1) Use ROP to write to CR4 and disable SMEP; this has been demonstrated by Positive Research, amongst others.
2) Leak and jump to the address of an RWX buffer in the kernel which points to data you control; this is discussed by j00ru, Dan Rosenberg, Keegan McAllister, Columbia University, etc.
With regards to the first option, ROP is not really ideal, as you then have to worry about stability of code offsets across versions of the software and, in the case of something like Xen, the various compilers used by different vendors (Citrix, Oracle, Linux distributions, etc.). For ROP to be especially effective, you want to rely on a leak that allows you to do on-the-fly gadget searching so you're not relying on any version information. This approach should be a last resort.
Using an RWX buffer is ideal, and I set out to find a generic bypass on Xen. I found my answer in something that is called direct mapping, which is used by both Xen and Linux (and other operating systems). As I said in the introduction, it turns out this technique was already shown against the Linux kernel in .
It’s also worth noting that the ret2dir trick to bypass SMEP has since been addressed on Linux by making the direct mapped memory non-executable. It can still be used to bypass SMAP, but even this is becoming harder thanks to mitigations added as a result of the recent row hammering exploit. In the case of Xen, we can leverage its own direct mapped memory to bypass both SMEP and SMAP.
The Xen hypervisor has supported SMEP since 2011, meaning that it could be encountered when exploiting the SYSRET bug (XSA-7), or more modern Xen vulnerabilities.
Direct mapped memory primer
For a more complete explanation of direct mapped memory, I recommend reading , specifically section 3.2. To summarise: for performance reasons many kernels (and the Xen hypervisor) leverage a large block of virtual addresses pointing to most or all of physical memory. Note that access to this range of virtual addresses is limited to ring0, and is not exposed to userland. This virtual to physical mapping is sometimes referred to as a one-to-one mapping. Having this range allows the kernel or hypervisor to translate quickly between virtual and physical addresses.
So given any physical frame number (PFN), the PFN can effectively be added to the base address of the direct mapped memory range, and you then have a valid virtual address that can be used by functions designed to handle virtual instead of physical addresses. The opposite also works; the physical frame number of a virtual address can be quickly derived without having to walk the page tables. This shortcut is heavily used by page table code in the Linux kernel because the CR3 control register holding the page table base, and addresses stored in the entries at each page table level, are physical rather than virtual.
So using the direct mapping approach means there will often be more than one valid virtual address, and corresponding page table entries, referencing the same physical page frame. Multiple addresses pointing to the same underlying page is typically called aliasing.
The base address of this direct mapping range is static on Linux and Xen. Even with kernel address space layout randomisation applied on Linux, the base address of the mapping is not affected. On Xen, there is no memory layout randomisation within the hypervisor itself, so it is always static.
The primary takeaway here with regards to exploitation and bypassing SMEP and SMAP is that, because this block of virtual addresses points to almost all of physical memory, there will be valid ring0-only (aka SMEP safe) virtual addresses referencing physical memory controlled by userland processes. As long as the code running outside of ring0 (such as a guest domain) is able to query the underlying physical address corresponding to its own virtually mapped memory, it can compute a valid ring0-only virtual address that contains data it controls.
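The translation described above amounts to simple arithmetic once the frame number is scaled to a byte offset. The following is a minimal sketch of both directions, assuming 4 KiB pages; the function names are hypothetical and the base address is passed in rather than taken from any particular kernel.

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Ring0-only virtual address that aliases the given physical frame. */
uint64_t directmap_pfn_to_virt(uint64_t base, uint64_t pfn)
{
    return base + (pfn << PAGE_SHIFT);
}

/* Frame number behind a direct mapped address, with no page table walk. */
uint64_t directmap_virt_to_pfn(uint64_t base, uint64_t virt)
{
    return (virt - base) >> PAGE_SHIFT;
}
```

Anything that knows both the base and the frame number of a page it controls can therefore name that page through the privileged alias.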
Xen direct map
Let’s take a slightly closer look at this map on Xen. I found it by looking through the xen/include/asm-x86/config.h file and then investigating how it worked. Below you can see the corresponding entry:
/*
 * Memory layout:
 *  0x0000000000000000 - 0x00007fffffffffff [128TB, 2^47 bytes, PML4:0-255]
 *    Guest-defined use (see below for compatibility mode guests).
 *  0x0000800000000000 - 0xffff7fffffffffff [16EB]
 *    Inaccessible: current arch only supports 48-bit sign-extended VAs.
 [SNIPPED]
 *  0xffff830000000000 - 0xffff87ffffffffff [5TB, 5*2^40 bytes, PML4:262-271]
 *    1:1 direct mapping of all physical memory.
 [SNIPPED]
So we see in the last entry listed that this direct mapping exists between addresses 0xffff830000000000 - 0xffff87ffffffffff. The start address is assigned to DIRECTMAP_VIRT_START by the following code:
/* Slot 262-271/510: A direct 1:1 mapping of all of physical memory. */
#define DIRECTMAP_VIRT_START (PML4_ADDR(262))
#define DIRECTMAP_SIZE (PML4_ENTRY_BYTES * (511 - 262))
#define DIRECTMAP_VIRT_END (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE)
The DIRECTMAP_VIRT_START constant is a good start for working backwards to see how the Xen hypervisor leverages the direct map for its page translations, such as by the __virt_to_maddr() function.
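To see where 0xffff830000000000 comes from, the arithmetic behind a PML4 slot number can be reproduced by hand. The sketch below is not Xen's PML4_ADDR macro, only an equivalent calculation under the usual assumptions of 4-level paging (512 GB per PML4 slot, 48-bit sign-extended addresses); the helper names are made up for this example.

```c
#include <stdint.h>

#define PML4_ENTRY_BITS  39                               /* each slot spans 2^39 bytes */
#define PML4_ENTRY_BYTES (UINT64_C(1) << PML4_ENTRY_BITS)

/* Sign-extend bit 47 so the result is a canonical 64-bit address. */
uint64_t canonical48(uint64_t addr)
{
    if (addr & (UINT64_C(1) << 47))
        return addr | UINT64_C(0xffff000000000000);
    return addr;
}

/* First virtual address covered by the given PML4 slot (0-511). */
uint64_t pml4_slot_start(unsigned int slot)
{
    return canonical48((uint64_t)slot * PML4_ENTRY_BYTES);
}
```

With these helpers, pml4_slot_start(262) gives 0xffff830000000000 and pml4_slot_start(272) - 1 gives 0xffff87ffffffffff, matching the ten-slot, 5TB range quoted in the layout comment.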
By doing some testing in the hypervisor debugger, or with a custom built hypervisor containing a hypercall backdoor, it’s easy enough to confirm that all memory in this range is also mapped as RWX. So this means that as long as a Xen guest operating system can query the actual underlying physical address for any memory it is using, then it can leverage this direct mapping for an attack.
Xen physical-to-machine table
If you’re familiar with Xen you’ll know that there is an additional abstraction layer related to physical addresses. A guest domain (domU) operating system, when translating a virtual address to a physical address, doesn’t actually translate to the real physical address, but to what Xen refers to as pseudo-physical addresses or guest pseudo-physical frame numbers (GPFN), though they are sometimes still just called physical addresses as a catch-all. Xen instead refers to the real physical addresses on the system as machine addresses or machine frame numbers (MFN). A special table, called the physical-to-machine (p2m) mapping, is used to maintain a mapping between the pseudo-physical addresses and the MFNs. This table is indexed by the physical frame number (PFN), and the value at that index is the machine frame number (MFN), which identifies the actual backing physical page. More information about the p2m is available in .
I tried to illustrate the basic idea in the following diagram:
This design allows for things like guest suspension and machine migration without adversely affecting the OS, as real underlying machine frames can change, without the physical addresses used by the virtualised OS being affected.
The majority of the guest OS will be unaware that what it believes are real physical addresses are being translated on the fly by an underlying Xen driver layer talking to the hypervisor via hypercalls.
The important takeaway here is that the p2m mapping is exposed to the paravirtualised guest kernel, as the Xen drivers within the guest will need to lookup and provide actual machine addresses for various hypercalls. Because this table is exposed, it means that we can read the MFN out of the p2m table using a kernel module (which we’re already required to use for exploiting our bug).
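Conceptually, a p2m lookup is just an array index. The toy model below is only meant to make that concrete; the structure, the sentinel value, and the function name are all hypothetical, and the table a real paravirtualised kernel maintains is more elaborate than a flat array.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_INVALID_MFN UINT64_MAX /* hypothetical "no mapping" marker */

/* Toy p2m: the index is the guest PFN, the stored value is the MFN. */
struct toy_p2m {
    const uint64_t *mfn_of_pfn;
    size_t nr_pfns;
};

/* Return the machine frame backing a pseudo-physical frame. */
uint64_t toy_pfn_to_mfn(const struct toy_p2m *p2m, uint64_t pfn)
{
    if (pfn >= p2m->nr_pfns)
        return TOY_INVALID_MFN;
    return p2m->mfn_of_pfn[pfn];
}
```

Note that consecutive PFNs need not map to consecutive MFNs, which is why each page has to be looked up individually.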
Let’s put everything together now. Revisiting our Xen exploit adventure from the previous blog, right before we execute our payload, we were at a point where we could trick the hypervisor into issuing an iret instruction with a user controlled pointer, meaning we could jump anywhere we wanted. So now, we want to point this at code we control, but without violating SMEP in the process.
We know that Xen maintains a set of virtual addresses in the hypervisor address space such that DIRECTMAP_VIRT_START + (MFN << PAGE_SHIFT) points to the physical page with that MFN. A guest operating system, having access to the p2m table, can determine what MFN physically backs a given virtual allocation within the guest. This is done by translating from vaddr -> PFN -> MFN. We know Xen uses a static address for DIRECTMAP_VIRT_START, meaning that this is also known to the guest. Therefore, by computing the MFN, the guest can work out a hypervisor-only address (that won't violate SMEP) which points to data controlled by the guest, which can of course be arbitrary.
This is illustrated in the following image:
As previously stated, the direct mapped memory is RWX. In other words, Xen maintains a direct mapping of all physical pages used by every guest, and all of these pages have RWX permissions. This means that we can place shellcode into memory within the guest, work out a direct mapped virtual address that references it, and then redirect the hypervisor to this address instead of into our guest’s virtual address space. SMEP bypassed.
The following code demonstrates how you would do this from a loadable kernel module (LKM):
p = vmalloc(4096);
<copy shellcode to p>
/* guest virtual address -> pseudo-physical frame number (PFN) */
pfn = vmalloc_to_pfn(p);
/* PFN -> machine frame number (MFN), read from the p2m table */
mfn = get_phys_to_machine(pfn);
/* hypervisor-only alias of the same page in Xen's direct map */
smep_bypass_addr = (char *)(DIRECTMAP_VIRT_START + (mfn << PAGE_SHIFT));
This direct mapping is not only useful for bypassing SMEP, but also for avoiding the need to do things such as allocate memory dynamically within the hypervisor or find memory cavities to place shellcode we need to run in different interrupt contexts. I leveraged this memory for all hypervisor stages of my payload.
As noted in , there were three stages to the payload. I could redirect code execution to stage one by providing a direct map adjusted address. When hooking the various MSRs to allow migration into dom0, I could point the hooks into another portion of the direct mapped memory.
I also used it to simplify the data tunneling between dom0 (once I had migrated in) and domU. My domU module can simply map read and write buffers and inform the hypervisor payload what direct mapped virtual addresses they live at.
When the dom0 payload transmits or receives data using the syscall backdoor implemented by my payload, the data is simply copied to or from the direct mapped buffers, and the domU payload can be notified using effectively a mailbox flag in the same memory.
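A mailbox of this kind can be very small. The following is only a sketch of the general idea and not the layout my payload used: the structure, field names, and size limit are hypothetical, and it assumes a single sender and a single receiver sharing one page.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MBOX_DATA_MAX 4000 /* hypothetical: header plus data fit in one 4 KiB page */

struct mailbox {
    atomic_uint ready;            /* 0 = empty, 1 = message waiting */
    uint32_t len;                 /* number of valid bytes in data[] */
    uint8_t data[MBOX_DATA_MAX];
};

/* Sender side: returns 0 on success, -1 if the slot is busy or len is too big. */
int mbox_send(struct mailbox *mb, const void *buf, size_t len)
{
    if (len > MBOX_DATA_MAX)
        return -1;
    if (atomic_load_explicit(&mb->ready, memory_order_acquire))
        return -1;
    memcpy(mb->data, buf, len);
    mb->len = (uint32_t)len;
    /* Publish the message only after the payload has been written. */
    atomic_store_explicit(&mb->ready, 1, memory_order_release);
    return 0;
}

/* Receiver side: returns bytes copied, or -1 if empty or buf is too small. */
long mbox_recv(struct mailbox *mb, void *buf, size_t cap)
{
    uint32_t len;

    if (!atomic_load_explicit(&mb->ready, memory_order_acquire))
        return -1;
    len = mb->len;
    if (len > MBOX_DATA_MAX || len > cap)
        return -1;
    memcpy(buf, mb->data, len);
    /* Hand the slot back to the sender. */
    atomic_store_explicit(&mb->ready, 0, memory_order_release);
    return (long)len;
}
```

One such structure per direction keeps each side a pure sender or pure receiver, so a single flag is enough and no locking is needed.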
Bypassing SMEP and SMAP is relatively easy when exploiting Xen. The described technique would be useful for almost any hypervisor bug that could be triggered by a guest. Xen could mitigate this in a number of ways. They could modify the page table entries such that the direct mapped memory range is no longer marked executable. This would be a good first step, making a SMEP bypass more difficult, though SMAP could still be abused.
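On x86-64, marking a mapping non-executable comes down to setting the NX bit (bit 63) in its page table entry, which the processor honours when EFER.NXE is enabled. The sketch below shows only that bit manipulation on a raw entry value; the names are chosen for this example and it is not taken from Xen's page table code.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT (UINT64_C(1) << 0)
#define PTE_NX      (UINT64_C(1) << 63) /* only honoured when EFER.NXE is set */

/* Return the same entry with instruction fetches disallowed. */
uint64_t pte_make_nonexec(uint64_t pte)
{
    return pte | PTE_NX;
}

/* True if the entry is present and instruction fetches are allowed. */
bool pte_is_executable(uint64_t pte)
{
    return (pte & PTE_PRESENT) && !(pte & PTE_NX);
}
```

Data reads and writes through the mapping are unaffected by this bit, which is why the change would hinder a SMEP bypass but leave the SMAP bypass intact.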
Some sort of hypervisor memory layout randomisation would make things significantly more difficult, as an attacker would be forced to try to spray memory or leak the randomisation base. Ideally this randomisation would be a slide applied to all of hypervisor virtual memory and not just the direct mapped range.
I appreciate any feedback, corrections, etc. You can contact me on twitter @fidgetingbits or aaron <.> adams <@> nccgroup <.> trust.
Published date:  09 April 2015
Written by:  Aaron Adams