Cisco ASA series part six: Cisco ASA mempools

This article is part of a series of blog posts. We recommend that you start at the beginning. Alternatively, scroll to the bottom of this article to navigate through the whole series.

In part six, we document some of the details around Cisco ASA mempools and how the mempool-related functions wrap more traditional heap functions in order to inject their own book-keeping structures. We will introduce a gdb plugin called libmempool [1] that aids with analysing these structures.

A mempool can be thought of simply as a region of mapped memory dedicated for allocations by Cisco ASA components. The two main mempools on an ASA device are for general purpose allocations (the global shared mempool) and DMA-related allocations (the DMA mempool).

The goal of this article is to familiarise readers with mempools and the related structures in general, so that you can recognise what you are seeing when playing with heaps on a Cisco ASA device. As such, it doesn't focus specifically on mempool exploitation techniques but rather general knowledge that should help while debugging various exploitation scenarios.

This article uses 32-bit in the examples; however, the same principles generally apply to 64-bit and we discuss any differences. The tool has been tested (albeit to a lesser extent) on 64-bit too.

Consequently, we talk about both dlmalloc-2.8.x and ptmalloc2 in this article and we recommend reading our previous blog posts describing those. Similarly to our other heap analysis tools, libmempool is completely integrated with asadbg.

Cisco mempools

Historical mempool references

There aren't many references to Cisco mempools in general, at least by that name. When Felix Lindner (FX) and others were documenting exploiting heap overflows on Cisco devices, they were most definitely overwriting memory contained in a mempool. However, they didn't refer to it by the “mempool” term but instead used the more generic “heap” term. It seems that the allocator functions in traditional Cisco IOS systems operated on one of a number of different mempools that were tracked globally. Various higher level wrappers would wrap application-specific mempools.

Most of the information about mempools that you will find online consists of references to the MEMPOOL_DMA and MEMPOOL_GLOBAL_SHARED values, which are often shown as part of bug reports related to Cisco ASA devices.

In addition, we ran across a CISCO-ENHANCED-MEMPOOL-MIB reference which documents an SNMP MIB about all of the different mempools [2] that used to exist on Cisco IOS. This appears to be the most descriptive documentation about the mempools that we could find.

The summary of some of the different mempools from IOS, taken from the link above, is as follows:

"Represents the different types of memory pools that may be present in a
managed device. Note that only the processor pool is required to be supported
by all devices. Support for other pool types is dependent on the device being
managed.

processorMemory -
processor associated heap memory.
ioMemory -
shared memory for buffer data and controller descriptor blocks.
pciMemory -
Peripheral Component Interconnect bus memory which is visible to all devices on the PCI buses in a platform.
fastMemory -
memory defined by the particular platform for speed critical applications.
multibusMemory -
memory present on some platforms that is used as a fallback pool.
interruptStackMemory -
memory for allocating interrupt stacks. It is usually allocated from heap.
processStackMemory -
memory for allocating process stacks. It is usually allocated from heap.
localExceptionMemory -
memory reserved for processing a system core dump.
virtualMemory -
memory used to increase available RAM.
reservedMemory -
memory used for packet headers, particle headers and particles.
imageMemory -
memory which corresponds to the image file system.
asicMemory -
Application Specific Integrated Circuit memory."

Cisco ASA mempools

Unlike IOS, the Cisco ASA (at least the versions we looked at) appears to only regularly use two mempools: MEMPOOL_DMA and MEMPOOL_GLOBAL_SHARED. You can use the show memory detail command to dump information about pool statistics on a device.

ciscoasa# show mem detail
Free memory: 780174808 bytes (73%)
Used memory:
Allocated memory in use: 79657512 bytes ( 7%)
Reserved memory: 213909504 bytes (20%)
----------------------------- ------------------
Total memory: 1073741824 bytes (100%)

MEMPOOL_DMA POOL STATS:

Non-mmapped bytes allocated = 41041920
Number of free chunks = 66
Number of mmapped regions = 0
Mmapped bytes allocated = 0
[...]

----- allocated memory statistics -----

fragment size count total
(bytes) (bytes)
---------------- ---------- --------------
112 1 112
232 1 232
248 1 248
256 1 256
1024 64 65536
[...]

MEMPOOL_GLOBAL_SHARED POOL STATS:

Non-mmapped bytes allocated = 859832320
Number of free chunks = 251
Number of mmapped regions = 0
Mmapped bytes allocated = 0
[...]

----- fragmented memory statistics -----

fragment size count total
(bytes) (bytes)
---------------- ---------- --------------
16 70 1120
24 64 1536
32 36 1152
40 35 1400
[...]

----- allocated memory statistics -----

fragment size count total
(bytes) (bytes)
---------------- ---------- --------------
48 529 25392
56 3250 182000
64 7307 467648
[...]
6291456 1 6291456
8388608 1 8388608
12582912 1 12582912

Summary for all pools:

Non-mmapped bytes allocated = 900874240
[...]

You can see that two mempools are analysed and that both in-use and free information appears to be tracked. In particular, there is granular information about in-use chunks; we will see why later.
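As a quick sanity check when reading this output, the per-row totals and the pool summary can be verified arithmetically. The following is a minimal sketch that uses only numbers copied from the output above; the variable and function names are ours, for illustration only.

```python
# Rows copied from the "show mem detail" output above:
# (fragment size in bytes, count, total in bytes)
rows = [
    (1024, 64, 65536),    # MEMPOOL_DMA, allocated
    (16, 70, 1120),       # MEMPOOL_GLOBAL_SHARED, fragmented
    (24, 64, 1536),
    (32, 36, 1152),
    (40, 35, 1400),
    (48, 529, 25392),     # MEMPOOL_GLOBAL_SHARED, allocated
    (56, 3250, 182000),
    (64, 7307, 467648),
]


def row_is_consistent(size, count, total):
    """Each row's total should simply be size multiplied by count."""
    return size * count == total


# "Non-mmapped bytes allocated" for each pool and for the summary.
dma_bytes = 41041920
global_shared_bytes = 859832320
summary_bytes = 900874240

all_rows_ok = all(row_is_consistent(*row) for row in rows)
pools_sum_ok = dma_bytes + global_shared_bytes == summary_bytes
print(all_rows_ok, pools_sum_ok)
```

Both checks hold for the output shown, which supports reading the summary as the plain sum of the two pools.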

In practice, the MEMPOOL_DMA mempool doesn't appear to be used too frequently in the configurations we tested. One way we confirmed this was by hooking all heap calls and analysing which mspace was being used to service each allocation. Easily 99 per cent of calls are for the global shared mempool. As such, we will mostly focus on the MEMPOOL_GLOBAL_SHARED mempool in our discussion. This is the mempool that is used to store most general allocations and on 32-bit it is used to store allocations made by dlmalloc.

On 32-bit, the mempool is basically synonymous with dlmalloc mspace. The memory associated with the mempool will end up holding the chunks allocated via dlmalloc.

On 64-bit, things are different. If you read through the libdlmalloc blog post you will recall that there is a dlmalloc segment that is empty aside from holding the mstate structure itself. This segment is still associated with the MEMPOOL_GLOBAL_SHARED mempool but its sole purpose is for the book-keeping of in-use chunks stored on ptmalloc2 arenas. Because ptmalloc2 is called inside glibc, and is therefore mostly unmodified by Cisco, they didn't insert their own book-keeping changes directly into the ptmalloc2 structures. Instead, they use the same mem_mh_* wrappers and store the statistics into the mstate. This means that the dlmalloc mstate's free bins won't be actively tracking any chunks, nor will the malloc segment be holding any chunks. This behavior on 64-bit has some interesting implications on a security/stability feature of the Cisco ASA called Checkheaps, which we will discuss more in a separate article dedicated to the topic.

Mempool creation

For the sake of interest, we will take a look at how mempools are created at startup, where they are tracked, how you can find them and which ones exist on your device. During system initialisation there is a function called lina_main_thread() which calls mm_init(), which in turn calls lina_mempool_init(). Alternatively, if other functions like malloc(), calloc() or mmap() are called prior to mm_init(), and the mempools haven't been initialised yet, lina_mempool_init() will be called there instead. lina_mempool_init() is responsible for creating the primary mempools. It tests that some devices, which are used for mapping the backing memory for the mempools, are accessible and then calls lina_create_global_shared_pool() and lina_create_dma_mempool().

The memory used for the global shared pool is obtained from shared memory using shmget(). You can validate this by looking at /proc/<lina pid>/maps. The following shows the mappings for both the DMA and global shared mempools:

a5c00000-a8324000 rwxs 00000000 00:0e 1794         /dev/udma0
a8400000-ab000000 rwxs 00000000 00:0b 0 /SYSV00000002 (deleted)
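If you want to pick these mappings out of a full maps file automatically, something like the following could be used. This is only a sketch: the selection rule (shared mappings whose path starts with /dev/udma or /SYSV) is our assumption based on the two lines above, and the function names are hypothetical.

```python
import re

# One line of /proc/<pid>/maps: range, perms, offset, dev, inode, optional path.
MAPS_LINE = re.compile(
    r"^(?P<start>[0-9a-f]+)-(?P<end>[0-9a-f]+)\s+(?P<perms>\S+)\s+"
    r"(?P<offset>[0-9a-f]+)\s+(?P<dev>\S+)\s+(?P<inode>\d+)\s*(?P<path>.*)$"
)


def parse_maps_line(line):
    """Parse one maps line into a dict, or return None if it doesn't match."""
    m = MAPS_LINE.match(line.strip())
    if m is None:
        return None
    start = int(m.group("start"), 16)
    end = int(m.group("end"), 16)
    return {
        "start": start,
        "end": end,
        "size": end - start,
        "perms": m.group("perms"),
        "path": m.group("path"),
    }


def find_mempool_mappings(lines):
    """Return shared mappings backed by /dev/udma* or a SysV shm segment.

    Assumption: this heuristic matches the two mappings shown in the article;
    other firmware builds may name the backing objects differently.
    """
    found = []
    for line in lines:
        entry = parse_maps_line(line)
        if entry is None or not entry["perms"].endswith("s"):
            continue
        if entry["path"].startswith(("/dev/udma", "/SYSV")):
            found.append(entry)
    return found


sample = [
    "a5c00000-a8324000 rwxs 00000000 00:0e 1794         /dev/udma0",
    "a8400000-ab000000 rwxs 00000000 00:0b 0 /SYSV00000002 (deleted)",
]
mappings = find_mempool_mappings(sample)
for entry in mappings:
    print(hex(entry["start"]), hex(entry["size"]), entry["path"])
```

On a live device you would feed it the lines of the real maps file instead of the two sample lines.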

This backing memory is then used to create an mspace using the create_mspace_with_base() function. Note this is a Cisco-modified version of this function, which ensures enough space for a larger than normal mstate is available. This extra space is used for special mempool book-keeping bins and counters, which we will look at in more detail later. The create_mspace_with_base() function is used, rather than create_mspace(), because the address to be used for the heap has already been obtained using shmat().

Next, the address of the mapping is stored in a global array corresponding to an ID value associated with that particular mempool. This likely corresponds to some C enumeration declarations where MEMPOOL_GLOBAL_SHARED equals 0 and MEMPOOL_DMA equals 6.

The initialisation code then calls mempool_add() which creates a mempool structure to track information about the mempool itself, rather than the mapping. Mempool tracking structures are allocated on the global shared mempool and, as such, the structure tracking the global shared mempool should almost always be the second chunk present on the heap (the first chunk is related to the initialisation of a heap wrapper). Other mempool structures are also added to this heap. After creation, this mempool tracking structure is inserted into a singly linked list of such structures.

Finding this global list is useful if you ever want to dump the mempools and find the associated mspace base addresses on the system you're analysing.

As an example, let's look at the linked list of mempools in our asa924-k8.bin firmware, which is at 0x0a9a50b4. This is a member of a larger structure we call mempool_list that starts at 0x0a9a50b0 and has the following structure:

struct mempool_list
{
size_t offset;
struct mempool_tracker* next; //first chunk in memory (lower address)
void* unk;
};

You can find the mempool_tracker and mempool_list references by reversing mempool_add() in your firmware. On 64-bit firmware files with symbols, the mempool_list structure is found by looking at the mempool_list_ global. Note, we are looking at this list after both the shared pool and DMA pool are created:

(gdb) x/wx 0x0a9a50b4
0xa9a50b4: 0xa8400c80
(gdb) x/wx 0xa8400c80
0xa8400c80: 0xa8400be8

This corresponds to what should be near the start of an mspace/mempool and you could confirm by looking at the address passed to create_mspace_with_base(). If we analyse the second entry in the list (0xa8400be8) we see it points back to the address 0x0a9a50b4, which is the head, indicating 0xa8400be8 is the last element of the list.

(gdb) x/wx 0xa8400be8
0xa8400be8: 0x0a9a50b4

If we look a bit closer at one of the entries in the list, we see something worth noting:

(gdb) x/10x 0xa8400be8
0xa8400be8: 0x0a9a50b4 0xa11ccdef 0x8140d4d0 0x337ff3e1

The member pointed to by the list is at the end of the chunk, indicated by the 0xa11ccdef magic footer (more on the magic later). This means that the linked list portion of the mempool tracking structure is the last member of the structure and the linked list entries are populated using something along the lines of cur_mempool->next = &next_mempool->next;. Linking to an embedded member like this, rather than to the start of the structure itself, is not uncommon.

We can quite easily work out where the mempool structure starts by using libdlmalloc's dlchunk command. An mspace will always start with a chunk holding its own mstate structure, so we read the chunk lengths starting from what we think is the base:

(gdb) dlchunk -c 3 0xa8400000
0xa8400000 M sz:0x00ae8 fl:CP alloc_pc:0xa8400bf0,-
0xa8400ae8 M sz:0x00070 fl:CP alloc_pc:0x0916548a,-
0xa8400b58 M sz:0x00098 fl:CP alloc_pc:0x090fa11e,-

The command shows that the linked list entry seems to fall into the third chunk above, so we take a closer look:

(gdb) dlchunk -x 0xa8400b58
0xa8400b58 M sz:0x00098 fl:CP alloc_pc:0x090fa11e,-
0x90 bytes of chunk data:
0xa8400b60: 0xa11c0123 0x0000006c 0x00000000 0x00000000
0xa8400b70: 0x00000000 0xa8400444 0x090fa11e 0x00000000
0xa8400b80: 0xa8400008 0x504d454d 0x5f4c4f4f 0x424f4c47
0xa8400b90: 0x535f4c41 0x45524148 0x00000044 0x00000000
0xa8400ba0: 0x00000000 0x00000000 0x00000000 0x00000000
0xa8400bb0: 0x00000000 0x00000000 0x00000000 0x00000000
0xa8400bc0: 0x00000000 0x00000000 0x00000000 0x00000000
0xa8400bd0: 0x00000000 0x00000000 0x00000000 0x00000000
0xa8400be0: 0x00000000 0x00000000 0x0a9a50b4 0xa11ccdef

We see a few values of interest in here. The actual chunk data starts at 0xa8400b80, as the first 0x20 bytes (from 0xa8400b60) are a separate mempool structure we will describe later. With that in mind, the first field, 0xa8400008, is the actual address of the mstate structure. The string at 0xa8400b84 is the name of the mempool:

(gdb) x/s 0xa8400b84
0xa8400b84: "MEMPOOL_GLOBAL_SHARED"
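The same string can be recovered by hand from the hex dump, which is handy when you only have a raw word dump to work with. A minimal sketch, assuming little-endian 32-bit words as displayed by gdb on this target; the helper name is ours:

```python
import struct

# The six words starting at 0xa8400b84 in the dump above.
words = [0x504d454d, 0x5f4c4f4f, 0x424f4c47, 0x535f4c41, 0x45524148, 0x00000044]


def words_to_cstring(words):
    """Repack little-endian 32-bit words into bytes and cut at the first NUL."""
    raw = struct.pack("<%dI" % len(words), *words)
    return raw.split(b"\x00", 1)[0].decode("ascii")


pool_name = words_to_cstring(words)
print(pool_name)
```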

The approximate layout of the mempool tracker structure on 32-bit is as below, though we've seen the size change significantly across different 64-bit builds:

struct mempool_tracker
{
mstate * m; // pointer to dlmstate + mp_mspace
char pool_name[0x50]; // "MEMPOOL_DMA", "MEMPOOL_GLOBAL_SHARED"
int field_58;
int mempool_id;
void * next; // points to some &mempool_tracker->next
};

As it turns out, the distance from the next field to the start of the structure corresponds to the offset value shown earlier in mempool_list. By modeling these structures in Python we can dump the lists from gdb:

(gdb) python import libmempool as lmp
(gdb) python m = lmp.mempool_list(addr=0x0a9a50b0)
(gdb) python print(m)
struct mempool_list @ 0xa9a50b0 {
offset = 0x68
head = 0xa8400c80
unk = 0x0
struct mempool @ 0xa8400c18 {
dlmstate = 0xa5c00008
pool_name = MEMPOOL_DMA
field_58 = 0x0
mempool_id = 0x6
next = 0xa8400be8
struct mempool @ 0xa8400b80 {
dlmstate = 0xa8400008
pool_name = MEMPOOL_GLOBAL_SHARED
field_58 = 0x0
mempool_id = 0x0
next = 0xa9a50b4
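The walk that produces this output boils down to following each next pointer and subtracting offset to get back to the start of the tracker. The following is a simplified stand-alone sketch of that logic, not the actual libmempool code: memory is simulated with a dictionary holding the words from the gdb session above, and read_u32() stands in for whatever memory read primitive you have.

```python
# Simulated 32-bit memory (address -> word), using values from the gdb session.
memory = {
    0x0a9a50b0: 0x68,        # mempool_list.offset
    0x0a9a50b4: 0xa8400c80,  # mempool_list.next, the list head
    0xa8400c80: 0xa8400be8,  # next field inside the MEMPOOL_DMA tracker
    0xa8400be8: 0x0a9a50b4,  # next field inside the MEMPOOL_GLOBAL_SHARED tracker
}


def read_u32(addr):
    """Stand-in for a debugger memory read."""
    return memory[addr]


def walk_mempool_list(list_addr, max_entries=64):
    """Return the start address of every mempool tracker on the list.

    Each entry points at the next field of a tracker, so the start of the
    tracker is recovered by subtracting mempool_list.offset.
    """
    offset = read_u32(list_addr)
    head = list_addr + 4
    trackers = []
    cur = read_u32(head)
    while cur != head and len(trackers) < max_entries:
        trackers.append(cur - offset)
        cur = read_u32(cur)
    return trackers


trackers = walk_mempool_list(0x0a9a50b0)
print([hex(t) for t in trackers])
```

The max_entries bound is only there so that a corrupted list cannot loop forever.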

Allocator function wrapping

Heap allocations on the Cisco ASA involve the execution of a number of wrappers which inevitably lead to the underlying core heap allocator. These wrappers facilitate the injection of mempool book-keeping structures.

The call chain of a typical allocation on a 32-bit firmware will look something like this:

ikev2_malloc() -> resMgrCalloc() -> mem_mh_calloc() -> mspace_malloc()

The base wrapper functions will typically take a size argument, as in ikev2_malloc(int size). They will pull out the mempool ID associated with the current process and pass the associated mempool address to resMgrCalloc(mempool * mp, int size). The resMgrCalloc() function will loop over a list of installed shims trying to service the allocation request. By default, the shims installed are the set of mem_mh_* functions. The prototype for mem_mh_calloc(mempool *mp, int size) is the same as resMgrCalloc(). This is where the interesting bits happen. The mem_mh_* functions wrap the core allocator and are responsible for adjusting the requested size in order to fit the book-keeping structures (which itself is problematic, as we’ll note later).

Mempool headers

The book-keeping structures are worked out fairly easily through reversing and the names are pretty much all inferred through various assert() errors. We usually refer to these book-keeping structures simply as mh structures or mempool headers as the members are prefixed with mh_. It’s likely that mh stands for mempool header but we can't say for sure.

A mempool header looks as follows:

struct mp_header {
unsigned mh_magic;
unsigned mh_len;
unsigned mh_refcount; // also used for free magic when freed
unsigned mh_unused;
struct mp_header * mh_fd_link;
struct mp_header * mh_bk_link;
void * alloc_pc;
void * free_pc;
};

This is the maximum content that such a header can hold, but as we will see, depending on the size and state of the chunk it lives inside, some of these fields may be unused. For those familiar with the old block structures and REDZONE magic, as described by FX, you can likely see some similarities.
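To make the 32-bit layout concrete, the following sketch unpacks the eight words found at 0xa8400b60 in the earlier MEMPOOL_GLOBAL_SHARED tracker dump into the fields of mp_header. It assumes little-endian 32-bit fields; the MpHeader tuple and parse_mp_header() are our own illustrative names.

```python
import struct
from collections import namedtuple

MpHeader = namedtuple(
    "MpHeader",
    "mh_magic mh_len mh_refcount mh_unused mh_fd_link mh_bk_link alloc_pc free_pc",
)

# Eight little-endian 32-bit fields (32-bit firmware assumed).
MP_HEADER_FMT = "<8I"
MP_HEADER_SIZE = struct.calcsize(MP_HEADER_FMT)


def parse_mp_header(raw):
    """Unpack the first 0x20 bytes of chunk data as an mp_header."""
    return MpHeader(*struct.unpack_from(MP_HEADER_FMT, raw))


# Words at 0xa8400b60..0xa8400b7c from the tracker chunk dump.
words = [0xa11c0123, 0x0000006c, 0x0, 0x0, 0x0, 0xa8400444, 0x090fa11e, 0x0]
raw = struct.pack("<8I", *words)

hdr = parse_mp_header(raw)
print(hex(hdr.mh_magic), hex(hdr.mh_len), hex(hdr.alloc_pc))
```

Note that the decoded alloc_pc matches the value dlchunk reported for that chunk.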

The mh_magic value is always set to 0xa11c0123, the a11c presumably standing for alloc. mh_len represents the actual size of the chunk data, so it does not include the mp_header size, the mh_footer value, or the backing allocator’s metadata. The mh_refcount is used to store the number of references to an in-use chunk; typically this is zero or one. We haven't closely investigated under which cases refcounting is actually used. When a chunk is freed, depending on the size of the metadata used by the core allocator, the mh_refcount might instead hold special free magic of 0xf3ee0123, with the f3ee portion presumably standing for free. This will also correspond to a change of mh_footer to 0xf3eecdef. The mh_unused field we haven't seen used for anything so far, aside from one specific free chunk layout explained later.

The mh_fd_link and mh_bk_link are used to reference other in-use chunks that fall within the same bin size. This is of particular interest, as corrupting these fields can be used for mirror-write primitives. This was abused by Exodus Intel [3], though they didn't describe its relationship to the larger mempool data structure.

The alloc_pc field is interesting in that it is used to track the program counter of the function that called the main allocation wrapper. In other words, the address of the call to ikev2_malloc(), in whatever IKEv2-related function issued it, would be stored here. This is extremely useful when analysing feng shui layouts, as you immediately know where all of the allocations came from and if any are from noise or code paths you didn't figure into your layouts.

Finally, the free_pc field is similar, but only populated on free. This holds a similar value to alloc_pc and shows you exactly where a free wrapper was called from. However, there seem to be numerous cases where this field isn't populated and we haven't closely investigated why that is.

As an example of the value of alloc_pc and free_pc, the following dump is taken from a simple breakpoint script that runs dlchunk on a heap address each time it is hit. The dlchunk command will issue a callback into libmempool, which in turn will look up the symbol at alloc_pc using the ret-sync plugin [9]. This is done by checking the address against our IDA database containing symbols.

ikev2_malloc -> 0xac8e51e8 M sz:0x00078 fl:CP alloc_pc:0x087b806b,ikev2_packet_enqueue+0x1b
ikev2_malloc -> 0xad424c88 M sz:0x001d0 fl:CP alloc_pc:0x087b8085,ikev2_packet_enqueue+0x35
ikev2_malloc -> 0xacb89fc8 M sz:0x00048 fl:CP alloc_pc:0x087b2c53,ikev2_enqueue_event+0x23
ikev2_free -> 0xac8e5178 M sz:0x00070 fl:C- alloc_pc:0x08688fd3,enqueue_ikev2_ext_action+0x23
ikev2_malloc -> 0xa9909b28 M sz:0x00040 fl:CP alloc_pc:0x0870a8d6,ikev2_hash+0xa3
ikev2_malloc -> 0xad424e58 M sz:0x00208 fl:CP alloc_pc:0x08782490,ikev2_create_sa+0x50
ikev2_malloc -> 0xac8e5128 M sz:0x000a0 fl:CP alloc_pc:0x08737876,ikev2_create_timer+0x87
ikev2_malloc -> 0xacb89a88 M sz:0x00050 fl:CP alloc_pc:0x087378bb,ikev2_create_timer+0xcc
ikev2_malloc -> 0xacb89ad8 M sz:0x000a0 fl:CP alloc_pc:0x08737876,ikev2_create_timer+0x87
ikev2_malloc -> 0xacb89b78 M sz:0x00050 fl:CP alloc_pc:0x087378bb,ikev2_create_timer+0xcc
ikev2_malloc -> 0xad425060 M sz:0x00210 fl:CP alloc_pc:0x0878205a,ikev2_create_neg_ctx+0x2a
ikev2_malloc -> 0xacb89bc8 M sz:0x00048 fl:CP alloc_pc:0x087820e0,ikev2_create_neg_ctx+0xb0
ikev2_malloc -> 0xacff7940 M sz:0x000a0 fl:CP alloc_pc:0x08737876,ikev2_create_timer+0x87
ikev2_malloc -> 0xacff79e0 M sz:0x00050 fl:CP alloc_pc:0x087378bb,ikev2_create_timer+0xcc
ikev2_malloc -> 0xacff7a30 M sz:0x00040 fl:CP alloc_pc:0x087abe93,ikev2_log_hdr+0x23
ikev2_malloc -> 0xacff7a70 M sz:0x00040 fl:CP alloc_pc:0x087abea5,ikev2_log_hdr+0x35
ikev2_free -> 0xacff7a30 M sz:0x00040 fl:CP alloc_pc:0x087abe93,ikev2_log_hdr+0x23
ikev2_free -> 0xacff7a70 M sz:0x00040 fl:C- alloc_pc:0x087abea5,ikev2_log_hdr+0x35
ikev2_malloc -> 0xad425270 M sz:0x001d0 fl:CP alloc_pc:0x0877de69,ikev2_parse_packet+0x179
ikev2_malloc -> 0xad425440 M sz:0x00130 fl:CP alloc_pc:0x087b2734,ikev2_data_to_packet+0xe4
ikev2_malloc -> 0xad425570 M sz:0x00330 fl:CP alloc_pc:0x087b26a8,ikev2_data_to_packet+0x58
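Logs like this quickly get long, so it can help to tally allocations per calling function. The following is a small sketch of how that could be done; the regular expression is written against the line format shown above, and the helper names are ours rather than part of asadbg.

```python
import re
from collections import Counter

# Matches lines such as:
# ikev2_malloc -> 0xac8e5128 M sz:0x000a0 fl:CP alloc_pc:0x08737876,ikev2_create_timer+0x87
LOG_LINE = re.compile(
    r"^(?P<wrapper>\w+)\s+->\s+(?P<addr>0x[0-9a-f]+)\s+\w\s+sz:(?P<size>0x[0-9a-f]+)\s+"
    r"fl:(?P<flags>\S+)\s+alloc_pc:(?P<pc>0x[0-9a-f]+),(?P<sym>\S+)$"
)


def parse_log_line(line):
    """Return a dict describing one breakpoint log line, or None."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "wrapper": m.group("wrapper"),
        "addr": int(m.group("addr"), 16),
        "size": int(m.group("size"), 16),
        "alloc_pc": int(m.group("pc"), 16),
        # Strip the "+0x.." displacement to get the bare function name.
        "func": m.group("sym").split("+", 1)[0],
    }


log = """\
ikev2_malloc -> 0xac8e5128 M sz:0x000a0 fl:CP alloc_pc:0x08737876,ikev2_create_timer+0x87
ikev2_malloc -> 0xacb89a88 M sz:0x00050 fl:CP alloc_pc:0x087378bb,ikev2_create_timer+0xcc
ikev2_malloc -> 0xacff7a30 M sz:0x00040 fl:CP alloc_pc:0x087abe93,ikev2_log_hdr+0x23
ikev2_free -> 0xacff7a30 M sz:0x00040 fl:CP alloc_pc:0x087abe93,ikev2_log_hdr+0x23
"""

entries = [e for e in map(parse_log_line, log.splitlines()) if e is not None]
allocs_per_func = Counter(e["func"] for e in entries if e["wrapper"] == "ikev2_malloc")
frees = [e for e in entries if e["wrapper"] == "ikev2_free"]
print(allocs_per_func.most_common())
```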

Knowing the purpose of these values is vital: when you are corrupting memory that holds these structures it's important to know what holds what, and which of the values must be overwritten with precision.

If you read through our libdlmalloc blog post, you’ll know that we added support for special callbacks when analysing chunks. The primary reasoning for this was to facilitate mp_header analysis without hardcoding it into libdlmalloc itself, since the latter would work with non-Cisco ASA devices whereas the mp_header is very specific to Cisco ASAs. We'll talk about libmempool and how the callbacks work in more detail later. For now we show that we can analyse this mp_header in detail:

(gdb) dlchunk -v 0xa8400dc0
struct malloc_chunk @ 0xa8400dc0 {
prev_foot = 0x8140d4d0
size = 0x30 (CINUSE|PINUSE)
struct mp_header @ 0xa8400dc8 {
mh_magic = 0xa11c0123
mh_len = 0x4
mh_refcount = 0x0
mh_unused = 0x0
mh_fd_link = 0xa8400d98 (OK)
mh_bk_link = 0xa8400df8 (OK)
alloc_pc = 0x9880382 (-)
free_pc = 0xdc7dc25a (-)

The mh_fd_link shown above should point to another 0x30 byte chunk, which tracks four bytes of data as indicated by mh_len. If we look at the chunk there (adjusting for the dlmalloc-2.8.x metadata for 32-bit) we see it is similar and points back to the chunk's mp_header above (0xa8400dc8):

(gdb) dlchunk -v 0xa8400d98-8
struct malloc_chunk @ 0xa8400d90 {
prev_foot = 0x8140d4d0
size = 0x30 (CINUSE|PINUSE)
struct mp_header @ 0xa8400d98 {
mh_magic = 0xa11c0123
mh_len = 0x4
mh_refcount = 0x0
mh_unused = 0x0
mh_fd_link = 0xa8400d30 (OK)
mh_bk_link = 0xa8400dc8 (OK)
alloc_pc = 0xdc7db2ca (-)
free_pc = 0x0 (-)
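The relationship between mh_len and the dlmalloc chunk size in these dumps can be reproduced with simple arithmetic. The sketch below assumes 8 bytes of dlmalloc-2.8.x overhead per chunk on 32-bit (prev_foot and head), a 0x20-byte mp_header, a 4-byte mh_footer and 8-byte chunk alignment. These constants are our reading of the dumps rather than something taken from Cisco's code, but they match both the 0x30 chunks above (mh_len 0x4) and the 0x98 tracker chunk shown earlier (mh_len 0x6c).

```python
DL_OVERHEAD = 8        # assumption: prev_foot + head on 32-bit dlmalloc-2.8.x
MP_HEADER_SIZE = 0x20  # eight 32-bit mp_header fields
MH_FOOTER_SIZE = 4     # the trailing mh_footer magic
ALIGNMENT = 8          # assumption: dlmalloc chunk alignment on 32-bit


def chunk_size_for(mh_len):
    """Estimate the dlmalloc chunk size needed to hold mh_len bytes of data."""
    total = DL_OVERHEAD + MP_HEADER_SIZE + mh_len + MH_FOOTER_SIZE
    # Round up to the next multiple of the alignment.
    return (total + ALIGNMENT - 1) & ~(ALIGNMENT - 1)


print(hex(chunk_size_for(0x4)))
print(hex(chunk_size_for(0x6c)))
```

In other words, even a four-byte allocation costs 0x30 bytes of heap once the mempool book-keeping is included.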

These lists are used to track allocated chunks in dedicated dlmalloc-2.8.x-sized smallbins and treebins, which are appended to the default mstate structure. The additions include a list of 32-bit counter values that tracks the count in each bin and then the lists pointing to each in-use chunk. When a chunk is freed, it is unsafely unlinked from this doubly linked list. This is, for instance, how Exodus Intelligence [3] was able to achieve two mirror-writes in their IKEv2 exploit.

To get a better idea of how everything fits together, the following image shows a whole mspace and how the mstate and custom modifications are laid out.



[Figure: mempool overview]

To further demonstrate, the below image shows how the traditional dlmalloc free bins track free chunks and how the mempool book-keeping structures track the in-use chunks.


[Figure: bins]

These in-use bins we've described are exactly how the earlier show mem detail command is able to track how many chunks are currently in use on a given pool.

An important note about this unsafe unlinking of mh structures is that this continues to be the case on newer 64-bit firmware files that have started using ptmalloc2, which uses safe unlinking of free chunks. This means that, although you can't abuse the traditional ptmalloc2 unlink() function by overwriting free chunks, you can instead overwrite in-use chunks and achieve mirror-writes when free() is called. This latter approach works because the mem_mh_free() wrapper will unsafely remove the allocated chunk from the in-use bin list.
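To illustrate why the lack of checks matters, here is a simplified model of an unchecked doubly linked list unlink, run against a dictionary that stands in for memory. It is a sketch of the general technique, not a decompilation of mem_mh_free(): the 0x10 and 0x14 offsets follow from the 32-bit mp_header layout, while the order of the two writes and the placeholder addresses X and Y are assumptions for illustration.

```python
FD_OFF = 0x10  # offset of mh_fd_link within the 32-bit mp_header
BK_OFF = 0x14  # offset of mh_bk_link within the 32-bit mp_header


def unsafe_unlink(memory, hdr):
    """Remove hdr from its in-use list without validating its neighbours.

    A safe unlink would first check that fd->bk and bk->fd both point
    back at hdr; this model deliberately performs no such check.
    """
    fd = memory[hdr + FD_OFF]
    bk = memory[hdr + BK_OFF]
    memory[fd + BK_OFF] = bk  # fd->mh_bk_link = bk
    memory[bk + FD_OFF] = fd  # bk->mh_fd_link = fd


# Benign case: three neighbouring in-use headers A <-> B <-> C.
A, B, C = 0xa8400d98, 0xa8400dc8, 0xa8400df8
memory = {
    B + FD_OFF: A,
    B + BK_OFF: C,
    A + BK_OFF: B,
    C + FD_OFF: B,
}
unsafe_unlink(memory, B)

# Corrupted case: both links of B were overwritten with arbitrary values.
X, Y = 0x41410000, 0x42420000  # hypothetical placeholder addresses
corrupted = {B + FD_OFF: X, B + BK_OFF: Y}
unsafe_unlink(corrupted, B)
print(hex(corrupted[X + BK_OFF]), hex(corrupted[Y + FD_OFF]))
```

In the corrupted case each of the two values ends up written at an address derived from the other, which is the mirror-write behaviour described above.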

In addition to the mempool header, a magic value is inserted into the end of a chunk, which we will call mh_footer. This always contains 0xa11ccdef.

One nice thing about this mempool header from an exploitation perspective is that a heap-based memory revelation bug can give you a lot of really useful information. You get not only heap addresses, but a .text address that you can tie to a specific firmware version that can get you started running with Return-Oriented Programming (ROP) if necessary. FX abused a similar structure when exploiting the “UDP echo service leak” [7] [8] on IOS.

Mempool headers in free chunks

Although most of the important parts of the mempool headers have been covered, it's worth noting what this structure will look like in the event the chunk is free. Although the chunk is no longer stored on a mempool-related linked list, it is still populated with free magic that will be validated at certain times.

Recall that in dlmalloc-2.8.x there are small free chunks and tree free chunks. Each of these has different inline structures and thus imposes different limits on the mempool header data stored. Furthermore, free chunks are sometimes too small to hold the mempool header, which also has an impact.

First let's look at a smallbin free chunk and how its contents are laid out:

(gdb) dlchunk -v 0xacbf7a68
struct malloc_chunk @ 0xacbf7a68 {
prev_foot = 0x8140d4d0
head = 0x58 (PINUSE)
fd = 0xa8400084
bk = 0xa8400084
struct mp_header @ 0xacbf7a78 {
mh_refcount = 0xf3ee0123
mh_unused = 0x0
mh_fd_link = 0x0 (unmapped)
mh_bk_link = 0x0 (unmapped)
alloc_pc = 0x0 (-)
free_pc = 0x0 (-)

Most importantly we see that the 0xf3ee0123 magic has been inserted over the mh_refcount field. None of the other values are set. Additionally, a new mh_footer value will be injected:

(gdb) x/x 0xacbf7a68+0x58-4
0xacbf7abc: 0xf3eecdef

Now let's look at a free tree chunk, where significantly more metadata is stored inline by dlmalloc. In this case only the last two fields of the mp_header structure are used, and now the alloc_pc field is used to store the free magic:

(gdb) dlchunk -v 0xa883da10
struct malloc_tree_chunk @ 0xa883da10 {
prev_foot = 0x8140d4d0
head = 0x1f8 (PINUSE)
fd = 0xa883da10
bk = 0xa883da10
left = 0x0
right = 0x0
parent = 0xa8400138
bindex = 0x1
struct mp_header @ 0xa883da30 {
alloc_pc = 0xf3ee0123 (-)
free_pc = 0x9b44a9e (-)

Another case of an alternate layout is if the containing chunk is too small to hold more than the free magic. For instance, a 0x18 chunk will look like so on 32-bit:

(gdb) dlchunk -v 0xa878d0a8
struct malloc_chunk @ 0xa878d0a8 {
prev_foot = 0x8140d4d0
head = 0x18 (PINUSE)
fd = 0xa8785e60
bk = 0xa87883e8
struct mp_header @ 0xa878d0b8 {
mh_refcount = 0xf3ee0123
mh_unused = 0xf3eecdef

Note that the above is the only case in which we’ve observed the mh_unused field being used for something explicit.

If a chunk is too small to hold any mempool magic, such as a 0x10-byte chunk, it simply won't.

The main reason this is worth knowing is because there are checks done on the system that will verify that the correct free magic is set - most notably in the Checkheaps process.

Temporary free magic

In addition to all the free magic we saw earlier, there is another set of header/footer magic values used. These are placed into the chunk right before it is passed to the core allocator. The magic values are 0x5ee33210 and 0x5ee3fedc. For instance, the mem_mh_free() mempool wrapper will place the 0x5ee33210/0x5ee3fedc magic values before passing the chunk to the dlmalloc mspace_free(). When mspace_free() returns, the wrapper will replace 0x5ee33210 with 0xf3ee0123 and 0x5ee3fedc with 0xf3eecdef. Note that the 5ee3 portion possibly stands for "seen", as the mempool wrappers have seen (5ee3) the chunks before they are free (f3ee).

In general these magic values aren’t too important in practice, but they're worth noting for one specific reason. They are observed in situations where you analyse a chunk that is about to be freed and then, when you analyse it post-free, you see 0x5ee33210 instead of the expected 0xf3ee0123. You will also see 0x5ee3fedc as a footer. This indicates that the chunk was coalesced backwards. The wrapper is unaware of the coalescing, so it won't update the magic where the new free chunk's header actually is. If you see this value, it means you're looking inside a stale part of a now larger chunk.
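When staring at raw dumps it helps to have all of the magic values described so far in one place. The following lookup table is a small sketch that collects them; the descriptions and helper names are ours.

```python
# Every mempool magic value mentioned in this article (32-bit firmware).
MEMPOOL_MAGIC = {
    0xa11c0123: "in-use header (mh_magic)",
    0xa11ccdef: "in-use footer (mh_footer)",
    0xf3ee0123: "free header magic",
    0xf3eecdef: "free footer magic",
    0x5ee33210: "temporary header magic (set just before the core free)",
    0x5ee3fedc: "temporary footer magic (set just before the core free)",
}


def describe_magic(word):
    """Return a description for a known magic value, or 'unknown'."""
    return MEMPOOL_MAGIC.get(word, "unknown")


def scan_words(words):
    """Return (index, description) for every known magic value in a dump."""
    return [(i, MEMPOOL_MAGIC[w]) for i, w in enumerate(words) if w in MEMPOOL_MAGIC]


# The four words displayed at 0xa8400be8 earlier in the article.
hits = scan_words([0x0a9a50b4, 0xa11ccdef, 0x8140d4d0, 0x337ff3e1])
print(hits)
```

Seeing one of the two temporary values inside a free chunk is the hint that you are looking at a stale header left behind by backward coalescing.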

mem_mh_* wrapper integer overflows

Both the mem_mh_malloc() and mem_mh_calloc() functions are prone to integer overflow. Both can overflow in the event that the requested size is large enough such that when the mh header size is added, it wraps. In scenarios where attackers influence an allocation request size, this facilitates subsequent memory corruption. The caveat is that this will almost certainly result in a wild copy.

Additionally, mem_mh_calloc() never calls into mspace_calloc() but rather chooses to do the size calculation itself and passes the result to mspace_malloc(). This calculation is not done securely, which means that not only is it technically vulnerable to the integer overflow when adding the mp_header overhead, but it is also prone to integer overflow when calculating the size to pass directly to mspace_malloc().

This is a common issue that we have observed numerous times in both heap wrappers and custom heaps on other systems, especially those that are embedded.
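The wrap-around itself is easy to model. The sketch below mimics 32-bit unsigned arithmetic in Python and assumes the wrapper adds 0x24 bytes of overhead (the 0x20-byte mp_header plus the 4-byte footer). The exact amount added by the real wrappers, and the signedness of the real size argument, are assumptions here; checked_request() is shown only as the kind of check one would expect, not as something present in the firmware.

```python
MASK32 = 0xffffffff
MP_OVERHEAD = 0x20 + 4  # assumption: mp_header plus mh_footer


def wrapped_request(size):
    """Model adding the mempool overhead with 32-bit unsigned wrap-around."""
    return (size + MP_OVERHEAD) & MASK32


def checked_request(size):
    """Return the adjusted size, or None if the addition would overflow."""
    if size < 0 or size > MASK32 - MP_OVERHEAD:
        return None
    return size + MP_OVERHEAD


huge = 0xfffffff0
print(hex(wrapped_request(huge)))
print(checked_request(huge))
```

A request of 0xfffffff0 bytes therefore turns into a tiny 0x14-byte request to the core allocator, while the caller still believes it owns a huge buffer.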

libmempool

As we've alluded to throughout the document, we created a gdb plugin called libmempool. It is primarily meant to augment the libdlmalloc and libptmalloc2 libraries, but also provides some stand-alone use.

There is some minor redundancy in libmempool, such as letting you analyse a chunk using only the mempool header, which is very similar to dlchunk or ptchunk. This is intentional because, for example, if in the future the heap allocator were to change, you could still use this functionality to do basic analysis of mempool headers and chunk data until a gdb plugin for the core allocator was built.

The main caveat to this tool is that many commands rely on knowing the address of the mempool portion of the mstate to begin with. This can change across builds, so even though some commands will work without it, you will get limited information. However, the tool tries to work out where the base address is if you, for instance, try to walk a list from an mp_header.

Another way to work around this is to use idahunt [4] and asadbg_rename.py [5]/asadbg_hunt.py [6] (which are part of asadbg) to automatically retrieve the mempool_array global used by libmempool to work out the mstate address and hence the mempool portion.

We now list the commands supported by libmempool.

mphelp

mphelp lists the currently supported libmempool commands. Most commands contain their own -h options if they support actual flags, which you can use to obtain extra information.

(gdb) mphelp
[libmempool] mempool commands for gdb
[libmempool] mpheader : show chunk contents (-v for verbose, -x for data dump)
[libmempool] mpbinwalk : walk an mpbin and operate on each chunk in a bin
[libmempool] mpbin : determine to which bin an mp_header is associated to
[libmempool] mpmstate : display and cache a mempool mstate address
[libmempool] mphelp : this help message

mpheader

mpheader simply displays the mp_header structure. As you can see, the name is very similar to dlchunk or ptchunk that would allow the displaying of dlmalloc or ptmalloc chunks, respectively. This eases remembering commands when working with all tools.

(gdb) mpheader 0xaa5b0f30
struct mp_header @ 0xaa5b0f30 {
mh_magic = 0xa11c0123
mh_len = 0x50027
mh_refcount = 0x0
mh_unused = 0x0
mh_fd_link = 0xaa55ee08 (OK)
mh_bk_link = 0xa8400864 (-)
alloc_pc = 0x9b91742 (-)
free_pc = 0x0 (-)

mpbin

The mpbin command allows you to find the bin associated with some in-use chunk. This can be useful for finding and caching the mp_mstate address if you don't know the mstate location yet. This will then let you populate the actual dlmstate value by masking the address to 0xNNNNNN08.

Let's imagine that we don't know where the dlmstate is yet, but we found some in-use chunk in gdb and we know the address of its mp_header is 0xacff49c0. We see it is a legit header:

(gdb) mpheader 0xacff49c0
struct mp_header @ 0xacff49c0 {
mh_magic = 0xa11c0123
mh_len = 0xe8
mh_refcount = 0x0
mh_unused = 0x0
mh_fd_link = 0xacff47c8 (OK)
mh_bk_link = 0xacb47378 (OK)
alloc_pc = 0x828fea9 (-)
free_pc = 0x12345678 (-)

Now we want to try to find the associated mstate mempool book-keeping bin:

(gdb) mpbin 0xacff49c0
[libmempool] Found bin start at 0xa84005e4
[libmempool] Cached new mp_mstate @ 0xa84001e4
[libmempool] mp_treebin[00] - sz: 0x00000180 cnt: 0x0190, mh_fd_link: 0xacb47378

This lets us know that the mempool portion of the mstate structure starts at 0xa84001e4, and that the bin itself is located at 0xa84005e4. We see that the chunk falls within the first mempool treebin, which currently tracks a list of 0x190 chunks. We mask the mstate address to view and cache the complete mstate structure:

(gdb) dlmstate 0xa8400008
struct dl_mstate @ 0xa8400008 {
smallmap = 0b000000000010000010010111111100
treemap = 0b000000000000000000000000000110
[...]
struct mp_mstate @ 0xa84001e4 {
mp_smallbin[00] - sz: 0x00000000 cnt: 0x0000, mh_fd_link: 0x0
[...]
mp_smallbin[06] - sz: 0x00000030 cnt: 0x0213, mh_fd_link: 0xacb56250
mp_smallbin[07] - sz: 0x00000038 cnt: 0x0cb4, mh_fd_link: 0xa94b1250
mp_smallbin[08] - sz: 0x00000040 cnt: 0x1c95, mh_fd_link: 0xac4e4b50
[...]
mp_treebin[30] - sz: 0x00c00000 cnt: 0x0001, mh_fd_link: 0xaae41228
mp_treebin[31] - sz: 0xffffffff cnt: 0x0001, mh_fd_link: 0xab641758 [UNSORTED]

This then lets us see all of the mempool bins, any of which can be used as a starting point for walking a list.
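
The addresses in this example are consistent with a simple layout, which the following Python sketch checks. The constants are inferred from the output above rather than taken from libmempool: it assumes that each bin entry occupies 0x20 bytes, that the 32 small bins precede the tree bins, and that the dl_mstate sits 8 bytes into a region aligned to at least 0x1000 bytes. These values may well differ across builds and on 64-bit, and the function names are hypothetical.

```python
BIN_ENTRY_SIZE = 0x20   # assumption: one mp_header-sized entry per bin (32-bit)
NUM_SMALLBINS = 32      # mp_smallbin[00] .. mp_smallbin[31] in the output above
REGION_ALIGN = 0x1000   # assumption: alignment of the region holding the mstate
DLMSTATE_OFFSET = 0x8   # the trailing "08" in 0xNNNNNN08


def treebin_head(mp_mstate, index):
    """Address of mp_treebin[index], given the start of the mempool portion."""
    return mp_mstate + (NUM_SMALLBINS + index) * BIN_ENTRY_SIZE


def mp_mstate_from_treebin(bin_addr, index):
    """Inverse of treebin_head: recover the mp_mstate address from a bin."""
    return bin_addr - (NUM_SMALLBINS + index) * BIN_ENTRY_SIZE


def dlmstate_from(addr):
    """Mask an address inside the mstate down to the dl_mstate address."""
    return (addr & ~(REGION_ALIGN - 1)) | DLMSTATE_OFFSET


mp_mstate = mp_mstate_from_treebin(0xA84005E4, 0)  # bin start reported by mpbin
print("mp_mstate   @ 0x%x" % mp_mstate)
print("dl_mstate   @ 0x%x" % dlmstate_from(mp_mstate))
print("treebin[20] @ 0x%x" % treebin_head(mp_mstate, 20))
```

Under these assumptions, the bin start reported by mpbin maps back to the cached mp_mstate address, and masking that address yields the value passed to dlmstate.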

mpbinwalk

The mpbinwalk command allows you to traverse the linked list of in-use chunks either starting from a bin head or from a specific chunk in a list.

Let's look at the mempool treebin tracking:

(gdb) mpbinwalk 0x00060000
[libmempool] mp_header @ 0xa8400864 - mh_len: 0x00000000, alloc_pc: 0x00000000 [BIN HEAD]
[libmempool] mp_header @ 0xaa5b0f30 - mh_len: 0x00050027, alloc_pc: 0x09b91742
[libmempool] mp_header @ 0xaa55ee08 - mh_len: 0x00050027, alloc_pc: 0x09b91742
[libmempool] mp_header @ 0xaa09b490 - mh_len: 0x00050027, alloc_pc: 0x09b91742
[libmempool] mp_header @ 0xa890dfd8 - mh_len: 0x000480b3, alloc_pc: 0x09b91a23
[libmempool] mp_header @ 0xa87d5068 - mh_len: 0x00050027, alloc_pc: 0x09b91742
[libmempool] mp_header @ 0xa8720950 - mh_len: 0x000477a3, alloc_pc: 0x09b91a23

The list contains six entries, and the bin head is shown as well. We can use the -P option to start the walk from a different location:

(gdb) mpbinwalk -P 0xaa09b490
[libmempool] mp_header @ 0xaa09b490 - mh_len: 0x00050027, alloc_pc: 0x09b91742 [BIN HEAD]
[libmempool] mp_header @ 0xa890dfd8 - mh_len: 0x000480b3, alloc_pc: 0x09b91a23
[libmempool] mp_header @ 0xa87d5068 - mh_len: 0x00050027, alloc_pc: 0x09b91742
[libmempool] mp_header @ 0xa8720950 - mh_len: 0x000477a3, alloc_pc: 0x09b91a23

We can also show verbose mp_header information for each chunk with -v:

(gdb) mpbinwalk -P 0xa87d5068 -v
struct mp_header @ 0xa87d5068 {
mh_magic = 0xa11c0123
mh_len = 0x50027
mh_refcount = 0x0
mh_unused = 0x0
mh_fd_link = 0xa8720950 (OK)
mh_bk_link = 0xa890dfd8 (OK)
alloc_pc = 0x9b91742 (-)
free_pc = 0x0 (-)
struct mp_header @ 0xa8720950 {
mh_magic = 0xa11c0123
mh_len = 0x477a3
mh_refcount = 0x0
mh_unused = 0x0
mh_fd_link = 0x0 (unmapped)
mh_bk_link = 0xa87d5068 (OK)
alloc_pc = 0x9b91a23 (-)
free_pc = 0x8063047 (-)

mpbinwalk also allows you to search the chunks in a list for specific values. This is a much quicker way to find chunks of interest than searching linearly across the whole heap. As a basic example, let's say we want to search the six chunks above to see which of them contain the value 0x09b91742 in the first 256 bytes of the chunk. In this case the value corresponds to alloc_pc, so you can see it in the matching entries.

(gdb) mpbinwalk 0x00060000 -s 0x09b91742 --depth 256 -l
[libmempool] mp_header @ 0xa8400864 - mh_len: 0x00000000, alloc_pc: 0x00000000 [BIN HEAD]
[libmempool] mp_header @ 0xaa5b0f30 - mh_len: 0x00050027, alloc_pc: 0x09b91742 [MATCH]
[libmempool] mp_header @ 0xaa55ee08 - mh_len: 0x00050027, alloc_pc: 0x09b91742 [MATCH]
[libmempool] mp_header @ 0xaa09b490 - mh_len: 0x00050027, alloc_pc: 0x09b91742 [MATCH]
[libmempool] mp_header @ 0xa890dfd8 - mh_len: 0x000480b3, alloc_pc: 0x09b91a23 [NO MATCH]
[libmempool] mp_header @ 0xa87d5068 - mh_len: 0x00050027, alloc_pc: 0x09b91742 [MATCH]
[libmempool] mp_header @ 0xa8720950 - mh_len: 0x000477a3, alloc_pc: 0x09b91a23 [NO MATCH]
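
The walk-and-search behaviour can be pictured with the following Python sketch. It is a simplified model, not libmempool's implementation: memory is simulated with a bytearray, the header is assumed to be eight little-endian 32-bit fields with mh_fd_link at offset 0x10, the search is assumed to start at the header itself, and a null mh_fd_link is assumed to end the list. The names walk_bin, put_header and read are hypothetical.

```python
import struct

BASE = 0x1000     # address of the first byte of the simulated memory
HDR_SIZE = 0x20   # assumption: 32-byte mp_header on 32-bit
FD_OFFSET = 0x10  # assumption: mh_fd_link is the fifth 32-bit field


def read(mem, addr, size):
    """Read `size` bytes at simulated address `addr`."""
    off = addr - BASE
    return bytes(mem[off:off + size])


def walk_bin(mem, start, needle=None, depth=HDR_SIZE):
    """Follow mh_fd_link from `start`, yielding (address, matched) per chunk."""
    addr, seen = start, set()
    while addr != 0 and addr not in seen:
        seen.add(addr)  # guard against corrupted, circular lists
        matched = needle is not None and needle in read(mem, addr, depth)
        yield addr, matched
        (addr,) = struct.unpack("<I", read(mem, addr + FD_OFFSET, 4))


def put_header(mem, addr, mh_len, fd, alloc_pc):
    """Write a fake mp_header into the simulated memory."""
    fields = (0xA11C0123, mh_len, 0, 0, fd, 0, alloc_pc, 0)
    struct.pack_into("<8I", mem, addr - BASE, *fields)


# Three fake chunks linked through mh_fd_link.
mem = bytearray(0x300)
put_header(mem, 0x1000, 0x50027, 0x1100, 0x09B91742)
put_header(mem, 0x1100, 0x480B3, 0x1200, 0x09B91A23)
put_header(mem, 0x1200, 0x50027, 0, 0x09B91742)

needle = struct.pack("<I", 0x09B91742)
results = list(walk_bin(mem, 0x1000, needle, depth=256))
for addr, matched in results:
    print("mp_header @ 0x%x [%s]" % (addr, "MATCH" if matched else "NO MATCH"))
```

Because only the chunks on one list are visited, the cost of the search depends on the length of that list and the chosen depth rather than on the size of the heap.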

mpmstate

If you happen to know the address of the mempool portion of the mstate and want to look at it in isolation (and cache it) you can use the mpmstate command:

(gdb) mpmstate 0xa84001e4
struct mp_mstate @ 0xa84001e4 {
mp_smallbin[00] - sz: 0x00000000 cnt: 0x0000, mh_fd_link: 0x0
[...]
mp_smallbin[29] - sz: 0x000000e8 cnt: 0x0013, mh_fd_link: 0xac783e88
mp_smallbin[30] - sz: 0x000000f0 cnt: 0x0007, mh_fd_link: 0xa8a1f790
mp_smallbin[31] - sz: 0x000000f8 cnt: 0x0045, mh_fd_link: 0xacff5b00
mp_treebin[00] - sz: 0x00000180 cnt: 0x0190, mh_fd_link: 0xacb47378
mp_treebin[01] - sz: 0x00000200 cnt: 0x0134, mh_fd_link: 0xa95059d8
mp_treebin[02] - sz: 0x00000300 cnt: 0x01ac, mh_fd_link: 0xad01cb28
mp_treebin[03] - sz: 0x00000400 cnt: 0x004e, mh_fd_link: 0xacffb8b8
[...]
mp_treebin[31] - sz: 0xffffffff cnt: 0x0001, mh_fd_link: 0xab641758 [UNSORTED]
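
The sz column in this output lines up with the bin boundaries used by dlmalloc-2.8.x, which the following Python sketch reproduces. It assumes that the mempool bins mirror dlmalloc's 32-bit bin indexing (a small bin shift of 3 and a tree bin shift of 8) and that the sz shown for a treebin is the lower bound of the next bin. Both points are inferences from the values above, not statements about libmempool's code, and the function names are illustrative.

```python
SMALLBIN_SHIFT = 3  # dlmalloc small bins are spaced 8 bytes apart
TREEBIN_SHIFT = 8   # dlmalloc tree bins start at 0x100 bytes
NTREEBINS = 32


def smallbin_size(index):
    """Chunk size tracked by small bin `index`."""
    return index << SMALLBIN_SHIFT


def treebin_min_size(index):
    """Smallest size held by tree bin `index` (dlmalloc's minsize_for_tree_index)."""
    half = index >> 1
    return (1 << (half + TREEBIN_SHIFT)) | ((index & 1) << (half + TREEBIN_SHIFT - 1))


def treebin_displayed_size(index):
    """Assumed meaning of the sz column: the lower bound of the next tree bin."""
    if index == NTREEBINS - 1:
        return 0xFFFFFFFF
    return treebin_min_size(index + 1)


def treebin_index(size):
    """Tree bin index for a size (dlmalloc's compute_tree_index)."""
    x = size >> TREEBIN_SHIFT
    if x == 0:
        return 0
    if x > 0xFFFF:
        return NTREEBINS - 1
    k = x.bit_length() - 1
    return (k << 1) + ((size >> (k + TREEBIN_SHIFT - 1)) & 1)


for i in (0, 1, 2, 3, 30, 31):
    print("mp_treebin[%02d] - sz: 0x%08x" % (i, treebin_displayed_size(i)))
```

Under the same assumptions, any size from 0x40000 up to, but not including, 0x60000 maps to tree bin 20.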

mpcallback

Those who read the libdlmalloc and libptmalloc2 posts will know that we implemented a callback mechanism in both tools. libmempool is currently our only implementation of such a callback. It exposes a function that reads the information provided in the callback dictionary and annotates both the mempool portion of the Cisco ASA mstate customisations and the mp_header structure injected into chunks.

Other mempool structures

As shown earlier when describing how mempools are tracked, the libmempool library also contains a few structures that are not tied to any command but still let you inspect mempool-related data. These currently include the mempool and mempool_list structures.

Hardening mempool structure handling

Cisco could increase the difficulty of exploiting heap-based memory corruption by ensuring that the linkage of mempool structures is validated prior to unlinking. Additionally, the magic values used for structures could be randomly generated at runtime, so that an attacker could not predict them without also having a memory disclosure bug.
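
As an illustration of the first suggestion, the following Python sketch models a doubly linked list whose unlink operation verifies both neighbours before writing to them, in the spirit of the safe-unlinking checks found in other allocators. This is a hypothetical model of the idea, not Cisco's code: the Node class, the exception and the function names are all made up for the example.

```python
class CorruptedLinkError(Exception):
    """Raised when a node's neighbours do not point back at it."""


class Node:
    def __init__(self, name):
        self.name = name
        self.fd = None  # forward link, playing the role of mh_fd_link
        self.bk = None  # backward link, playing the role of mh_bk_link


def link_after(prev, node):
    """Insert `node` directly after `prev`."""
    node.bk = prev
    node.fd = prev.fd
    if prev.fd is not None:
        prev.fd.bk = node
    prev.fd = node


def safe_unlink(node):
    """Unlink `node` only if both neighbours still point back at it."""
    fd, bk = node.fd, node.bk
    if fd is not None and fd.bk is not node:
        raise CorruptedLinkError("fd->bk does not point back at %s" % node.name)
    if bk is not None and bk.fd is not node:
        raise CorruptedLinkError("bk->fd does not point back at %s" % node.name)
    if bk is not None:
        bk.fd = fd
    if fd is not None:
        fd.bk = bk
    node.fd = node.bk = None


head, a, b, c = Node("head"), Node("a"), Node("b"), Node("c")
link_after(head, a)
link_after(a, b)
link_after(b, c)

safe_unlink(b)  # consistent links: the list becomes head <-> a <-> c

a.fd = Node("attacker-controlled")  # simulate an overflow corrupting a's fd link
try:
    safe_unlink(a)
    detected = False
except CorruptedLinkError as exc:
    detected = True
    print("unlink refused:", exc)
```

The point of the check is that a corrupted link is detected before any write through it takes place, which removes the classic write primitive obtained from unlinking a corrupted chunk.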

Conclusion

We’ve described some background on mempool structures on Cisco ASA and highlighted some weaknesses in the way the mempool wrappers and data structures are implemented that aid exploitation. We’ve also presented a gdb plugin, libmempool, which aids with the analysis of mempool structures. It can be coupled with other gdb plugins like libdlmalloc or libptmalloc2, or can be used stand-alone if necessary.


We would appreciate any feedback or corrections. Feel free to test out the libmempool [1] code and send pull requests for any issues you find. The tool is not perfect, so don't be surprised if you run into bugs.

If you would like to contact us we can be reached by email or twitter: aaron(dot)adams(at)nccgroup(dot)trust/@fidgetingbits and cedric(dot)halbronn(at)nccgroup(dot)trust/@saidelike.


Read all posts in the Cisco ASA series

References

[1] https://github.com/nccgroup/libmempool

[2] https://www.activexperts.com/admin/mib/Cisco/CISCO-ENHANCED-MEMPOOL-MIB/

[3] https://cansecwest.com/slides/2016/CSW2016_Wheeler-Barksdale-Gruskovnjak_ExecuteMyPacket.pdf

[4] https://github.com/nccgroup/idahunt

[5] https://github.com/nccgroup/asadbg/blob/master/asadbg_rename.py

[6] https://github.com/nccgroup/asadbg/blob/master/asadbg_hunt.py

[7] https://www.exploit-db.com/exploits/77/

[8] https://www.defcon.org/images/defcon-11/dc-11-presentations/dc-11-FX/dc-11-FX.PDF

[9] https://github.com/bootleg/ret-sync

Published date:  23 October 2017

Written by:  Aaron Adams and Cedric Halbronn
