Meditation, The Art of Exploitation

Thinking? At last I have discovered it--thought; this alone is inseparable from me. I am, I exist--that is certain. But for how long? For as long as I am thinking. For it could be, that were I totally to cease from thinking, I should totally cease to exist....I am, then, in the strict sense only a thing that thinks.

Tuesday, May 16, 2006

WIN32 SEH and Memory Management considered harmful (Part 1)

Originally composed on May 27, 2004; edited for formatting.

It so happens that I need to write a small utility to do some win32 EXE file patching; in this case, actually two programs: one to generate a patch between two slightly different binary files and one to apply the patch to a fresh binary file. The first program, MakePatch, is relatively easy to cook up. The second program, Patcher, requires some special coding technique to be perfect (and, as we'll see very soon, the win32 system makes that rather impossible).


The Patcher program first opens the patch code text file, which looks like this:

offset origin_byte new_byte
...

Each line of the patch code consists of an offset in the binary file to patch, the original byte at that offset, and the new byte to write in its place. Since we do not know in advance how many patch lines the patch code file contains, the program should dynamically adjust the size of the memory that holds these patch codes as they are read in.
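For comparison, here is a minimal sketch of the plain heap-based baseline: parse one patch line and append it to an array that doubles when full. This uses realloc, not the VirtualAlloc/SEH design explored in the rest of this post. The names PatchEntry, PatchList, parse_patch_line and patch_list_push are hypothetical, and I am assuming all three fields are written in hexadecimal, which the format above does not actually say.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* One line of the patch file: offset, original byte, new byte. */
typedef struct {
    unsigned long offset;
    unsigned char origin_byte;
    unsigned char new_byte;
} PatchEntry;

/* Growable array of entries; the capacity doubles when it fills up. */
typedef struct {
    PatchEntry *items;
    size_t count;
    size_t capacity;
} PatchList;

/* Parse one text line. ASSUMPTION: all three fields are hexadecimal.
   Returns 1 on success, 0 if the line is malformed. */
int parse_patch_line(const char *line, PatchEntry *out) {
    unsigned long offset;
    unsigned int origin, replacement;
    if (sscanf(line, "%lx %x %x", &offset, &origin, &replacement) != 3)
        return 0;
    if (origin > 0xFF || replacement > 0xFF)
        return 0;                       /* not a single byte */
    out->offset = offset;
    out->origin_byte = (unsigned char)origin;
    out->new_byte = (unsigned char)replacement;
    return 1;
}

/* Append an entry, doubling the buffer when it is full.
   Returns 1 on success, 0 if memory could not be obtained. */
int patch_list_push(PatchList *list, PatchEntry entry) {
    if (list->count == list->capacity) {
        size_t new_capacity = list->capacity ? list->capacity * 2 : 4;
        PatchEntry *grown = realloc(list->items, new_capacity * sizeof *grown);
        if (grown == NULL)
            return 0;                   /* old buffer is still valid */
        list->items = grown;            /* realloc may have moved the data */
        list->capacity = new_capacity;
    }
    list->items[list->count++] = entry;
    return 1;
}
```

Start from a zero-initialized PatchList, call parse_patch_line on each line read from the file, and push every entry that parses. Note that realloc is free to move the buffer, which is harmless here because every access goes back through list->items.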


At first glance, what could suit this task better than win32's shiny virtual memory management and structured exception handling (SEH)? The design is simple:


except_handler(...){

new_mem=VirtualAlloc(2*cur_size);
copy_mem(cur_mem, new_mem, cur_size);

VirtualFree(cur_mem);
cur_mem = new_mem;

return EXCEPTION_CONTINUE_EXECUTION;

if anything is wrong, return EXCEPTION_CONTINUE_SEARCH;
}

patcher(...){

cur_mem = VirtualAlloc(initial_size);

__try{
read_data(FILE, cur_mem[counter]->data);
counter++;
}
__except(except_handler(...)){
do_something;
}
}


Naturally this code doesn't work. What a surprise. Well then, let's try to debug the code and see what's wrong. Again you hit your head against the wall: MS Studio 6.0 just hangs after the memory violation and never jumps to the exception handler subroutine. If you cannot debug at the assembler level, you now have two choices: 1) try to figure out what's wrong by playing with the code; 2) give up! If your debugger of choice cannot correctly follow the logic of execution (and with exception handling, things only truly reveal themselves at the assembler level), there is not much chance you can make it work.


So off we go to debug this code in SIce; VirtualAlloc seems a natural place to set a breakpoint. After much tracing, the reason for the failure becomes clear: a single line of C code is often compiled into 4-5 machine instructions, and an exception happens at, and resumes from, a single machine instruction, NOT a line of C code! To make the code easier to debug, I rewrote the patcher subroutine like this:


patcher(...){

cur_mem = VirtualAlloc(initial_size);

__try{
read_data(FILE, tmp_data);
cur_mem[counter]->data = tmp_data;
counter++;
}
__except(except_handler(...)){
do_something;
}
}

Since tmp_data is an automatic (on-stack) variable and is guaranteed to be accessible, the exception no longer occurs inside the convoluted read_data subroutine, be it win32 or libc. It now occurs at the line:


cur_mem[counter]->data = tmp_data;

which is translated to machine code like this:

mov eax, [ebp-20] ; $tmp_data
mov ecx, [0040xxxx] ; $cur_mem
mov edx, [ebp-24] ; counter
mov [ecx+edx], eax ; $cur_mem[counter]->data = $tmp_data

So in this hypothetical case, the single line of C code is translated into 4 machine instructions. And really the exception (memory access violation) occurs at the last instruction,


mov [ecx+edx], eax

Now imagine for a second, why is this a problem?

The problem is that when the exception handler allocates the new space, new_mem points to a different memory location from the one cur_mem was referring to. In other words, even though we allocated new memory and instructed the processor to retry the instruction that generated the exception, we are still doomed to fail, because the instruction runs with the register values it already had. The registers do not update their contents to reflect the fact that we have moved to a completely new memory location. To be more specific, consider the following scenario:


mov eax, [ebp-20] ; $tmp_data = 0x00000005
mov ecx, [0040xxxx] ; $cur_mem = 0x03000000
mov edx, [ebp-24] ; counter = 0x00001000 (4k page boundary)
mov [ecx+edx], eax ; $cur_mem[counter]->data = $tmp_data

ecx+edx = 0x03001000. Since these general-purpose registers have their values restored when the fault handler returns (the handler is reached through a gate in the IDT), mov [ecx+edx], eax attempts the same memory access and simply generates the same fault again. What we really wanted is for the C statement cur_mem[counter]->data = tmp_data to be evaluated afresh, so that cur_mem has its new value and we access our shiny new memory space. But because a single line of C code is not an atomic unit of execution (it is compiled into multiple machine instructions), our program cannot run properly, and from the C code we have no practical way to control the values in ecx and edx upon returning from the exception handler.
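The scenario above can be modelled in a few lines of portable C. This is only a toy model under my own assumptions: the Regs struct and both function names are hypothetical, real register state lives in the CPU rather than in a struct, and the new base 0x05000000 used in the example call is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the two registers used by `mov [ecx+edx], eax`. */
typedef struct {
    uint32_t ecx;   /* base address, loaded from cur_mem */
    uint32_t edx;   /* byte offset, loaded from counter  */
} Regs;

/* Address touched when the faulting instruction is retried.
   The registers were loaded BEFORE the fault, so the handler's
   update of cur_mem never reaches them. */
uint32_t retried_store_address(uint32_t old_base, uint32_t counter,
                               uint32_t new_base) {
    uint32_t cur_mem = old_base;
    Regs r = { cur_mem, counter };  /* the loads before the store */
    cur_mem = new_base;             /* what the exception handler does */
    (void)cur_mem;                  /* the variable changed, r did not */
    return r.ecx + r.edx;           /* retry of mov [ecx+edx], eax */
}

/* Address the store would touch if the whole C statement were
   evaluated again, reloading cur_mem into ecx first. */
uint32_t reloaded_store_address(uint32_t counter, uint32_t new_base) {
    return new_base + counter;
}
```

With the values from the listing, retried_store_address(0x03000000, 0x1000, 0x05000000) still yields 0x03001000, the address that faulted, while reloaded_store_address(0x1000, 0x05000000) yields 0x05001000 inside the new block.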


How do we remedy this problem? We must guarantee that upon returning from the exception handler, ecx+edx points to valid read/write memory. It would be handy if there were a VirtualReAlloc, but no such function is documented in MSDN. Another approach would be to allocate some memory somewhere else to save the current contents, then free cur_mem and call VirtualAlloc with the same fixed base address and double the cur_mem size.


|-----------| <- cur_mem
|           |
|           | <- cur_mem->size
|-----------|
|           |
|           |
|-----------| <- tmp_mem
|           |
|           |
|-----------|

In order to be able to allocate 2*$cur_mem->size, you have to make sure the temporary memory space starts at or above $cur_mem+2*$cur_mem->size, as shown in the graph. Each box indicates cur_mem->size bytes of memory. If you fail to satisfy this condition, you will not be able to VirtualAlloc at the fixed $cur_mem address with double its current size. This approach is definitely doable, but the master Jedi programmer will no doubt frown on its design and efficiency. The exception handler becomes simply too expensive, which goes against the principle that a handler should be efficient and decisive.


What about this? We simply VirtualFree(cur_mem) and then VirtualAlloc($cur_mem, 2*cur_size). This avoids the overhead of allocating a temporary memory space and transferring data, yet guarantees allocation at the fixed memory address. The solution may sound good on paper; however, it's almost guaranteed to fail because of the win32 memory management mechanism. First of all, memory newly allocated after VirtualFree and VirtualAlloc is filled with 0; and even if we could stop the pages from being zeroed, we could not guarantee that we would be dealing with the same physical pages again after task switches and kernel call gate control transfers. We may not get the same physical page, and even if we are lucky enough to regain it, its contents may have been overwritten by another task or the kernel. So this approach will simply not work.


The next idea is to allocate a new memory block that sits directly after the current memory block, as shown in the following graph. The idea is to allocate the new block at the fixed address where the last fault occurred. This seems rather plausible; the only drawback would be the hassle of cleaning up, since we have to VirtualFree every memory block that we allocate inside the exception handler. Still, it sounds like a reasonable solution until we try to implement it: VirtualAlloc would not return a useful memory address when given the fixed bad address where the exception occurred.



|-----------| <- cur_mem
|           |
|           | <- cur_mem->size
|-----------| <- concat_mem
|           |
|           |
|-----------|

OK, finally something that really works. The idea is to reserve a large chunk of memory up front and commit it on demand. But this is really a static memory approach and does not truly meet our requirement. Now imagine for a second: how would you solve this problem?