Meditation, The Art of Exploitation

Thinking? At last I have discovered it--thought; this alone is inseparable from me. I am, I exist--that is certain. But for how long? For as long as I am thinking. For it could be, that were I totally to cease from thinking, I should totally cease to exist....I am, then, in the strict sense only a thing that thinks.

Wednesday, May 14, 2008

C++: static initialization order fiasco and mixed language programming

As c++ faq-lite puts it, 'static initialization order fiasco' is "a subtle way to crash your program" (1). The C++ run time system exhibits a phenomenon, static objects in the global scope can initialize in arbitrary order between different builds. In other words, if there are two static objects in the global scope called A and B in a program, in one build with a compiler on a platform, A could initialize before B but on another platform with another compiler in a different build, B could initialize before A. This could happen even on a single platform with a single compiler but different build options (e.g. optimization levels)

The similarity of compiler dependency is striking between this problem and the RVO problem in the previous entry. And *similarly* any failure is a indication of coding defect by the programmer. A hypothetical scenario with the 'static initialization order fiasco' is such:




A.hpp
struct A{
static bool initialized;
A() : initialized(true) {}
};
A.cpp
bool A::initialized;

B.hpp
struct B{
B(){
if(A::initialized) x = 1;
else x = 0;
}
static int x;
};
int B:x;

C.cpp
void foo(){
int fiasco = B:x;
}




What should be the value of fiasco inside foo when the program runs? It could be either 0 or 1. Thus the program exhibits unreliable behavior.

Ok, now we know what's 'static initialization order fiasco', what does it have to do with mixed language programming? In mixed language programming, specifically between C++ and something else, let's say G, if G's run time system is dominate and starting user code is written in G (compiled and linked with G compiler/linker), G should refrain from calling any C++ code that may reference static objects (e.g. void foo()). Take a look at this bug.

Process received signal 11 (SIGSEGV)
__CPR123____ls__tm__30_Q2_3std20char_traits__tm__2_c__3stdFRQ2_3std25basic_ostre/opt/ctl/CC/5.5.0.9/include/ostream+??? (???) at ostream
c_strings_+0x00F0 (0x1004AF0) at A.C:48
stringtest_+0x0874 (0x1022374) at B.F90:77

Here is A.C:48
cout << "\n\n-- entering c_strings" << endl;

http://www.parashift.com/c++-faq-lite/ctors.html#faq-10.12

Tuesday, May 13, 2008

C++: return value optimization (RVO) and reference class member

Recently I was experimenting with expression templates (1,2), I was making good progress with my toy code until suddenly I started getting correct answer from optimized build (-O2) and segmentation faults from debug build (-O0). With judicious object creation/destruction tracer code, it's clear to me return value optimization (3,4) or RVO is playing a trick here. This is troublesome as the program behavior depends on unreliable optimization from a compiler. When RVO optimization is not available, the program produces wrong result. This syndrome typically means there is a problem in the source code that binds variables (reference or pointer type) to temporary stack object and later references it while the temporary stack object has gone out of scope and become invalid.

The source code will make the problem clearer and easier to explain:


#include < iostream>
using namespace std;

template < typename E>
struct expr {
double eval() const {
return e.eval();
};
double eval() {
return e.eval();
};

expr(const E & e) : e(e) {}
E e;
};

class Literal {
public:
Literal(double v) : val_(v) {}
double eval() const { return val_; }
private:
const double val_;
};

class Variable {
public:
Variable(double& v) : val_(v) { cout << &val_ << " default initialized " << id++ << endl;}
Variable(const Variable & v) : val_(*&(v.val_)) { cout << &val_ << " copy initialized " << this << ' ' << *(double *)this << ' ' << id++ << endl; }
~Variable() { cout << &val_ << " destroyed" << endl; }
double eval() const { cout << &val_ << " eval " << this << ' ' << &(this->val_) << endl; return val_; }
private:
double& val_;
static int id;
};

int Variable::id = 0;

// Abstraction of a binary expression
template < typename expr1, typename expr2, typename binop>
struct BinaryExpr {
double eval() const {
return binop::eval(_expr1.eval(), _expr2.eval());
}
BinaryExpr(const expr1 & e1, const expr2 & e2)
: _expr1(e1),_expr2(e2) {}
BinaryExpr(const BinaryExpr & be) : _expr1(be._expr1), _expr2(be._expr2) { cout << "BE copy initiailized" << endl; }
private:
const expr1 & _expr1;
// Cannot use reference here (const expr1 &) because _expr1 may refer to a temporary stack object and becomes invalid when the bound object went ou
t of scope later
const expr2 & _expr2;
};

// Abstraction of semantic plus operation '+'
struct plusOp {
static double eval (const double & d1, const double & d2) {
return d1 + d2;
}
};
// Expression Traits class, convert primitive type to Literal type
template < typename T>
struct ExprTraits
{
typedef T type;
};

template <>
struct ExprTraits< int>
{
typedef Literal type;
};

template <>
struct ExprTraits< double>
{
typedef Literal type;
};

// This is the critical piece in building expression template
// An overloaded operator builds an expression that can be evaluated later.
template < typename expr1, typename expr2>
expr< BinaryExpr< typename ExprTraits< expr1>::type, typename ExprTraits< expr2>::type, plusOp> >
operator + (const expr1 & e1, const expr2 & e2){
typedef BinaryExpr< typename ExprTraits< expr1>::type, typename ExprTraits< expr2>::type, plusOp> ExprT;
return expr< ExprT>(ExprT(typename ExprTraits< expr1>::type(e1), typename ExprTraits< expr2>::type(e2)));
}

template < typename E>
double eval(expr< E> e){
return e.eval();
}

int main(){

double x = 10;
Variable v(x);
Literal l(3);
cout << &x << ' ' << x << endl;

// evaluate expressions
cout << sizeof(v) << ' ' << eval(v + 10.0) << endl;
}



The problem as indicated in the source code comment, is with the member variables _expr1, _expr2 of BinaryExpr. BinaryExpr can be constructed from 2 participating expressions (e1, e2). When _expr1 is declared as const expr1 &, it binds to the first argument e1. This is the problem, when RVO is not available, it's conceivable (as shown in this example) that BinaryExpr's constructor can be invoked with temporary stack objects as arguments. When it happens, _expr1 is bound to a stack local object and later on causes erratic program behavior when referenced as the stack local object went out of scope. To fix this, a deep copy (more precisely until nothing is bound to temporary objects that can go out of scope independently) is required.

Careful readers may raise the question why in Variable, there is a 'double & val_'. Will this be a problem? Yes and No. Yes, in general, it holds true that a reference or a pointer type variable (regardless if it's a class member variable or not) when bound to a temporary stack local variable should never *dereference* that object later when the object goes out of scope. No, in this toy program, all 'double & val_' in all Variable instances are carefully referenced to '10.0' in the expression in main function and '10.0' only goes out of scope when eval() finishes and therefore the code is safe from the dreaded 'dereferencing dead (out of scope) object' problem.

References:
1. http://www.ddj.com/cpp/184401627
2. http://ubiety.uwaterloo.ca/~tveldhui/papers/Expression-Templates/exprtmpl.html
3. http://www.cs.cmu.edu/~gilpin/c++/performance.html
4. http://msdn.microsoft.com/en-us/library/ms364057(VS.80).aspx#nrvo_cpp05_topic4