Meditation, The Art of Exploitation

Thinking? At last I have discovered it--thought; this alone is inseparable from me. I am, I exist--that is certain. But for how long? For as long as I am thinking. For it could be, that were I totally to cease from thinking, I should totally cease to exist....I am, then, in the strict sense only a thing that thinks.

Tuesday, November 20, 2007

Bash resources

To work productively on a linux platform, it's essential to learn a few bash tricks to improve productivity and efficiency.

Rules of bash variable expansion:

• Parameter expansion means that we could use other shell variables in this
expression, as in: ${BASE:=${HOME}}.
• Tilde expansion means that we can use expressions like ~bob and it will expand
that to refer to the home directory of the username bob. Use ${BASE:=~uid17} to
set the default value to the home directory for user uid17, but don’t put quotes
around this string, as that will defeat the tilde expansion.
• Command substitution is what we used in the example; it will run the commands
and take their output as the value for the variable. Commands are
enclosed in the single parentheses syntax, $( cmds ).
• Arithmetic expansion means that we can do integer arithmetic, using the $(( ... ))
syntax in this expression. Here’s an example:
echo ${BASE:=/home/uid$((ID+1))}

References:

1. My Favorite bash Tips and Tricks
http://www.linuxjournal.com/article/7385

2. Power shell usage: bash tips and tricks
http://www.ukuug.org/events/linux2003/papers/bash_tips/index.html

3. Discover best bash and bash sites
http://www.blinklist.com/tag/bash/

4. http://geeki.wordpress.com/category/bash/

5. http://devmanual.gentoo.org/tools-reference/bash/index.html

C++: function visibility and accessibility: lookup and overloading

C++ has some esoteric feature concerning private class member names visibility and accessibility. By definition, a private class member name is only accessible by the class members and friends. However the rule of visibility can confuse many programmers. Herb Sutter has summarized these rules in a great article '(Mostly) Private' published on DDJ.

1. A private member's name is only accessible to other members and friends.
2. A private member is visible to all code that sees the class's definition. This means that its parameter types must be declared even if they can never be needed in this translation unit...
3. Overload resolution happens before accessibility checking.

Take the following example:



// http://www.ddj.com/cpp/184403867
// (Mostly) Private
//
// Example 3: A partly fixed version of Example 1
//
#include < complex>

class Calc {
public:
double Twice( double d );
private:
int Twice( int i );
std::complex Twice( std::complex c );
};

int main() {
Calc c;
return c.Twice( 21 ); // error, Twice is inaccessible
}



When the compiler has to resolve the call to a function S, it does three main things, in order:
a. Before doing anything else, the compiler searches for a scope that has at least one entity named Twice and makes a list of candidates. In this case, name lookup first looks in the scope of Calc to see if there is at least one function named Twice; if there isn't, base classes and enclosing namespaces will be considered in turn, one at a time, until a scope having at least one candidate is found. In this case, though, the very first scope the compiler looks in already has an entity named Twice — in fact, it has three of them, and so that trio becomes the set of candidates. (For more information about name lookup in C++, with discussion about how it affects the way you should package your classes and their interfaces, see also Items 31-34 in Exceptional C++. [4])
b. Next, the compiler performs overload resolution to pick the unique best match out of the list of candidates. In this case, the argument is 21, which is an int, and the available overloads take a double, an int, and a complex. Clearly the int parameter is the best match for the int argument (it's an exact match and no conversions are required), and so Twice(int) is selected.
c. Finally, the compiler performs accessibility checking to determine whether the selected function can be called. In this case... boom thud splatter.

4. A private member is visible to all code that sees the class's definition. This means that ... it participates in name lookup and overload resolution and so can make calls invalid or ambiguous even though it itself could never be called.

5. Code that has access to a member can grant that access to any other code, by leaking a (name-free) pointer to that member.

References:

1. (Mostly) Private, Herb Sutter, C/C++ Users Journal, Jul 01 2003

Monday, November 19, 2007

C++: rules of destructor declaration

A C++ class destructor is responsible to release the resource this class allocates during its lifetime. For a class that does not allocate any resource explicitly, it's often neglected by the class implementor and compilers merrily generate an implicit public destructor that is often suffice.

Things become a little more complicated when a class does allocate resource and needs to release it during destruction. The first rule that concerns C++ destructor is called 'rule of three', which states if one must define a destructor, one must also define copy constructor and copy assignment operator. Vice versa, if one of those three functions needs to be defined, then all three need to be defined. This rule ensures proper resource management to avoid resource leak or dangling resource.

The 'rule of three' is best applied to a concrete class that sports no inheritance or polymorphic behavior. Otherwise the rule becomes more complicated. One popular rule concerning base class destructor is that 'base class destructor must be virtual'. In fact compilers such as GNU C++ compiler issue diagnostics when it detects a base class destructor is not defined virtual.

However, as all rules, it's important to understand the intention and applicability of a rule and to act best according to its spirit. The intention of the fore mentioned base destructor rule is to ensure proper resource management when client code deletes a derived class object through a base class pointer. Given this guideline, one could make such an observation that one can simply disallow such erroneous behavior when appropriate. To do this, the base class destructor can be declared protected and non-virtual. Once this is done, such a destructor is no longer accessible to client code through delete. The mechanism of delete is explained in a previous entry on this blog. Basically, a delete call performs the following step, 1) call class destructor, polymorphically when its virtual; 2) call class operator delete if it's defined; else call global operator delete if it's defined; else call default std::delete to release the memory this class object occupies.

When a base class destructored is declared protected, client code can no longer delete a base pointer because the base destructor is only accessible to its derived class but not to the client code. Any client code insists doing so will be diagnosed with a compiler error.

Why would we want to this instead of simply following the 'base destructor must be public virtual'?

1) Performance, without virtual function, virtual function dispatching overhead is no longer incurred. This can provide significant performance boost on architectures that supports pipelining, speculative execution, TLBs, paging and caching. Because virtual function dispatching involves pointer indirection, it's likely that page faults will be generated and TLBs flushed and repopulated. Due to the extreme dynamic nature of virtual function dispatching, destination code will only be known at the last point of runtime execution, pipelining and speculative execution will be stalled and results flushed to accommodate immediate code branching.

2) Design, it simply does not make sense to have virtual public destructor when it's known a priori that such a class hierarchy does not manage any kind of resource.

Given these arguments, the second rule of destructor declaration is that 'base class destructor should be either protected non-virtual or public virtual, with protected non-virtual preferred'.

References:

1. Virtuality, Herb Sutter, C/C++ Users Journal, 19(9), September 2001
2. Generic: Change the way you write exeption-safe code---forever, Andrei Alexandrescu and Peru Marginean, C/C++ Users Journal, Dec 01, 2000

Thursday, November 15, 2007

C++: Understand rebind

You may have seen some esoteric C++ line such as this one:

typedef typename _Alloc::template rebind<_Tp>::other alloc_type;

In fact if you have read my previous entry on std::vector internal, you have read this line. Naturally the first question is what this line means?

It defines another _Alloc type (alloc_type) bound by a template parameter _Tp. It behaves just like _Alloc except that it is templatized by _Tp. It's not known at this point exactly what type _Alloc or alloc_type is. _Alloc could be a non-template type; it could be a template class and have 1,2,3...arbitrary number of template parameters. But we know that _Alloc has a inner template class named "rebind" that takes a single template parameter (apart from default template parameters rebind might define).



_Alloc_type{
public:
template < typename U>
struct rebind{
typedef some_type < U> other; // (a)
//typedef _Alloc_type < U> other; // (b)
//typedef _Alloc_type < U, T1, T2, T3, T4> other; // (c)
};
};


Apart from the theoretical possibilities, the common use of rebind is in conjuncture with a templatized _Alloc_type. Let's rule out case (a) where rebind really has nothing to do with _Alloc_type and becomes a rhetoric exercise. (b) and (c) are equally valid, and currently in STL, we see a pervasive use of case (b).

Let's expand case (b)


template < typename T>
struct _Alloc_type{
template < typename U>
struct rebind{
typedef some_type< U> other; // (a)
//typedef _Alloc_type< U> other; // (b)
//typedef _Alloc_type< U, T1, T2, T3, T4> other; // (c)
};
};



Let's also define a client class that will use _Alloc_type


template < typename T, typename U, class _Alloc = _Alloc_type < T> >
struct client{
typedef typename _Alloc::template rebind< U>::other alloc_type;
};

client< int, int *>::alloc_type allocator;



this client structure converts a int allocator to a int * allocator. Such an exercise may seem in vain given that one can simply write something simpler and equivalent:

_Alloc_type< int *> allocator;

So here is the 2nd question, why would anyone sane want to do something so strange? Let's see the power of such a construct by redo the code structure, such as the way it's used in std::vector:


template < typename T, typename _Alloc>
struct client_base {
typedef typename _Alloc::template rebind< T>::other alloc_type;
};

template < typename T, typename _Alloc = std::allocator < T> >
struct client : client_base< T, _Alloc> {
};


Here client_base takes a template parameter _Alloc from its derived client class. It is obvious that client_base has no knowedge of the exact type of _Alloc. It does not know if _Alloc will fulfill its need to work with type T. But it knows that _Alloc abide by the contract that _Alloc provides a rebind template to create another _Alloc that will work with any type T. The rebind idiom provides an excellent detour/indirection to create another type that behaves like _Alloc and works on any type T.

To better understand the idea of rebind, let's also expand case (c):



template < typename T, typename T1, typename T2, typename T3, typename T4>
struct custom_allocator {
template < typename U>
struct rebind{
typedef _Alloc_type< U, T1, T2, T3, T4> other; // (c)
};
};

template < typename T, typename _Alloc>
struct client_base {
typedef typename _Alloc::template rebind< T>::other alloc_type;
};

template < typename T, typename T1, typename T2, typename T3, typename T4, typename _Alloc = custom_allocator< T, T1, T2, T3, T4> >
struct client : client_base< T3, _Alloc> {
};


Here client::_Alloc is custom_allocator< T, T1, T2, T3, T4> but client_base::alloc_type is custom_allocator< T3, T1, T2, T3, T4>.

rebind (policy clone) is a powerful idiom to create an indirection for template types to bind together and construct new types on the fly without prior knowledge of a class template, e.g. how many template parameters it requires to instantiate and it puts no limit on the number of template parameters an abiding class template has.

References:

1. Policy Clone
2. Template Typedef, Herb Sutter, GotW #79

Trier Tree distilled

If you walk into any interview that deals with text processing. You will inevitably be asked a question that involves a Trier Tree (TT). A TT is a special tree with its edge marked by identification symbol and its vertex (node) marked by success/failure of predetermined condition.

Take the typical definition of a TT whose edge is a alphabet letter and vertex indicating if a word can be found in a dictionary connecting all edge letters from root to here. This is the most common use of a TT. Another scenario involves having edges marked by numeric letter and node indicating if a telephone number can be found in a phone book connecting all edge numbers from root to here.

It can be summarized that a TT is a tree to describe a structure of composition from elemental building blocks to some known super structures. Symbolically, let E{x | x belongs to E} denote elemental building blocks and S{y | y belongs to S} denote super structures. All edge symbols of TT belongs to E and connecting edge symbols along a path results in c that either belongs to S {y} or not.

A easy way to understand TT is to think of it as if it's composed of many paths, some path terminates with a success symbol and some path terminates with a failure symbol.

Take this example, banana, we know that ban and banana are both valid words by looking up a dictionary. R is root, F is failure, S is success, ---(alpha)--- is a edge. According to the dictionary we construct a TT such as following:

R---b--->F---a--->F---n--->S---a--->F---n--->F---a--->S

Now given the challedge 'is ba a word?', by following such a TT, we know that it ends up with a vertex symbol F indicating 'ba' is not a word.

More complicated TT can be built by following such a principle, first there is a dictionary of all super structures S that can be broken down into elemental building blocks E. Start from root R, create a path for each y in S using its building blocks x in E as edge symbols, mark each vertex on its path F if it's a new vertex and mark its ending vertex S. After creating such a TT, then it can be used to verify any y in S in O(length(y)) time.

Thursday, November 08, 2007

Using Explicit in C++ (Repost)

Using Explicit in C++
http://www.glenmccl.com/tip_023.htm

In C++ it is possible to declare constructors for a class, taking a single parameter, and use those constructors for doing type conversion. For example:


class A {
public:
A(int);
};
void f(A) {}
void g()
{
A a1 = 37;
A a2 = A(47);
A a3(57);
a1 = 67;
f(77);
}


A declaration like:

A a1 = 37;

says to call the A(int) constructor to create an A object from the integer value. Such a constructor is called a "converting constructor".

However, this type of implicit conversion can be confusing, and there is a way of disabling it, using a new keyword "explicit" in the constructor declaration:


class A {
public:
explicit A(int);
};
void f(A) {}
void g()
{
A a1 = 37; // illegal
A a2 = A(47); // OK
A a3(57); // OK
a1 = 67; // illegal
f(77); // illegal
}


Using the explicit keyword, a constructor is declared to be
"nonconverting", and explicit constructor syntax is required:


class A {
public:
explicit A(int);
};
void f(A) {}
void g()
{
A a1 = A(37);
A a2 = A(47);
A a3(57);
A a4 = 10; // error
a1 = A(67);
f(A(77));
}


Note that an expression such as:

A(47)

is closely related to function-style casts supported by C++. For example:


double d = 12.34;
int i = int(d);



Declaring a single argument copy constructor explicit also means it cannot be used in standard containers because the standard requires support of implicit copy constructor in this form:
T dest = src;

Although this requirement is not widely implemented in most vendor STL implementations. Another factor is that STL usually works with 0 argument constructors where explicit does not come into play.