Meditation, The Art of Exploitation: C++ how is STL set/map iterator implemented?

Users of STL containers enjoy the immense power of iterators provided
by different types of containers. The sequence container iterators
are easy to understand and implement conceptually. What about the
associate containers, such as set and map?

STL set and map are normally implemented through red black tree, a type
of self balancing binary search tree. Thus iterators of a set or a map
is internally represented by a pointer to a red black tree node in practice.

Let's look at some code involving the related entities:

template<_Tp>
set<_Tp> s;
node<_Tp> n;
rb_tree > rbt;

typedef node<_Tp> * set<_Tp>::iterator;
typedef node<_Tp> * rb_tree >::iterator;

Let's leave a few template parameters out of the discussion for now to illustrate
the concept. For curious readers, those parameters defines behavior of node allocation
and comparison...

Let's recall the different ways a tree can be iterated or traversed: depth first
traversal (DFS) and breath first traversal (BFS). In DFS, there are 3 orders of traversal:
pre-order (S L R), in-order (L S R), and post-order (L R S). A BST in-order DFS yields
a sorted list of nodes. (L: left, S: self, R: right)

We would stop here now if we are interested in read only set or map traversal.
However node erasure is an equally important operation. And node based containers (
such as list, set, and map) gurantees that erase(iterator) invalides the iterator
being erased only. This is a curious feature. It's easy to understand why this is so
for list. But for red black tree based set and map, how does STL gurantees this
behavior?

An important observation of the set and map containers is that iteration of these
containers always yields sorted result. This takes us back to the BST in-order
DFS problem. There is an invariance in set/map iteration: order of nodes is predefined.

Consider the following sample code:



#include < set>
#include < iostream>

using namespace std;

void display(const set & s){
    cout << "node value: ";
    set::const_iterator it = s.begin();
    for(;it != s.end(); it++) cout << " " << *it;
    cout << endl;
}

int main(){

    set s;
    s.insert(4);
    s.insert(1);
    s.insert(10);
    s.insert(2);
    s.insert(3);
    s.insert(5);

    display(s);

    s.erase(3);
    display(s);

    s.insert(3);
    display(s);

    cout << "node value: ";
    set::const_iterator it = s.begin();
    while(it != s.end()){
        if(*it == 3) s.erase(it++);
        else ++it;
        if(it != s.end()) cout << " " << *it;
    }
    cout << endl;
    display(s);
}

The output is:
./test_set_iter
node value: 1 2 3 4 5 10
node value: 1 2 4 5 10
node value: 1 2 3 4 5 10
node value: 2 3 4 5 10
node value: 1 2 4 5 10

As we can see, no matter how the iterations are done, there is a predefined
ascending order of nodes. This means that as long as the underlying red black
tree can do in-order DFS, iteration behavior is guranteed. And as a self-balancing
BST, in-order DFS is guranteed for red black tree.

Note that the 4th row prints 2 3 4 5 10, why is it doing so when we are erasing element 3? This confused me at the begining but the behavior is absolutely correct. Can you explain it?

Meditation, The Art of Exploitation

About Me

Thursday, June 28, 2007

C++ how is STL set/map iterator implemented?