Meditation, The Art of Exploitation: June 2007

C++ how is STL set/map iterator implemented?

Users of STL containers enjoy the immense power of iterators provided
by different types of containers. The sequence container iterators
are easy to understand and implement conceptually. What about the
associate containers, such as set and map?

STL set and map are normally implemented through red black tree, a type
of self balancing binary search tree. Thus iterators of a set or a map
is internally represented by a pointer to a red black tree node in practice.

Let's look at some code involving the related entities:

template<_Tp>
set<_Tp> s;
node<_Tp> n;
rb_tree > rbt;

typedef node<_Tp> * set<_Tp>::iterator;
typedef node<_Tp> * rb_tree >::iterator;

Let's leave a few template parameters out of the discussion for now to illustrate
the concept. For curious readers, those parameters defines behavior of node allocation
and comparison...

Let's recall the different ways a tree can be iterated or traversed: depth first
traversal (DFS) and breath first traversal (BFS). In DFS, there are 3 orders of traversal:
pre-order (S L R), in-order (L S R), and post-order (L R S). A BST in-order DFS yields
a sorted list of nodes. (L: left, S: self, R: right)

We would stop here now if we are interested in read only set or map traversal.
However node erasure is an equally important operation. And node based containers (
such as list, set, and map) gurantees that erase(iterator) invalides the iterator
being erased only. This is a curious feature. It's easy to understand why this is so
for list. But for red black tree based set and map, how does STL gurantees this
behavior?

An important observation of the set and map containers is that iteration of these
containers always yields sorted result. This takes us back to the BST in-order
DFS problem. There is an invariance in set/map iteration: order of nodes is predefined.

Consider the following sample code:



#include < set>
#include < iostream>

using namespace std;

void display(const set & s){
    cout << "node value: ";
    set::const_iterator it = s.begin();
    for(;it != s.end(); it++) cout << " " << *it;
    cout << endl;
}

int main(){

    set s;
    s.insert(4);
    s.insert(1);
    s.insert(10);
    s.insert(2);
    s.insert(3);
    s.insert(5);

    display(s);

    s.erase(3);
    display(s);

    s.insert(3);
    display(s);

    cout << "node value: ";
    set::const_iterator it = s.begin();
    while(it != s.end()){
        if(*it == 3) s.erase(it++);
        else ++it;
        if(it != s.end()) cout << " " << *it;
    }
    cout << endl;
    display(s);
}

The output is:
./test_set_iter
node value: 1 2 3 4 5 10
node value: 1 2 4 5 10
node value: 1 2 3 4 5 10
node value: 2 3 4 5 10
node value: 1 2 4 5 10

As we can see, no matter how the iterations are done, there is a predefined
ascending order of nodes. This means that as long as the underlying red black
tree can do in-order DFS, iteration behavior is guranteed. And as a self-balancing
BST, in-order DFS is guranteed for red black tree.

Note that the 4th row prints 2 3 4 5 10, why is it doing so when we are erasing element 3? This confused me at the begining but the behavior is absolutely correct. Can you explain it?

libxml high performance xml parser

I found this website extremely helpful in order to start libxml programming in C. http://xmlsoft.org/tutorial/index.html

C++: The standard template library: vector

It's advised that an aspiring programer should read and study the C++ STL source code. Although I had fair amount of experience with STL, I decided to take time to study its source code (distributed in gnu c++ compiler 4.1.1).

Let's start with the most used container std::vector. First of all, the journey starts with the top level header file found at /usr/include/c++/4.1.1/vector



#ifndef _GLIBCXX_VECTOR
#define _GLIBCXX_VECTOR 1

#pragma GCC system_header

#include            // function exception helper, -fno-exception support
#include           // internal algorithm suport, swap, copy, fill etc...
#include              // contains std::allocator<_Tp> 
#include          // internal functions for construction and destruction of objects
#include      // internal, dealing with initialization of objects
#include             // contains STL vector implementation
#include            // vector and bit vector specialization

#ifndef _GLIBCXX_EXPORT_TEMPLATE
# include 
#endif

#ifdef _GLIBCXX_DEBUG
# include                 // a debug build of stl, extremely useful for debugging
#endif

#endif /* _GLIBCXX_VECTOR */

Ok, what have we learnt? 1) there is a debug build of stl::vector, similarly for other stl containers. This is an extremely useful and most overlooked feature of STL. It has proven very useful to help debug STL related issues. 2) vector and bitvector are specializations of std::vector. But there are some minor details of thse containers that one should be aware of when using them. We'l come back to these 2 containers later. 3) object creation and destruction play important roles in implementation of vector, several header files contain internal implementation of functions dealing with object construction, initialization, and destruction.

Now let's dive into the vector implementation:




#ifndef _VECTOR_H
#define _VECTOR_H 1

#include    // deal with iterators
#include 
#include 

namespace _GLIBCXX_STD           // This the std namespace in gnu c++ implementation
{
  /**
   *  @if maint
   *  See bits/stl_deque.h's _Deque_base for an explanation.
   *  @endif
  */
  template
    struct _Vector_base          // Interesting, there is a base class for std::vector
    {
      typedef typename _Alloc::template rebind<_Tp>::other _Tp_alloc_type;
// _Tp_alloc_type = std::allocator<_Tp>
// For customizing purpose, vector needs to inform the allocator class that it
// deals with _Tp data type. It asks the allocator to intialize with _Tp. 
// It may not be apparent why this is required here, but it'll be clear when we
// deal with std::list, where 
// _Tp_alloc_type = std::allocator >
// i.e. std::list's allocator is actually allocator >
      struct _Vector_impl
      : public _Tp_alloc_type
      {
    _Tp*           _M_start;
    _Tp*           _M_finish;
    _Tp*           _M_end_of_storage;
    _Vector_impl(_Tp_alloc_type const& __a)
    : _Tp_alloc_type(__a), _M_start(0), _M_finish(0), _M_end_of_storage(0)
    { }
      };     _Vector_impl is an internal struct. It inherits from allocator<_Tp>
//Vector base is implemented through _Vector_impl which derives from _Tp_alloc_type
//In this struct, 3 orthoganal data members describe a vector, the start, the finish,
//and the end of the storage. There are 2 paralel concepts here, logical and physical.
//The start variable is the start of vector in both logical and physical sense, this is a
//pointer that points to the start of the vector. The finish variable denotes a logical
//end of a vector. A user is not allowed to access a vector beyond this point. 
//The end of storage variable points to the end of the physical entity, beyond which
//it's un-initialized emptiness.
//With the help of these 3 variables, many bound operations can be performed on a
//vector container.
    public:
      typedef _Alloc allocator_type;

      _Tp_alloc_type&
      _M_get_Tp_allocator()
      { return *static_cast<_Tp_alloc_type*>(&this->_M_impl); }

      const _Tp_alloc_type&
      _M_get_Tp_allocator() const
      { return *static_cast(&this->_M_impl); }

      allocator_type
      get_allocator() const
      { return _M_get_Tp_allocator(); }

      _Vector_base(const allocator_type& __a)
      : _M_impl(__a)
      { }

      _Vector_base(size_t __n, const allocator_type& __a)
      : _M_impl(__a)
      {
    this->_M_impl._M_start = this->_M_allocate(__n);
    this->_M_impl._M_finish = this->_M_impl._M_start;
    this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n;
      }

      ~_Vector_base()
      { _M_deallocate(this->_M_impl._M_start, this->_M_impl._M_end_of_storage
              - this->_M_impl._M_start); }

    public:
      _Vector_impl _M_impl;

      _Tp*
      _M_allocate(size_t __n)
      { return _M_impl.allocate(__n); }

      void
      _M_deallocate(_Tp* __p, size_t __n)
      {
    if (__p)
      _M_impl.deallocate(__p, __n);
      }
    };
//----------------------------------------------------------------------------------------

// START OF VECTOR IMPLEMENTATION

//----------------------------------------------------------------------------------------

  /**
   *  @brief A standard container which offers fixed time access to
   *  individual elements in any order.
   *
   *  @ingroup Containers
   *  @ingroup Sequences
   *
   *  Meets the requirements of a container, a
   *  reversible container, and a
   *  sequence, including the
   *  optional sequence requirements with the
   *  %exception of @c push_front and @c pop_front.
   *
   *  In some terminology a %vector can be described as a dynamic
   *  C-style array, it offers fast and efficient access to individual
   *  elements in any order and saves the user from worrying about
   *  memory and size allocation.  Subscripting ( @c [] ) access is
   *  also provided as with C-style arrays.
  */
  template >
    class vector : protected _Vector_base<_Tp, _Alloc>
    {
      // Concept requirements.
      typedef typename _Alloc::value_type                _Alloc_value_type;
      __glibcxx_class_requires(_Tp, _SGIAssignableConcept)
      __glibcxx_class_requires2(_Tp, _Alloc_value_type, _SameTypeConcept)

      typedef _Vector_base<_Tp, _Alloc>          _Base;
      typedef vector<_Tp, _Alloc>            vector_type;
      typedef typename _Base::_Tp_alloc_type         _Tp_alloc_type;

    public:
      typedef _Tp                    value_type;
      typedef typename _Tp_alloc_type::pointer           pointer;
      typedef typename _Tp_alloc_type::const_pointer     const_pointer;
      typedef typename _Tp_alloc_type::reference         reference;
      typedef typename _Tp_alloc_type::const_reference   const_reference;
      typedef __gnu_cxx::__normal_iterator iterator;
      typedef __gnu_cxx::__normal_iterator
      const_iterator;
      typedef std::reverse_iterator  const_reverse_iterator;
      typedef std::reverse_iterator        reverse_iterator;
      typedef size_t                     size_type;
      typedef ptrdiff_t                  difference_type;
      typedef _Alloc                                 allocator_type;

    protected:
      /** @if maint
       *  These two functions and three data members are all from the
       *  base class.  They should be pretty self-explanatory, as
       *  %vector uses a simple contiguous allocation scheme.  @endif
       */
       */
      using _Base::_M_allocate;
      using _Base::_M_deallocate;
      using _Base::_M_impl;
      using _Base::_M_get_Tp_allocator;

    public:
      // [23.2.4.1] construct/copy/destroy
      // (assign() and get_allocator() are also listed in this section)
      /**
       *  @brief  Default constructor creates no elements.
       */
      explicit
      vector(const allocator_type& __a = allocator_type())
      : _Base(__a)
      { }

      /**
       *  @brief  Create a %vector with copies of an exemplar element.
       *  @param  n  The number of elements to initially create.
       *  @param  value  An element to copy.
       *
       *  This constructor fills the %vector with @a n copies of @a value.
       */
      explicit
      vector(size_type __n, const value_type& __value = value_type(),
         const allocator_type& __a = allocator_type())
      : _Base(__n, __a)
      {
    std::__uninitialized_fill_n_a(this->_M_impl._M_start, __n, __value,
                      _M_get_Tp_allocator());
    this->_M_impl._M_finish = this->_M_impl._M_start + __n;
      }
      }

      /**
       *  @brief  %Vector copy constructor.
       *  @param  x  A %vector of identical element and allocator types.
       *
       *  The newly-created %vector uses a copy of the allocation
       *  object used by @a x.  All the elements of @a x are copied,
       *  but any extra memory in
       *  @a x (for fast expansion) will not be copied.
       */
      vector(const vector& __x)
      : _Base(__x.size(), __x.get_allocator())
      { this->_M_impl._M_finish =
      std::__uninitialized_copy_a(__x.begin(), __x.end(),
                      this->_M_impl._M_start,
                      _M_get_Tp_allocator());
      }

      /**
       *  @brief  Builds a %vector from a range.
       *  @param  first  An input iterator.
       *  @param  last  An input iterator.
       *
       *  Create a %vector consisting of copies of the elements from
       *  [first,last).
       *
       *  If the iterators are forward, bidirectional, or
       *  random-access, then this will call the elements' copy
       *  constructor N times (where N is distance(first,last)) and do
       *  no memory reallocation.  But if only input iterators are
       *  used, then this will do at most 2N calls to the copy
       *  constructor, and logN memory reallocations.
       *  constructor, and logN memory reallocations.
       */
      template
        vector(_InputIterator __first, _InputIterator __last,
           const allocator_type& __a = allocator_type())
    : _Base(__a)
        {
      // Check whether it's an integral type.  If so, it's not an iterator.
      typedef typename std::__is_integer<_InputIterator>::__type _Integral;
      _M_initialize_dispatch(__first, __last, _Integral());
    }

      /**
       *  The dtor only erases the elements, and note that if the
       *  elements themselves are pointers, the pointed-to memory is
       *  not touched in any way.  Managing the pointer is the user's
       *  responsibilty.
       */
      ~vector()
      { std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish,
              _M_get_Tp_allocator());
      }

      /**
       *  @brief  %Vector assignment operator.
       *  @param  x  A %vector of identical element and allocator types.
       *
       *  All the elements of @a x are copied, but any extra memory in
       *  @a x (for fast expansion) will not be copied.  Unlike the
       *  copy constructor, the allocator object is not copied.
       */
      vector&
      operator=(const vector& __x);

      /**
       *  @brief  Assigns a given value to a %vector.
       *  @param  n  Number of elements to be assigned.
       *  @param  val  Value to be assigned.
       *
       *  This function fills a %vector with @a n copies of the given
       *  value.  Note that the assignment completely changes the
       *  %vector and that the resulting %vector's size is the same as
       *  the number of elements assigned.  Old data may be lost.
       */
      void
      assign(size_type __n, const value_type& __val)
      { _M_fill_assign(__n, __val); }

      /**
       *  @brief  Assigns a range to a %vector.
       *  @param  first  An input iterator.
       *  @param  last   An input iterator.
       *
       *  This function fills a %vector with copies of the elements in the
       *  range [first,last).
       *
       *  Note that the assignment completely changes the %vector and
       *  that the resulting %vector's size is the same as the number
       *  of elements assigned.  Old data may be lost.
       */
      template
        void
        assign(_InputIterator __first, _InputIterator __last)
        {
      // Check whether it's an integral type.  If so, it's not an iterator.
      typedef typename std::__is_integer<_InputIterator>::__type _Integral;
      _M_assign_dispatch(__first, __last, _Integral());
    }

      /// Get a copy of the memory allocation object.
      using _Base::get_allocator;

      // iterators
      /**
       *  Returns a read/write iterator that points to the first
       *  element in the %vector.  Iteration is done in ordinary
       *  element order.
       */
      iterator
      begin()
      { return iterator (this->_M_impl._M_start); }

      /**
       *  Returns a read-only (constant) iterator that points to the
       *  first element in the %vector.  Iteration is done in ordinary
       *  element order.
       */
      const_iterator
      begin() const
      { return const_iterator (this->_M_impl._M_start); }

      /**
       *  Returns a read/write iterator that points one past the last
       *  element in the %vector.  Iteration is done in ordinary
       *  element order.
       */
      iterator
      end()
      { return iterator (this->_M_impl._M_finish); }

      /**
       *  Returns a read-only (constant) iterator that points one past
       *  the last element in the %vector.  Iteration is done in
       *  ordinary element order.
       */
      const_iterator
      end() const
      { return const_iterator (this->_M_impl._M_finish); }

      /**
       *  Returns a read/write reverse iterator that points to the
       *  last element in the %vector.  Iteration is done in reverse
       *  element order.
       */
      reverse_iterator
      rbegin()
      { return reverse_iterator(end()); }

      /**
       *  Returns a read-only (constant) reverse iterator that points
       *  to the last element in the %vector.  Iteration is done in
       *  reverse element order.
       */
      const_reverse_iterator
      rbegin() const
      { return const_reverse_iterator(end()); }

      /**
       *  Returns a read/write reverse iterator that points to one
       *  before the first element in the %vector.  Iteration is done
       *  in reverse element order.
       */
      reverse_iterator
      rend()
      { return reverse_iterator(begin()); }

      /**
       *  Returns a read-only (constant) reverse iterator that points
       *  to one before the first element in the %vector.  Iteration
       *  is done in reverse element order.
       */
      const_reverse_iterator
      rend() const
      { return const_reverse_iterator(begin()); }

      // [23.2.4.2] capacity
      /**  Returns the number of elements in the %vector.  */
      size_type
      size() const
      { return size_type(end() - begin()); }

      /**  Returns the size() of the largest possible %vector.  */
      size_type
      max_size() const
      { return size_type(-1) / sizeof(value_type); }

      /**
       *  @brief  Resizes the %vector to the specified number of elements.
       *  @param  new_size  Number of elements the %vector should contain.
       *  @param  x  Data with which new elements should be populated.
       *
       *  This function will %resize the %vector to the specified
       *  number of elements.  If the number is smaller than the
       *  %vector's current size the %vector is truncated, otherwise
       *  the %vector is extended and new elements are populated with
       *  given data.
       */
      void
      resize(size_type __new_size, value_type __x = value_type())
      {
    if (__new_size < size())
      erase(begin() + __new_size, end());
    else
      insert(end(), __new_size - size(), __x);
      }

      /**
       *  Returns the total number of elements that the %vector can
       *  hold before needing to allocate more memory.
       */
      size_type
      capacity() const
      { return size_type(const_iterator(this->_M_impl._M_end_of_storage)
             - begin()); }

      /**
       *  Returns true if the %vector is empty.  (Thus begin() would
       *  equal end().)
       */
      bool
      empty() const
      { return begin() == end(); }

      /**
       *  @brief  Attempt to preallocate enough memory for specified number of
       *          elements.
       *  @param  n  Number of elements required.
       *  @throw  std::length_error  If @a n exceeds @c max_size().
       *
       *  This function attempts to reserve enough memory for the
       *  %vector to hold the specified number of elements.  If the
       *  number requested is more than max_size(), length_error is
       *  thrown.
       *
       *  The advantage of this function is that if optimal code is a
       *  necessity and the user can determine the number of elements
       *  that will be required, the user can reserve the memory in
       *  %advance, and thus prevent a possible reallocation of memory
       *  and copying of %vector data.
       */
      void
      reserve(size_type __n);

      // element access
      /**
       *  @brief  Subscript access to the data contained in the %vector.
       *  @param n The index of the element for which data should be
       *  accessed.
       *  @return  Read/write reference to data.
       *
      *
       *  This operator allows for easy, array-style, data access.
       *  Note that data access with this operator is unchecked and
       *  out_of_range lookups are not defined. (For checked lookups
       *  see at().)
       */
      reference
      operator[](size_type __n)
      { return *(begin() + __n); }

      /**
       *  @brief  Subscript access to the data contained in the %vector.
       *  @param n The index of the element for which data should be
       *  accessed.
       *  @return  Read-only (constant) reference to data.
       *
       *  This operator allows for easy, array-style, data access.
       *  Note that data access with this operator is unchecked and
       *  out_of_range lookups are not defined. (For checked lookups
       *  see at().)
       */
      const_reference
      operator[](size_type __n) const
      { return *(begin() + __n); }

    protected:
      /// @if maint Safety check used only from at().  @endif
      void
      _M_range_check(size_type __n) const
      {
    if (__n >= this->size())
      __throw_out_of_range(__N("vector::_M_range_check"));
      }

    public:
      /**
       *  @brief  Provides access to the data contained in the %vector.
       *  @param n The index of the element for which data should be
       *  accessed.
       *  @return  Read/write reference to data.
       *  @throw  std::out_of_range  If @a n is an invalid index.
       *
       *  This function provides for safer data access.  The parameter
       *  is first checked that it is in the range of the vector.  The
       *  function throws out_of_range if the check fails.
       */
      reference
      at(size_type __n)
      {
    _M_range_check(__n);
    return (*this)[__n];
      }

      /**
       *  @brief  Provides access to the data contained in the %vector.
       *  @param n The index of the element for which data should be
       *  accessed.
       *  @return  Read-only (constant) reference to data.
       *  @throw  std::out_of_range  If @a n is an invalid index.
       *
       *  This function provides for safer data access.  The parameter
       *  is first checked that it is in the range of the vector.  The
       *  function throws out_of_range if the check fails.
       */
      const_reference
      at(size_type __n) const
      {
    _M_range_check(__n);
    return (*this)[__n];
      }

      /**
       *  Returns a read/write reference to the data at the first
       *  element of the %vector.
       */
      reference
      front()
      { return *begin(); }

      /**
       *  Returns a read-only (constant) reference to the data at the first
       *  element of the %vector.
       */
      const_reference
      front() const
      { return *begin(); }

      /**
       *  Returns a read/write reference to the data at the last
       *  element of the %vector.
       */
      reference
      back()
      { return *(end() - 1); }

      /**
       *  Returns a read-only (constant) reference to the data at the
       *  last element of the %vector.
       */
      const_reference
      back() const
      { return *(end() - 1); }

      // _GLIBCXX_RESOLVE_LIB_DEFECTS
      // DR 464. Suggestion for new member functions in standard containers.
      // data access
      /**
       *   Returns a pointer such that [data(), data() + size()) is a valid
       *   range.  For a non-empty %vector, data() == &front().
       */
      pointer
      data()
      { return pointer(this->_M_impl._M_start); }

      const_pointer
      data() const
      { return const_pointer(this->_M_impl._M_start); }

      // [23.2.4.3] modifiers
      /**
       *  @brief  Add data to the end of the %vector.
       *  @param  x  Data to be added.
       *
       *  This is a typical stack operation.  The function creates an
       *  element at the end of the %vector and assigns the given data
       *  to it.  Due to the nature of a %vector this operation can be
       *  done in constant time if the %vector has preallocated space
       *  available.
       */
       */
      void
      push_back(const value_type& __x)
      {
    if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
      {
        this->_M_impl.construct(this->_M_impl._M_finish, __x);
        ++this->_M_impl._M_finish;
      }
    else
      _M_insert_aux(end(), __x);
      }

      /**
       *  @brief  Removes last element.
       *
       *  This is a typical stack operation. It shrinks the %vector by one.
       *
       *  Note that no data is returned, and if the last element's
       *  data is needed, it should be retrieved before pop_back() is
       *  called.
       */
      void
      pop_back()
      {
    --this->_M_impl._M_finish;
    this->_M_impl.destroy(this->_M_impl._M_finish);
      }

      /**
       *  @brief  Inserts given value into %vector before specified iterator.
       *  @param  position  An iterator into the %vector.
       *  @param  x  Data to be inserted.
       *  @return  An iterator that points to the inserted data.
       *
       *  This function will insert a copy of the given value before
       *  the specified location.  Note that this kind of operation
       *  could be expensive for a %vector and if it is frequently
       *  used the user should consider using std::list.
       */
      iterator
      insert(iterator __position, const value_type& __x);

      /**
       *  @brief  Inserts a number of copies of given data into the %vector.
       *  @param  position  An iterator into the %vector.
       *  @param  n  Number of elements to be inserted.
       *  @param  x  Data to be inserted.
       *
       *  This function will insert a specified number of copies of
       *  the given data before the location specified by @a position.
       *
       *  Note that this kind of operation could be expensive for a
       *  %vector and if it is frequently used the user should
       *  consider using std::list.
       */
      void
      insert(iterator __position, size_type __n, const value_type& __x)
      { _M_fill_insert(__position, __n, __x); }

      /**
       *  @brief  Inserts a range into the %vector.
       *  @param  position  An iterator into the %vector.
       *  @param  first  An input iterator.
       *  @param  last   An input iterator.
       *
       *  This function will insert copies of the data in the range
       *  [first,last) into the %vector before the location specified
       *  by @a pos.
       *
       *  Note that this kind of operation could be expensive for a
       *  %vector and if it is frequently used the user should
       *  consider using std::list.
       */
      template
        void
        insert(iterator __position, _InputIterator __first,
           _InputIterator __last)
        {
      // Check whether it's an integral type.  If so, it's not an iterator.
      typedef typename std::__is_integer<_InputIterator>::__type _Integral;
      _M_insert_dispatch(__position, __first, __last, _Integral());
    }

      /**
       *  @brief  Remove element at given position.
       *  @param  position  Iterator pointing to element to be erased.
       *  @return  An iterator pointing to the next element (or end()).
       *
       *  This function will erase the element at the given position and thus
       *  shorten the %vector by one.
       *
       *  Note This operation could be expensive and if it is
       *  frequently used the user should consider using std::list.
       *  The user is also cautioned that this function only erases
       *  the element, and that if the element is itself a pointer,
       *  the pointed-to memory is not touched in any way.  Managing
       *  the pointer is the user's responsibilty.
       */
      iterator
      erase(iterator __position);

      /**
       *  @brief  Remove a range of elements.
       *  @param  first  Iterator pointing to the first element to be erased.
       *  @param  last  Iterator pointing to one past the last element to be
       *                erased.
       *  @return  An iterator pointing to the element pointed to by @a last
       *           prior to erasing (or end()).
       *
       *  This function will erase the elements in the range [first,last) and
       *  shorten the %vector accordingly.
       *
       *  Note This operation could be expensive and if it is
       *  frequently used the user should consider using std::list.
       *  The user is also cautioned that this function only erases
       *  the elements, and that if the elements themselves are
       *  pointers, the pointed-to memory is not touched in any way.
       *  Managing the pointer is the user's responsibilty.
       */
      iterator
      erase(iterator __first, iterator __last);

      /**
       *  @brief  Swaps data with another %vector.
       *  @param  x  A %vector of the same element and allocator types.
       *
       *  This exchanges the elements between two vectors in constant time.
       *  (Three pointers, so it should be quite fast.)
       *  Note that the global std::swap() function is specialized such that
       *  std::swap(v1,v2) will feed to this function.
       */
      void
      swap(vector& __x)
      {
    std::swap(this->_M_impl._M_start, __x._M_impl._M_start);
    std::swap(this->_M_impl._M_finish, __x._M_impl._M_finish);
    std::swap(this->_M_impl._M_end_of_storage,
          __x._M_impl._M_end_of_storage);
      }

      /**
       *  Erases all the elements.  Note that this function only erases the
       *  elements, and that if the elements themselves are pointers, the
       *  pointed-to memory is not touched in any way.  Managing the pointer is
       *  the user's responsibilty.
       */
      void
      clear()
      {
    std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish,
              _M_get_Tp_allocator());
    this->_M_impl._M_finish = this->_M_impl._M_start;
      }

    protected:
      /**
       *  @if maint
       *  Memory expansion handler.  Uses the member allocation function to
       *  obtain @a n bytes of memory, and then copies [first,last) into it.
       *  @endif
       */
      template
        pointer
        _M_allocate_and_copy(size_type __n,
                 _ForwardIterator __first, _ForwardIterator __last)
        {
      pointer __result = this->_M_allocate(__n);
      try
        {
          std::__uninitialized_copy_a(__first, __last, __result,
                      _M_get_Tp_allocator());
          return __result;
        }
      catch(...)
        {
          _M_deallocate(__result, __n);
          __throw_exception_again;
        }
    }


      // Internal constructor functions follow.

      // Called by the range constructor to implement [23.1.1]/9
      template
        void
        _M_initialize_dispatch(_Integer __n, _Integer __value, __true_type)
        {
      this->_M_impl._M_start = _M_allocate(__n);
      this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n;
      std::__uninitialized_fill_n_a(this->_M_impl._M_start, __n, __value,
                    _M_get_Tp_allocator());
      this->_M_impl._M_finish = this->_M_impl._M_end_of_storage;
    }

      // Called by the range constructor to implement [23.1.1]/9
      template
        void
        _M_initialize_dispatch(_InputIterator __first, _InputIterator __last,
                   __false_type)
        {
      typedef typename std::iterator_traits<_InputIterator>::
        iterator_category _IterCategory;
      _M_range_initialize(__first, __last, _IterCategory());
    }

      // Called by the second initialize_dispatch above
      template
        void
        _M_range_initialize(_InputIterator __first,
                _InputIterator __last, std::input_iterator_tag)
        {
      for (; __first != __last; ++__first)
        push_back(*__first);
    }

      // Called by the second initialize_dispatch above
      template
        void
        _M_range_initialize(_ForwardIterator __first,
                _ForwardIterator __last, std::forward_iterator_tag)
        {
      const size_type __n = std::distance(__first, __last);
      this->_M_impl._M_start = this->_M_allocate(__n);
      this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n;
      this->_M_impl._M_finish =
        std::__uninitialized_copy_a(__first, __last,
                    this->_M_impl._M_start,
                    _M_get_Tp_allocator());
    }


      // Internal assign functions follow.  The *_aux functions do the actual
      // assignment work for the range versions.

      // Called by the range assign to implement [23.1.1]/9
      template
        void
        _M_assign_dispatch(_Integer __n, _Integer __val, __true_type)
        {
      _M_fill_assign(static_cast(__n),
             static_cast(__val));
    }

      // Called by the range assign to implement [23.1.1]/9
      template
        void
        _M_assign_dispatch(_InputIterator __first, _InputIterator __last,
               __false_type)
        {
      typedef typename std::iterator_traits<_InputIterator>::
        iterator_category _IterCategory;
      _M_assign_aux(__first, __last, _IterCategory());
    }

      // Called by the second assign_dispatch above
      template
        void
        _M_assign_aux(_InputIterator __first, _InputIterator __last,
              std::input_iterator_tag);

      // Called by the second assign_dispatch above
      template
        void
        _M_assign_aux(_ForwardIterator __first, _ForwardIterator __last,
              std::forward_iterator_tag);

      // Called by assign(n,t), and the range assign when it turns out
      // to be the same thing.
      void
      _M_fill_assign(size_type __n, const value_type& __val);


      // Internal insert functions follow.

      // Called by the range insert to implement [23.1.1]/9
      template
        void
        _M_insert_dispatch(iterator __pos, _Integer __n, _Integer __val,
               __true_type)
        {
      _M_fill_insert(__pos, static_cast(__n),
             static_cast(__val));
    }

      // Called by the range insert to implement [23.1.1]/9
      template
        void
        _M_insert_dispatch(iterator __pos, _InputIterator __first,
               _InputIterator __last, __false_type)
        {
      typedef typename std::iterator_traits<_InputIterator>::
        iterator_category _IterCategory;
      _M_range_insert(__pos, __first, __last, _IterCategory());
    }

      // Called by the second insert_dispatch above
      template
        void
        _M_range_insert(iterator __pos, _InputIterator __first,
            _InputIterator __last, std::input_iterator_tag);

      // Called by the second insert_dispatch above
      template
        void
        _M_range_insert(iterator __pos, _ForwardIterator __first,
            _ForwardIterator __last, std::forward_iterator_tag);

      // Called by insert(p,n,x), and the range insert when it turns out to be
      // the same thing.
      void
      _M_fill_insert(iterator __pos, size_type __n, const value_type& __x);

      // Called by insert(p,x)
      void
      _M_insert_aux(iterator __position, const value_type& __x);
    };


  /**
   *  @brief  Vector equality comparison.
   *  @param  x  A %vector.
   *  @param  y  A %vector of the same type as @a x.
   *  @return  True iff the size and elements of the vectors are equal.
   *
   *  This is an equivalence relation.  It is linear in the size of the
   *  vectors.  Vectors are considered equivalent if their sizes are equal,
   *  and if corresponding elements compare equal.
  */
  template
    inline bool
    operator==(const vector<_Tp, _Alloc>& __x, const vector<_Tp, _Alloc>& __y)
    { return (__x.size() == __y.size()
          && std::equal(__x.begin(), __x.end(), __y.begin())); }

  /**
   *  @brief  Vector ordering relation.
   *  @param  x  A %vector.
   *  @param  y  A %vector of the same type as @a x.
   *  @return  True iff @a x is lexicographically less than @a y.
   *
   *  This is a total ordering relation.  It is linear in the size of the
   *  vectors.  The elements must be comparable with @c <.
   *
   *  See std::lexicographical_compare() for how the determination is made.
  */
  template
    inline bool
    operator<(const vector<_Tp, _Alloc>& __x, const vector<_Tp, _Alloc>& __y)
    { return std::lexicographical_compare(__x.begin(), __x.end(),
                      __y.begin(), __y.end()); }

  /// Based on operator==
  template
    inline bool
    operator!=(const vector<_Tp, _Alloc>& __x, const vector<_Tp, _Alloc>& __y)
    { return !(__x == __y); }

  /// Based on operator<
  template
    inline bool
    operator>(const vector<_Tp, _Alloc>& __x, const vector<_Tp, _Alloc>& __y)
    { return __y < __x; }

  /// Based on operator<
  template
    inline bool
    operator<=(const vector<_Tp, _Alloc>& __x, const vector<_Tp, _Alloc>& __y)
    { return !(__y < __x); }

  /// Based on operator<
  template
    inline bool
    operator>=(const vector<_Tp, _Alloc>& __x, const vector<_Tp, _Alloc>& __y)
    { return !(__x < __y); }

  /// See std::vector::swap().
  template
    inline void
    swap(vector<_Tp, _Alloc>& __x, vector<_Tp, _Alloc>& __y)
    { __x.swap(__y); }
} // namespace std

#endif /* _VECTOR_H */

The Eulerian Path Problem

The Euleraian Problem originated from the Seven Bridges of Konigsberg http://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg. The abstraction of the problem is to construct an Eulerian Path http://en.wikipedia.org/wiki/Eulerian_path

Here I implemented an algoirthm based on http://tuxv.blogspot.com/2007/05/eulerian-path.html. The problem with the original solution is that the author seems confused with directed and undirected graph. Essentially, the Eulerian Path solves a problem of a directed graph (although the appearance of the graph may be undirected). The clever discovery here is that an edge can be walked only once thus making it directed when exploring a solution.

Quote:
Constructing Eulerian paths and cycles

Consider a graph known to have all edges in the same component and at most two vertices of odd degree. We can construct an Eulerian path (not a cycle) out of this graph by using Fleury's algorithm, which dates to 1883. We start with a vertex of odd degree—if the graph has none, then start with any vertex. At each step we move across an edge whose deletion does not result in more than one connected component, unless we have no choice, then we delete that edge. At the end of the algorithm there are no edges left, and the sequence of edges we moved across forms an Eulerian cycle if the graph has no vertices of odd degree or an Eulerian path if there are two vertices of odd degree.

The definition of a Hamiltonian path is very similar (a Hamiltonian path visits every vertex exactly once, while an Eulerian path visits every edge exactly once). In practice, however, it is much more difficult to construct a Hamiltonian path or determine whether a graph is Hamiltonian, as that problem is NP-complete

Following is the C++ source code:



#include < iostream>
#include < vector>
#include < list>

using namespace std;
#define N 4
/*
int bridges[N][N] = {
    {0, 1, 0, 1},
    {1, 0, 1, 1},
    {0, 1, 0, 1},
    {0, 0, 0, 0},
};
*/
int bridges[N][N] = {
    {0, 1, 0, 1},
    {1, 0, 1, 1},
    {0, 1, 0, 1},
    {0, 0, 1, 0},
};

#define graph bridges

list< int> path;

void walk(int src){
    for(int i = 0; i < N; i ++)
        if(graph[src][i]){
            graph[src][i] = 0;
            walk(i);
        }else{
            if(graph[i][src]){
                graph[i][src] = 0;
                walk(i);
            }
        }
    path.push_back(src);
}

int main(){
    walk(0);

    while(!path.empty()){
        cout << path.back() << endl;
        path.pop_back();
    }
}

The graph (ajacency-matrix represention) is directed because it's not symmetric (the transpose is not equal to the original matrix). The problem solver has the freedom to choose a direction for an edge. When designing the ajacency-matrix, it's important to observe the rule that one must start from a node with odd degree. Suppose the node is numbered s, and it has 2k+1 degree, then it must have k+1 outgoing edges and k incoming edges.

CDPATH in bash

What does CDPATH do? It provides a shortcut to bash to change directory. For example, there is
a working directory called
/home/user/abc/efg/hik/lmn/123/456/
under which there are dir1, dir2, dir4..etc

To change directory, one has to type cd /home/user/abc/efg/hik/lmn/123/456/dir1 to get to dir1. This can be aggravating as one can imagine. CDPATH is a envrionment setting for bash such that one can set it up as export CDPATH=/home/user/abc/efg/hik/lmn/123/456/, next time one just needs to type cd dir1 to get to dir1, so on and so forth.

However, this setting can interfere with GNU make system. Or any non-interactive script that has cd command inside. It's recommended that one 'unset CDPATH' before 'make' or long hours of head scratching can follow.

Meditation, The Art of Exploitation

About Me

Thursday, June 28, 2007

C++ how is STL set/map iterator implemented?

Tuesday, June 26, 2007

libxml high performance xml parser

C++: The standard template library: vector

Monday, June 18, 2007

The Eulerian Path Problem

Wednesday, June 13, 2007

CDPATH in bash