Meditation, The Art of Exploitation

Thinking? At last I have discovered it--thought; this alone is inseparable from me. I am, I exist--that is certain. But for how long? For as long as I am thinking. For it could be, that were I totally to cease from thinking, I should totally cease to exist....I am, then, in the strict sense only a thing that thinks.

Wednesday, February 13, 2008

Compilers Part 7, theoretical considerations of the parsing process

There are a few key concepts and data structures commonly utilized in a parser: the abstract syntax tree (AST)/parse tree/syntax tree, type and value stacks, action execution.

It's not initially obvious, but a parser is essentially a stack machine operating on an abstract syntax tree with post-order action execution. The AST graph (constructed by observing predefined hierarchical shift/reduce rules embedded in grammars during parsing) is traversed in a breath-first order and the action associated with each node executed post-order. Incidentally, a stack is a perfect data structure when breath-first traversing a graph, another candidate is deque. A deque allows stack operations at its either end and thus a stack can be considered a half-closed (usually head closed) deque. By definition stack operates on a first in first out (FIFO) order while deque allows either FIFO or first in last out (FILO).

A parse tree construction process explains why it's mandatory that predefined grammars assisted with associativity and precedence order rules cannot have any unresolved syntactical ambiguity or such that only one parse tree can be constructed according to the grammatical rules as in the case of a GLR parser. A syntactical ambiguity translates into an undefined parse tree and does not constitute a regular grammar (in fact a relaxed form of regular grammar because LR parsers allows the production rule expressed as TERMINAL NONTERMINAL) that's commonly implemented in most computer programming language.

The parsing process becomes alive once it's understood, 1) AST or parse tree is constructed based on the hierarchical (defined by hierarchical structure and shift/reduce rules) grammars; 2) the parse tree is then traversed breath-first and terminals/non-terminals(products) evaluated post order; 3) result of evaluation is pushed back into the traverse stack, equivalent to tree leaf folding upward (try to visualize the content of an internal vertex replaced by the result of evaluation its children vertexes according to user defined actions).

Interestingly hierarchical (which implies a tree) grammars can also be used generatively to produce strings of arbitrary length that conforms to production rules. Although the generation process can produce any string, a parser effectively defines the generation process bound by input to produce one and only one syntax tree. This is thus a deterministic finite state machine when considering only the start state (empty syntax tree) and end state (a syntax tree generated based on a predefined input). Not to be confused with the state transition process within a GLR parser during parsing, a GLR parser can be a nondeterministic finite state machine.

References
1. http://en.wikipedia.org/wiki/Formal_grammar