Meditation, The Art of Exploitation

Thinking? At last I have discovered it--thought; this alone is inseparable from me. I am, I exist--that is certain. But for how long? For as long as I am thinking. For it could be, that were I totally to cease from thinking, I should totally cease to exist....I am, then, in the strict sense only a thing that thinks.

Tuesday, January 29, 2008

Compilers Part 2, lex & yacc with C++, types of external linkage

Lex and Yacc were traditionally used with C, more importantly lots of default functions provided by the lex/yacc (or flex/bison) library all have C linkage, meaning their function names are not mangled, notably yyparse, yywrap, yyerror.

yyparse generated by yacc internally calls yylex generated by lex. Therefore it's important that both yyparse and yylex use same linkage, either C or C++. linkage is determined in the definition section.

For example in this yacc definition of grammar.y:
extern "C"{
extern int yyerror(char *);
extern int yylex(void);

yyerror has C linkage, this parser uses the default yyerror implementation provided by the yacc library (-ly). yylex has the linkage the compiler used to compile the generated source code, in the case (gcc -c the result yylex will have a C linkage, evidenced by (nm|grep yylex) 'U yylex'

On the other hand, if g++ is used to compile (g++ -c -x c++, the result yylex symbol in the grammar object file will have C++ linkage, its name will be mangled as shown by 'nm': 'U _Z5yylexv'. In both cases, 'U' means undefined symbol because it has external linkage and will be provided by another compilation unit. The mangled name can be inspected by 'nm -C|grep lex' which yields 'U yylex()'

Ok, enough introduction of library, external linkage and nm tricks. The point is when using lex&yacc with C++, it's very important to pay attention to function names and declare proper linkage.

If we intend to use C++ to compile/link our grammar.y example with a grammar.l lex file, the lex file needs to have the following definition:
extern "C"{
//extern int yylex(void);
#include ""

Note that yylex is specifically commented out to make it clear that it will use the compiler default linkage (C for gcc or C++ for g++).

Here is a makefile that can be used to compile lex/yacc with C++ code embedded directly.

%: %.l %.y
$(LEX) -t $@.l > $@.c
if [[ -e $@.y ]] ; then \
$(YACC) -d --verbose --debug $@.y; \
$(CXX) $(CXXFLAGS) -c -x c++ $; \
$(CXX) $(CXXFLAGS) -c -x c++ $@.c; \
$(CXX) $@.o $ -o $@ -ly -lfl -lm ; \
else \
$(CXX) $(CXXFLAGS) -o $@ $@.c -lfl -lm ; \
@if [[ -e ]] ; then rm $ ; fi
@if [[ -e ]] ; then rm $ ; fi
#@-rm $@.c
rm *.o

If you examine the lex generated source code, you will see something like this:

/* Default declaration of generated scanner - a define so the user can
* easily add parameters.
#ifndef YY_DECL
#define YY_DECL_IS_OURS 1

extern int yylex (void);

#define YY_DECL int yylex (void)
#endif /* !YY_DECL */
/** The main scanner function which does all the work.

The extra 'extern' storage specifier for int yylex() is redundant and confusing. According to C and C++ linkage rule, an 'extern' function with a visible definition in the same file will result in external linkage. Since yylex is later defined in the lex generated source code, yylex has external linkage and internal definition (lacking a better term). In the following example, test has external linkage and internal definition; test1 has external linkage and external definition; test2 causes compilation failure. In this nm output, 'T' means the test has internal definition and its definition is in the text section of the object file; 'U' means test1 is undefined (external definition, defined in another translation/compilation unit/object file).

00000000 T test
U test1

#include < errno.h>
extern int errno;

int errno;

extern int test();
extern int test1();
extern int test2();

int test(){
errno = 10;

static int test2(){