Meditation, The Art of Exploitation

Thinking? At last I have discovered it--thought; this alone is inseparable from me. I am, I exist--that is certain. But for how long? For as long as I am thinking. For it could be, that were I totally to cease from thinking, I should totally cease to exist....I am, then, in the strict sense only a thing that thinks.

Monday, February 11, 2008

Compilers Part 6, DSEL interpreter in a tcp/ip server

We are finally ready to port our interpreter into a TCP/IP server. DSEL stands for domain-specific embedded language. In our project, we created a simple SQL-like scripting language to directly manipulate data stored in memory.

Apart from the lexer and parser, there are a few technicalities to deal with in order to build a TCP/IP server. We use the TCP_server implementation from the STLplus library, which provides a non-blocking, poll-based TCP server. Combined with forking or Boost threads, it can easily be adapted into a multiplexing TCP/IP server.
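The listing in this post waits on the socket with loops of the form while(!conn.receive_ready(...) && ntry++ < 1200). As a rough sketch of that bounded-polling idea, independent of STLplus, the helper below retries a readiness check a fixed number of times. The name wait_until_ready and its parameters are hypothetical and not part of STLplus; in the real server the predicate would presumably wrap a call such as conn.receive_ready(100000).

```cpp
#include <functional>

// Hypothetical helper (not an STLplus function): call `ready` up to
// `max_attempts` times and report whether it ever returned true.
// In the server, `ready` would wrap a poll with its own timeout, so the
// total wait is roughly max_attempts * per-poll timeout.
bool wait_until_ready(const std::function<bool()>& ready, int max_attempts) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (ready()) return true;  // socket became ready
    }
    return false;                  // gave up: treat as a timeout
}
```

Using the same helper for both the send and the receive side would also bound the send wait, which the listing currently retries without limit.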

We introduce a new output string stream as our global (or per-thread, if necessary) output buffer, because we no longer have yyout (stdout) to display results from the interpreter. The result of interpreting user input is put into the output string stream and sent back to the client. We simply rename the 'main' function in the bison grammar file to 'parse' and invoke it from a serverlet thread as the server's processing function; it simply calls 'yyparse' to parse client input. Before the server calls parse, it calls set_yybuffer(TCP_connection & conn) to set up a flex in-memory string buffer, using the techniques discussed in the previous entry on compiler construction.
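To make the output-buffer idea concrete, here is a minimal sketch of draining a std::ostringstream between replies. The helper name take_output is an assumption for illustration; the listing in this post does the same thing inline with os.str() followed by os.str("").

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not in the original listing): return everything the
// interpreter has written so far and leave the buffer empty for the next reply.
std::string take_output(std::ostringstream& out) {
    std::string reply = out.str();  // copy the accumulated text
    out.str("");                    // discard the contents
    out.clear();                    // reset any error flags on the stream
    return reply;
}
```

With a per-thread buffer, each connection would own its own std::ostringstream and pass it to such a helper before sending.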

In set_yybuffer, we send the pending server output to the client, accept the client's next input, and create a flex string buffer from it. Every time the client input is exhausted, yywrap is called: the current flex buffer is released and set_yybuffer is called again to read more input from the client. Because we use the poll-based non-blocking TCP IO provided by STLplus, we don't have to worry much about synchronization between client and server. The library provides a convenient interface for performing IO based on the socket status.
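The refill cycle described above can be sketched without flex or sockets. In the sketch below, FakeClient stands in for TCP_connection, refill plays the role of yywrap (return 0 when a new buffer was installed, 1 when there is no more input), and scan_all is a toy whitespace splitter standing in for the generated scanner. All three names are hypothetical and exist only for illustration.

```cpp
#include <deque>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for the client connection: each queued string is one message.
struct FakeClient {
    std::deque<std::string> messages;

    // Mimics a receive call: false when nothing more will arrive.
    bool receive(std::string& out) {
        if (messages.empty()) return false;
        out = messages.front();
        messages.pop_front();
        return true;
    }
};

// Same return convention as yywrap(): 0 = a fresh buffer is ready, 1 = stop.
int refill(FakeClient& client, std::string& buffer) {
    buffer.clear();                         // release the exhausted buffer
    return client.receive(buffer) ? 0 : 1;  // install the next message, if any
}

// Toy scanner: consume one buffer at a time, asking for a refill whenever
// the current buffer is exhausted.
std::vector<std::string> scan_all(FakeClient& client) {
    std::vector<std::string> words;
    std::string buffer;
    while (refill(client, buffer) == 0) {
        std::istringstream in(buffer);
        std::string word;
        while (in >> word) words.push_back(word);
    }
    return words;
}
```

The point of the convention is that the scanner never needs to know where its input comes from; it only asks for more when it runs dry.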

As usual, the complete listing including its makefile is posted here:


#ifndef MAP_H
#define MAP_H
#include <string>
#include <iostream>
#include <sstream>
#include <iomanip>
#include <map>

#include <boost/lambda/lambda.hpp>
typedef std::map< std::string, std::string> map_t;

#include "tcp.hpp"
extern int set_yybuffer(TCP_connection & );
extern int parse();
// a value type that describes the value of a symbol table entry
// Essentially the symbol table entry data structure
struct value{
unsigned char type; // the first letter of corresponding union type
union {
int ival;
float fval;
double dval;
char * sval;
map_t * mval;
} ivalue;
};

// The symbol table data structure
// Symbol table entry is keyed by the symbol string value
typedef std::map< std::string, value> symtab_t;

extern symtab_t symtab;
extern std::ostringstream os;
extern int yyerror(char *);
#endif

%{
#include "map.tab.h"
#include "map.h"

extern "C"{
#include <unistd.h>
#include <fcntl.h>
#include <time.h>
#include <string.h>
}
TCP_connection conn_yy;
std::string data;
std::ostringstream os;

bool report_err;
int lineno;
int tokenpos;
%}
D [0-9]
N {D}+
L [a-zA-Z]
A [a-zA-Z0-9]
ID ({L}{A}*)

%option yylineno
%%

select { tokenpos += yyleng; return SELECT; }
insert { tokenpos += yyleng; return INSERT; }
into { tokenpos += yyleng; return INTO; }
from { tokenpos += yyleng; return FROM; }
create { tokenpos += yyleng; return CREATE; }
table { tokenpos += yyleng; return TABLE; }
list { tokenpos += yyleng; return LIST; }
where { tokenpos += yyleng; return WHERE; }
key { tokenpos += yyleng; return KEY; }
value { tokenpos += yyleng; return VALUE; }
quit { tokenpos += yyleng; return QUIT; }

${ID} { tokenpos += yyleng; yylval.text = strdup(yytext+1); return OBJECT; }
{ID} { tokenpos += yyleng; yylval.text = strdup(yytext); return TEXT; }
[ \t] { tokenpos += yyleng; /* ignore white space */ }
. { tokenpos += yyleng; return yytext[0]; }
\n { return '\n'; }

%%

int yyerror(char * s){
extern int yylineno;
os << yylineno << " : " << s << " at \n" << data;
for(int i = 0; i < tokenpos; i ++) os << ' ';
os << "^\n";
return 0; // yyerror is declared int, so it must return a value
}

YY_BUFFER_STATE cur_buffer;

int set_yybuffer(TCP_connection & conn){
conn_yy = conn;
tokenpos = 0;
int ntry = 0; // time out after 120 seconds

while(!conn.send_ready(100000)) ;
std::string send_data = os.str();
std::cout << "send to client: " << send_data;
if(!conn.send(send_data)) return 1;
os.str("");

while(!conn.receive_ready(100000) && ntry ++ < 1200) ;
data = "";
if(ntry >= 1200 || !conn.receive(data)) return 1;
os << data;

std::cout << "analyze: " << data;
cur_buffer = yy_scan_string(data.c_str());

return 0;
}

int yywrap(){
yy_delete_buffer(cur_buffer);
return set_yybuffer(conn_yy);
}

%{
extern "C"{
#include <stdio.h>
#define YYDEBUG 1
}
extern int yyerror(char *);
extern int yylex();

#include "map.h"

symtab_t symtab;
bool where_by_key = false;
bool where_by_value = false;
std::string tablename;
std::string where;
%}

%union{
char * text;
}

%token INSERT SELECT INTO FROM CREATE TABLE LIST
%token <text> TEXT OBJECT
%token WHERE KEY VALUE
%token QUIT
%%

statements: statements statement
| statement
;
statement: insert_stmt opt_semicolon '\n'
| select_stmt opt_semicolon '\n'
| create_stmt opt_semicolon '\n'
| assign_stmt opt_semicolon '\n'
| list_stmt opt_semicolon '\n'
| QUIT opt_semicolon '\n'
| '\n'
| error '\n' { yyclearin; yyerrok; }
;
opt_semicolon:
| ';'
;
assign_stmt:
OBJECT '=' TEXT
{
// string variable assignment
symtab_t::iterator it = symtab.find($1);
if(it != symtab.end() && it->second.type == 's') // Symbol found and type is correct
it->second.ivalue.sval = $3;
else{ // New symbol, add to symbol table
value v;
v.ivalue.sval = $3;
v.type = 's';
symtab[$1] = v;
}
}
;
create_stmt:
CREATE TABLE OBJECT
{
// Create a new dictionary
std::string symbol = $3;
symtab_t::iterator it = symtab.find(symbol);
if(it != symtab.end() && it->second.type == 'm'){ // Symbol found and type is correct
os << "symbol: " << symbol << " already exists\n";
}else{ // New symbol, create new map(table), add to symbol table
value v;
v.ivalue.mval = new(map_t);
v.type = 'm';
symtab[symbol] = v;
}
}
;
insert_stmt:
INSERT INTO OBJECT '(' TEXT ',' TEXT ')'
{
// insert key, value pair into an existing dictionary
symtab_t::const_iterator it = symtab.find($3);
if(it != symtab.end() && it->second.type == 'm'){ // Symbol found and type is correct
(*(it->second.ivalue.mval))[std::string($5)] = std::string($7);
}else
os << "unknown symbol: " << $3 << " create first\n";
}
;
select_stmt: simple_select_stmt
{
// go through all key, value pair of a dictionary
symtab_t::const_iterator it = symtab.find(tablename);
if(it != symtab.end() && it->second.type == 'm'){
map_t::const_iterator mit = it->second.ivalue.mval->begin();
for(; mit != it->second.ivalue.mval->end(); ++ mit)
os << "key = " << mit->first << ' '
<< "value = " << mit->second << '\n';
}else
os << "invalid object\n";

}
| simple_select_stmt opt_where_stmt
{
// go through all key, value pair of a dictionary
// based on where criteria, search by key or value
symtab_t::const_iterator it = symtab.find(tablename);
if(it != symtab.end() && it->second.type == 'm'){
map_t::const_iterator mit = it->second.ivalue.mval->begin();
for(; mit != it->second.ivalue.mval->end(); ++ mit)
if( (where_by_key && mit->first == where) ||
(where_by_value && mit->second == where) ||
(!where_by_key && !where_by_value) )
os << "key = " << mit->first << ' '
<< "value = " << mit->second << '\n';
}else
os << "invalid object\n";

where_by_key = where_by_value = false;

}
;
simple_select_stmt:
SELECT '*' FROM OBJECT { tablename = $4; }
;
opt_where_stmt: WHERE KEY '=' TEXT { where_by_key = true; where = $4; }
| WHERE VALUE '=' TEXT { where_by_value = true; where = $4; }
;
list_stmt:
LIST
{
// Dump the entire symbol table
// For dictionaries, dump all key, value pairs as well
//
// Iterate through the symbol table
symtab_t::const_iterator it = symtab.begin();
for(; it != symtab.end(); ++it){
os << "symbol: " << it->first << ' ' << it->second.type << '\n';
switch(it->second.type){
case 's': os << "value = " << it->second.ivalue.sval << '\n';
break;
case 'm': {
// iterate through the dictionary
map_t::const_iterator mit = it->second.ivalue.mval->begin();
for(; mit != it->second.ivalue.mval->end(); ++ mit)
os << "key = " << mit->first << ' '
<< "value = " << mit->second << '\n';
}
break;
default:
os << "Unknown data type\n";
break;
}
}
}
;
%%

int parse(){
extern int yydebug;
yydebug = 0;
return yyparse(); // parse() is declared int: pass the parser status back
}

#include <vector>
#include <iostream>
#include <algorithm>
#include <functional>

#include "map.h"

#include "tcp.hpp"
#include "fileio.hpp"
#include "debug.hpp"
using namespace std;

int main (int argc, char* argv[])
{
DEBUG_TRACE;
if (argc != 2)
ferr << "usage: " << argv[0] << " <port>" << endl;
else
{
// create a TCP server listening on the port
// given by command argument 1
TCP_server main_server((unsigned short)atoi(argv[1]), 5);
// test to see if the server initialised OK
if (!main_server.initialised())
{
ferr << "server failed to initialise" << endl;
return -1;
}
if (main_server.error())
{
ferr << "server initialisation failed with error " << main_server.error() << endl;
return -1;
}
while(!main_server.connection_ready(1000000)) ;
TCP_connection server = main_server.connection();

std::cout << "Got a new connection.\n";
if(!set_yybuffer(server))
parse();
}
}




A few notes: the program lacks stringent memory management. There are memory leaks associated with the strdup usage (the fix is simple: add free in the grammar action code); the server does not shut down cleanly because QUIT has no action code; and the server still needs to be turned into a multiplexing server with proper synchronization on shared objects. These are important for a real-world application, but they are not the focus of our project.
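One possible way to plug the strdup leak is to copy each token into a std::string and free the C buffer immediately. The sketch below is only an illustration under that assumption: take_token is a hypothetical helper, and dup_token is a stand-in for strdup so the example does not depend on POSIX.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Stand-in for strdup(): returns a malloc'ed copy of s (or NULL on failure).
char* dup_token(const char* s) {
    std::size_t n = std::strlen(s) + 1;
    char* p = static_cast<char*>(std::malloc(n));
    if (p) std::memcpy(p, s, n);
    return p;
}

// Hypothetical helper: take ownership of a malloc'ed token, copy it into a
// std::string, and free the original buffer so nothing leaks.
std::string take_token(char* text) {
    std::string copy(text ? text : "");
    std::free(text);  // free(NULL) is a no-op, so a null token is safe
    return copy;
}
```

In a grammar action this might look like tablename = take_token($4);. It should not be applied where the action stores the raw pointer, as assign_stmt does with sval, unless the symbol table is changed to hold std::string as well.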

We have successfully created a DSEL interpreter living inside a TCP/IP server, utilizing the powerful C++ STL library. This is a good starting point for implementing more robust and useful server-side DSEL interpreters.