expression parsing

Posted by tetley at 2020-03-28


Programmers who have used Libpcap will surely notice that Libpcap allows to set filtering rules when obtaining data packets. For example, port 80 means only getting HTTP traffic. But how does this work? Let's illustrate with a simple example. Before the example, we need to introduce the flex and bison that will be used later.

port 80 flex bison


Flex and bison are mainly tools specially designed for compiler and interpreter programmers, but they are also very useful in other fields, so they also attract the attention of many non compiler programmers. As long as any program can find a specific pattern in the input, or use command language as input, it is suitable to use flex and bison.

Flex and bison are GNU versions of lex and yacc. Flex is used for lexical analysis and bison for grammatical analysis. In short, lexical analysis divides input into meaningful chunks, and grammatical analysis determines how these chunks are related. For example, C language:

The lexical analyzer decomposes the code into symbols, a, equal sign, B, plus sign, C and semicolon; the parser determines that B + C is an expression, and the value of the expression is assigned to a.

An example of and or expression parsing

Having learned about the functions of flex and bison, let's take a look at an example. This example completes the and or operation that a string is equal to or contains. For example, we can calculate the expression AB = = AB & & AB ^ ^ a | AB = = a.

ab == ab && ab ^^ a || ab == a ./expr_parser "ab == ab && ab ^^ a || ab == a" Expression: (((((ab) == (ab))) && (((ab) ^^ (a)))) || (((ab) == (a)))) Result: 1

As for lexical, we define operations corresponding to, or, equal to, including, parenthesis and value. The format of the flex source file is as follows:

%{ #include <string.h> #include "parser.h" %} %% \n /* ignore end of line */ [ \t]+ /* ignore whitespace */ "&&" return T_AND; "||" return T_OR; "==" return T_EQ; "^^" return T_CT; "(" return '('; ")" return ')'; [a-zA-Z][a-zA-Z0-9]* yylval.string_literal = strdup(yytext); return T_FIELD; %%

Parsing transforms a series of tokens into a parsing tree, and Bison's rule is basically BNF. The BNF rules in the bison source file are as follows:

string_filter: T_FIELD T_EQ T_FIELD { $$ = make_op_node(EQ); $$->left_child = make_left_node($1); $$->right_child = make_right_node($3); } | T_FIELD T_CT T_FIELD { $$ = make_op_node(CT); $$->left_child = make_left_node($1); $$->right_child = make_right_node($3); } ;

From the above code, we can see that we save the parsed syntax in a tree structure, and only need to traverse the whole tree to print the entire expression.

void print_expression(node_t *root) { static const char *ops[] = {"&&", "||", "==", "^^"}; if (!root) return; printf("%s", "("); print_expression(root->left_child); switch(root->type) { case OP: printf(" %s ", ops[root->value.op]); break; case LEFT: printf("(%s", root->value.string_literal); break; case RIGHT: printf("%s)", root->value.string_literal); break; default: break; } print_expression(root->right_child); printf("%s", ")"); }

To calculate the true or false of the whole expression, only recursion is needed. It is a very basic tree operation.

int evaluate_expression(node_t *root) { if (!root) return 1; switch (root->type) { case LEFT: case RIGHT: return 1; case OP: break; } switch (root->value.op) { case EQ: return eval_eq_node(root); case CT: return eval_ct_node(root); case AND: return (evaluate_expression(root->left_child) && evaluate_expression(root->right_child)); case OR: return (evaluate_expression(root->left_child) || evaluate_expression(root->right_child)); } return 0; }

The complete source code is at

Of course, this code is far from the syntax supported by Libpcap, but it can be used for simple input and output filtering with a little change. If you need to support new vocabulary and syntax, you can add it yourself.