C++ Lexer Toolkit Library (LexerTk)

www.partow.net


Description

The C++ Lexer Toolkit Library (LexerTk) is a simple-to-use, easy-to-integrate and extremely fast lexical token generator (lexer). The tokens generated by the lexer can be used as input to a parser such as "ExprTk".

Capabilities

Operators: +, -, /, *, %, ^, >>, <<, >, <, >=, <=, =, :=
Symbols: Format [a..z|A..Z]{a..z|A..Z|0..9|_}
Numbers: Integers, Real (scientific notation 1.234e+56)
Strings: 'A string 1234 !@#$?'
Brackets: (), [], {}
Token processors and validators
Single header implementation, no building required. No external dependencies.
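To make the symbol format listed above concrete, here is a minimal sketch of a check for it: one leading letter followed by any mix of letters, digits and underscores. The function name is_symbol is hypothetical and is not part of LexerTk; the sketch uses only the standard library and assumes the default "C" locale, where std::isalpha matches a..z and A..Z.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not a LexerTk function): returns true when s matches
// the symbol format [a..z|A..Z]{a..z|A..Z|0..9|_}.
bool is_symbol(const std::string& s)
{
   // A symbol must start with a letter.
   if (s.empty() || !std::isalpha(static_cast<unsigned char>(s[0])))
      return false;

   // Every remaining character must be a letter, a digit or an underscore.
   for (std::size_t i = 1; i < s.size(); ++i)
   {
      const unsigned char c = static_cast<unsigned char>(s[i]);

      if (!std::isalnum(c) && (c != '_'))
         return false;
   }

   return true;
}
```

Under this sketch 'variable1' and 'x_1' are accepted, while '1x' and '_x' are rejected because they do not begin with a letter.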

LexerTk decomposes an input character stream into a token stream compatible with the following BNF grammar:

expression ::= term { +|- term }
term       ::= (symbol | factor) {operator symbol | factor}
factor     ::= symbol | ( '(' {-} expression ')' )
symbol     ::= number | gensymb | string
gensymb    ::= alphabet {alphabet | digit}
string     ::= "'" {alphabet | digit | operator } "'"
operator   ::= * | / | % | ^ | < | > | <= | >= | << | >> | !=
alphabet   ::= a | b | .. | z | A | B | .. | Z
digit      ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 |  7 | 8 | 9
sign       ::= + | -
edef       ::= e | E
decimal    ::= {digit} (digit [.] | [.] digit) {digit}
exponent   ::= edef [sign] digit {digit}
real       ::= [sign] decimal [exponent]
integer    ::= [sign] {digit}
number     ::= real | integer
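The number rules above can be read as a small hand-written scanner. The following is a sketch only, written against the grammar rather than taken from the library: match_real and is_digit_at are hypothetical names, and the function returns the length of the longest prefix that matches [sign] decimal [exponent], or 0 when nothing matches. An exponent is consumed only when it is complete (e or E, an optional sign, then at least one digit).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true when index i is in range and holds a digit.
static bool is_digit_at(const std::string& s, std::size_t i)
{
   return (i < s.size()) && (std::isdigit(static_cast<unsigned char>(s[i])) != 0);
}

// Hypothetical sketch of the 'real' rule: [sign] decimal [exponent].
// Returns the number of characters matched starting at pos, or 0 for no match.
std::size_t match_real(const std::string& s, std::size_t pos = 0)
{
   std::size_t i = pos;

   // [sign]
   if ((i < s.size()) && ((s[i] == '+') || (s[i] == '-')))
      ++i;

   // decimal: digits with at most one '.', and at least one digit overall.
   std::size_t digits = 0;

   while (is_digit_at(s, i)) { ++i; ++digits; }

   if ((i < s.size()) && (s[i] == '.'))
   {
      ++i;
      while (is_digit_at(s, i)) { ++i; ++digits; }
   }

   if (digits == 0)
      return 0;

   // [exponent]: edef [sign] digit {digit}; skipped when incomplete.
   if ((i < s.size()) && ((s[i] == 'e') || (s[i] == 'E')))
   {
      std::size_t j = i + 1;

      if ((j < s.size()) && ((s[j] == '+') || (s[j] == '-')))
         ++j;

      if (is_digit_at(s, j))
      {
         while (is_digit_at(s, j)) ++j;
         i = j;
      }
   }

   return i - pos;
}
```

For instance, match_real("1.23E+4") covers all 7 characters, while match_real("2.2variable1") stops after '2.2', leaving 'variable1' to be scanned as a symbol.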

The C++ Lexer Library License

Free use of the C++ Lexer Library is permitted under the guidelines and in accordance with the MIT License.

Compatibility

The C++ Lexer Library implementation is fully compatible with the following C++ compilers:

GNU Compiler Collection (3.5+)
Clang/LLVM (1.1+)
Microsoft Visual Studio C++ Compiler (7.1+)
Intel® C++ Compiler (8.x+)
AMD Optimizing C++ Compiler (1.2+)
Nvidia C++ Compiler (19.x+)
PGI C++ (10.x+)
IBM XL C/C++ (9.x+)
C++ Builder (XE4+)

Download

The C++ Lexer Library Source Code and Examples

Simple Example 1

The following example takes a string representing an expression, tokenizes it using the generator and then dumps each token to stdout, together with its type, value and position within the expression.

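#include <iostream>
#include <string>

#include "lexertk.hpp"

int main()
{
   std::string expression = "(1.1 + ( x - y * z / 1.23E+4 ) % w";

   lexertk::generator generator;

   // Tokenize the expression; process() returns false on a lexing error.
   if (!generator.process(expression))
   {
      std::cout << "Failed to lex: " << expression << std::endl;
      return 1;
   }

   // Print each token's index, position, type and value.
   lexertk::helper::dump(generator);

   return 0;
}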

Simple example 1 expected output:

Token[00] @ 000       (  -->  '('
Token[01] @ 001  NUMBER  -->  '1.1'
Token[02] @ 005       +  -->  '+'
Token[03] @ 007       (  -->  '('
Token[04] @ 009  SYMBOL  -->  'x'
Token[05] @ 011       -  -->  '-'
Token[06] @ 013  SYMBOL  -->  'y'
Token[07] @ 015       *  -->  '*'
Token[08] @ 017  SYMBOL  -->  'z'
Token[09] @ 019       /  -->  '/'
Token[10] @ 021  NUMBER  -->  '1.23E+4'
Token[11] @ 029       )  -->  ')'
Token[12] @ 031       %  -->  '%'
Token[13] @ 033  SYMBOL  -->  'w'

Simple Example 2

The following example takes a string representing an expression and tokenizes it using the generator. The tokens are then analyzed in consecutive pairs. When a pair of tokens is determined to have an implied multiplication between them, a token of type multiplication is inserted between them. For example, the expression '2x+1' becomes '2*x+1'.

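#include <iostream>
#include <string>

#include "lexertk.hpp"

int main()
{
   std::string expression = "1.23 (2.2variable1 / 3.3variable2) 4.56e+12 + variable3";

   lexertk::generator generator;

   // Tokenize the expression; process() returns false on a lexing error.
   if (!generator.process(expression))
   {
      std::cout << "Failed to lex: " << expression << std::endl;
      return 1;
   }

   // Insert a '*' token wherever a multiplication is implied.
   lexertk::helper::commutative_inserter ci;
   ci.process(generator);

   lexertk::helper::dump(generator);

   return 0;
}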

Simple example 2 expected output:

Token[00] @ 000  NUMBER  -->  '1.23'
Token[01] @ 005       *  -->  '*'
Token[02] @ 005       (  -->  '('
Token[03] @ 006  NUMBER  -->  '2.2'
Token[04] @ 009       *  -->  '*'
Token[05] @ 009  SYMBOL  -->  'variable1'
Token[06] @ 019       /  -->  '/'
Token[07] @ 021  NUMBER  -->  '3.3'
Token[08] @ 024       *  -->  '*'
Token[09] @ 024  SYMBOL  -->  'variable2'
Token[10] @ 033       )  -->  ')'
Token[11] @ 035       *  -->  '*'
Token[12] @ 035  NUMBER  -->  '4.56e+12'
Token[13] @ 044       +  -->  '+'
Token[14] @ 046  SYMBOL  -->  'variable3'
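The pairwise pass described for Simple Example 2 can be sketched without the library. Everything below is an assumption made for illustration: the Tok type, make_tok and both functions are hypothetical, and the rule set (a number followed by a symbol or '(', or a ')' followed by a number, symbol or '(') is just one plausible choice, not necessarily the set used by commutative_inserter.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token model for this sketch only; LexerTk has its own token type.
struct Tok
{
   enum Kind { Number, Symbol, LParen, RParen, Operator };
   Kind kind;
   std::string text;
};

Tok make_tok(Tok::Kind kind, const std::string& text)
{
   Tok t;
   t.kind = kind;
   t.text = text;
   return t;
}

// Assumed rule set: true when a '*' is implied between two adjacent tokens.
bool implies_multiplication(const Tok& lhs, const Tok& rhs)
{
   switch (lhs.kind)
   {
      case Tok::Number : return (rhs.kind == Tok::Symbol) || (rhs.kind == Tok::LParen);
      case Tok::RParen : return (rhs.kind == Tok::Number) || (rhs.kind == Tok::Symbol) ||
                                (rhs.kind == Tok::LParen);
      default          : return false;
   }
}

// Walks the tokens in consecutive pairs and inserts a '*' token where implied.
// Returns the number of tokens inserted.
std::size_t insert_implied_multiplication(std::vector<Tok>& tokens)
{
   std::vector<Tok> out;
   std::size_t inserted = 0;

   for (std::size_t i = 0; i < tokens.size(); ++i)
   {
      if ((i > 0) && implies_multiplication(tokens[i - 1], tokens[i]))
      {
         out.push_back(make_tok(Tok::Operator, "*"));
         ++inserted;
      }

      out.push_back(tokens[i]);
   }

   tokens.swap(out);

   return inserted;
}
```

Applied to the four tokens of '2x+1' (number, symbol, operator, number), this sketch inserts a single '*' between '2' and 'x'. A symbol followed by '(' is deliberately left alone here, since it could equally be a function call.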

Simple Example 3

The following example takes a string representing an expression, tokenizes it using the generator, then checks that the brackets within the expression are correctly ordered.

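#include <iostream>
#include <string>

#include "lexertk.hpp"

int main()
{
   std::string expression = "{aa+(bb-[cc*dd]+ee)-ff}";

   lexertk::generator generator;

   // Tokenize the expression; process() returns false on a lexing error.
   if (!generator.process(expression))
   {
      std::cout << "Failed to lex: " << expression << std::endl;
      return 1;
   }

   // Scan the token stream and verify the bracket ordering.
   lexertk::helper::bracket_checker bc;
   bc.process(generator);

   // result() reports whether the bracket check succeeded.
   if (!bc.result())
   {
      std::cout << "Failed Bracket Checker!" << std::endl;
      return 1;
   }

   lexertk::helper::dump(generator);

   return 0;
}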

Simple example 3 expected output:

Token[00] @ 000       {  -->  '{'
Token[01] @ 001  SYMBOL  -->  'aa'
Token[02] @ 003       +  -->  '+'
Token[03] @ 004       (  -->  '('
Token[04] @ 005  SYMBOL  -->  'bb'
Token[05] @ 007       -  -->  '-'
Token[06] @ 008       [  -->  '['
Token[07] @ 009  SYMBOL  -->  'cc'
Token[08] @ 011       *  -->  '*'
Token[09] @ 012  SYMBOL  -->  'dd'
Token[10] @ 014       ]  -->  ']'
Token[11] @ 015       +  -->  '+'
Token[12] @ 016  SYMBOL  -->  'ee'
Token[13] @ 018       )  -->  ')'
Token[14] @ 019       -  -->  '-'
Token[15] @ 020  SYMBOL  -->  'ff'
Token[16] @ 022       }  -->  '}'
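The bracket ordering check in Simple Example 3 is commonly done with a stack, and the sketch below shows that general technique. It is not the library's bracket_checker: the function name brackets_balanced is hypothetical, and for brevity it scans raw characters instead of tokens, so brackets inside a quoted string would be counted as well.

```cpp
#include <cstddef>
#include <stack>
#include <string>

// Hypothetical sketch: returns true when every (, [ and { in the expression
// is closed by the matching bracket in the correct order.
bool brackets_balanced(const std::string& expression)
{
   std::stack<char> open;

   for (std::size_t i = 0; i < expression.size(); ++i)
   {
      const char c = expression[i];

      if ((c == '(') || (c == '[') || (c == '{'))
      {
         // Remember the opening bracket until its partner shows up.
         open.push(c);
      }
      else if ((c == ')') || (c == ']') || (c == '}'))
      {
         // A closing bracket must match the most recent unclosed opener.
         if (open.empty())
            return false;

         const char expected = (c == ')') ? '(' : ((c == ']') ? '[' : '{');

         if (open.top() != expected)
            return false;

         open.pop();
      }
   }

   // Any opener left on the stack was never closed.
   return open.empty();
}
```

With this sketch the expression from Simple Example 3, "{aa+(bb-[cc*dd]+ee)-ff}", passes, whereas the expression from Simple Example 1 fails because its first '(' is never closed.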