Looking at using a parser generator to produce code to process arithmetic expressions. The intention is (was) to perhaps extend it to evaluate if-then type expressions and return jump points.

Three important factors:

  • Runtime code needs to have a suitable permissive licence.
  • Generated code must be compact and compile to a minimal binary.
  • Should be simple to integrate in to the existing tokenising code.

Must also be reentrant and not use any global variables.

Bisonc++

  • A relatively small package at 238K for the deb (install size 880K).
  • Compatible with Bison grammar for the most part.
  • Generates standalone code that doesn’t require any supporting libraries.
  • Easy to integrate with the existing tokeniser.
  • Generated code is subject to to GPL licencing terms.
  • A simple calculator example generates a stripped ~125K binary.

Bison (C++ version)

  • Package size is a little larger than BisonC++ at 329K, although install size is less (535K).
  • As the reference point for grammar compatibility, no further comment required.
  • Although Bison++ is GPL, there is an exception for generated code. So in theory, an alternative licence could be used.
  • Slightly more involved to integrate with the existing codebase, but nothing too taxing.
  • Again, standalone code that doesn’t require any runtime libraries.
  • Stripped test calculator is ~90K.

Antlr4

  • BSD licence (probably applies to generated code).
  • Requires a 347K runtime package (~1.3M installed).
  • To generate the parser code requires some 25M or more - The price one pays for using Java.
  • Even stripped to the minimum and compiled without any debugging, the basic test binary weighs in at some 850K
  • On that basis alone, Antlr4 can be dismissed.

  • Lemon

  • Public domain and part of the SQLite project.
  • 303K package (434K installed).
  • Simple calculator test comes out with a (stripped) 19K binary.
  • C only interface, but it claims to be thread safe.

Out of all the parser code generators I’ve looked at, Lemon is at the top of the list. The grannar syntax is not that much different to Bison. Being C only does place a few restrictions on use.

Hand crafted expression parser

Herbet Schildt presents a simple math expression parser. Even although it claims to be C++ code, it clings to plain old C methodologies. Even so, it compiles down to just ~7K.
Extending it to handle if-then type expressions is not hugely complex, but certainly more work than just adding a few lines to a Bison grammar file (would still need tokeniser extensions).

Further work

Need to compare actual execution speeds and compare to a hand written parser - Opinion is the hand written version will be much faster at the expense of maintainability and ease of extending. But speed is not an overriding criteria as this is running in user space and the bottleneck will be in how fast other parts of the system can consume data. That said, don’t need any complex functionality from the parser. For the arithmetic side, just return a single number. If-then expression evaluation only needs to return true/false, and the execution routines can decide the final jump point.


<
Blog Archive
Archive of all previous blog posts
>
Next Post
When is a standard not a standard