Write a recursive descent parser generator

Parse trees are somewhat different in form from expression trees, but they serve the same purpose. Parser generators, for their part, have some nice technology in them and can generate fast parsers.
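The difference is that a parse tree records every grammar rule that was applied, while an expression tree keeps only operators and operands. Below is a minimal sketch of an expression tree and its evaluation. The `Num` and `BinOp` node classes and the `evaluate` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    """Leaf node: a numeric operand."""
    value: float


@dataclass
class BinOp:
    """Interior node: an operator applied to two subtrees."""
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Num, BinOp]


def evaluate(node: Expr) -> float:
    """Evaluate an expression tree by recursively evaluating its subtrees."""
    if isinstance(node, Num):
        return node.value
    left, right = evaluate(node.left), evaluate(node.right)
    if node.op == "+":
        return left + right
    if node.op == "-":
        return left - right
    if node.op == "*":
        return left * right
    if node.op == "/":
        return left / right
    raise ValueError(f"unknown operator {node.op!r}")


# 2 + 3 * 4: the product is a subtree of the sum, so it is computed first.
tree = BinOp("+", Num(2), BinOp("*", Num(3), Num(4)))
print(evaluate(tree))  # 14
```

A parse tree for the same input would also contain nodes for the intermediate rules, such as expression, term and factor. The expression tree drops them because they carry no extra meaning once parsing is done.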

Computers can only directly execute instructions expressed in very simple machine language. While ANTLR greatly expanded the set of grammars it accepts, I still found myself facing conflicts, or grammars that simply did not parse the way I intended. At first this may not sound too bad, but it quickly complicates the parsing part.

This is definitely a very useful feature. The " " is read. The Java compiler, of course, will reject the program if it contains such an error.

Recursive descent parser generator

In BNF, a choice between alternatives is represented by the symbol "|", which is read "or". On the side, we can also run arbitrary analysis on a parser for error message generation, recovery, syntactic completion, or incrementality. A rule such as <factor> ::= <number> | "(" <expression> ")" also shows that a factor can be either a number or an expression in parentheses.

Recursive descent parser

Interestingly, an equivalent system was developed independently at about the same time by linguist Noam Chomsky to describe the grammar of natural language. The real power comes from the fact that BNF rules can be recursive.
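On the parsing side, a recursive rule becomes a recursive function. The sketch below assumes a toy rule that does not come from the text, <nest> ::= "(" <nest> ")" <nest> | (empty), which describes balanced parentheses nested to any depth. `matches_nested` is a hypothetical name.

```python
def matches_nested(text: str) -> bool:
    """Recognize balanced parentheses using the recursive rule
    <nest> ::= "(" <nest> ")" <nest> | (empty)."""
    pos = 0

    def nest() -> bool:
        nonlocal pos
        # First alternative: applies only when the next character is "(".
        if pos < len(text) and text[pos] == "(":
            pos += 1
            if not nest():  # the recursion handles the inner nesting
                return False
            if pos >= len(text) or text[pos] != ")":
                return False
            pos += 1
            return nest()
        # Second alternative: the empty string always matches.
        return True

    # The whole input must be consumed, not just a prefix.
    return nest() and pos == len(text)


print(matches_nested("(()())"))  # True
print(matches_nested("(()"))     # False
```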

Rely on the recursion to handle the details. For the grammars of most programming languages, LR is the sweet spot between parsing expressivity and amenability to static analysis. This is especially true of domain-specific languages, where it is very convenient to vary tokens based on context.

For example, if you are writing a C compiler, you will also want to support inline assembly, which has an entirely different syntax.
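One way to avoid a fixed lexing phase is to have the parser request a token at the current position. What counts as a token then depends on where the parser is. The sketch below assumes a made-up mini-language with two statements, `name = number;` and `asm { ... }`. Inside the `asm` block it takes the raw text up to the closing brace instead of tokenizing it. `Scanner`, `expect` and `parse_statement` are hypothetical names, and this is only one possible design.

```python
import re


class Scanner:
    """On-demand scanner: the parser asks for a specific kind of token,
    so the token set can change with the parsing context."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def expect(self, pattern: str) -> str:
        """Skip whitespace, then match the regex `pattern` at the current position."""
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        m = re.compile(pattern).match(self.text, self.pos)
        if m is None:
            raise SyntaxError(f"expected {pattern!r} at offset {self.pos}")
        self.pos = m.end()
        return m.group()


def parse_statement(sc: Scanner):
    """Hypothetical mini-language: `asm { ... }` or `name = number ;`."""
    word = sc.expect(r"[A-Za-z_]\w*")
    if word == "asm":
        sc.expect(r"\{")
        # Inside the block the ordinary tokens do not apply: take raw text up to "}".
        body = sc.expect(r"[^}]*")
        sc.expect(r"\}")
        return ("asm", body.strip())
    sc.expect(r"=")
    value = int(sc.expect(r"\d+"))
    sc.expect(r";")
    return ("assign", word, value)


print(parse_statement(Scanner("asm { mov eax, 1 }")))  # ('asm', 'mov eax, 1')
print(parse_statement(Scanner("x = 42;")))             # ('assign', 'x', 42)
```

Because the parser decides which pattern to try next, the assembly text never has to fit the main language's token definitions.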

To apply recursive descent parsing, we need a subroutine for each rule in the grammar. Dynamic lexer tokens are also problematic to support with such a parser generator.

In a subroutine that reads and evaluates expressions, repetition in a rule, such as any number of terms joined by "+" or "-", is handled by a while loop. In a recursive descent parser, every rule of the BNF grammar is the model for a subroutine. This is also a very intuitive step, as the code follows the shape of the AST quite closely.
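Below is a sketch of that pattern. It assumes the usual textbook grammar for arithmetic, shown in the docstring, where `[ ... ]...` means "repeat zero or more times". Each rule becomes a method. The choice in <factor> is made by looking at the next token, and each repetition becomes a `while` loop. The class and function names are hypothetical.

```python
import re

# A token is a number or one of the operator/parenthesis characters.
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")


def tokenize(text: str) -> list[str]:
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Evaluator:
    """One method per BNF rule:
    <expression> ::= [ "-" ] <term> [ ( "+" | "-" ) <term> ]...
    <term>       ::= <factor> [ ( "*" | "/" ) <factor> ]...
    <factor>     ::= <number> | "(" <expression> ")"
    """

    def __init__(self, text: str) -> None:
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expression(self) -> float:
        negate = self.peek() == "-"  # optional leading minus
        if negate:
            self.advance()
        value = self.term()
        if negate:
            value = -value
        while self.peek() in ("+", "-"):  # the repetition becomes a loop
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self) -> float:
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self) -> float:
        tok = self.advance()
        if tok == "(":  # second alternative: parenthesized expression
            value = self.expression()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        try:  # first alternative: a number
            return float(tok)
        except ValueError:
            raise SyntaxError(f"unexpected token {tok!r}") from None


def evaluate_expression(text: str) -> float:
    parser = Evaluator(text)
    value = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate_expression("2 + 3 * 4"))    # 14.0
print(evaluate_expression("(2 + 3) * 4"))  # 20.0
```

Because `term` is called from inside `expression`, multiplication binds more tightly than addition without any explicit precedence table.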

I will follow this same pattern in every case.


Higher-level languages must be translated into machine language. On a few occasions I simply could not convince the generator of what I wanted and had to alter the syntax of the language to accommodate it. There is no big magic for that part yet, but an indentation heuristic works quite well. The grammar must satisfy a certain property.
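The property is not spelled out here. A common formulation for recursive descent is that the alternatives of each rule must be distinguishable by looking at the next token. Among other things, this rules out left recursion. The sketch below assumes that formulation and checks only part of it: it tests whether the FIRST sets of a rule's alternatives are disjoint. A full LL(1) check also compares FOLLOW sets for rules that can match the empty string. Grammars are plain dictionaries, and any symbol that is not a key is treated as a terminal. All names are hypothetical.

```python
EPS = ""  # stands for the empty string inside FIRST sets


def first_of_sequence(symbols, grammar, first):
    """FIRST set of a sequence of symbols, given FIRST sets of nonterminals."""
    result = set()
    for sym in symbols:
        if sym not in grammar:  # terminal: it is the first token
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:  # this nonterminal cannot vanish
            return result
    result.add(EPS)  # every symbol in the sequence can derive the empty string
    return result


def first_sets(grammar):
    """Compute FIRST for every nonterminal by iterating to a fixed point."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_sequence(alt, grammar, first) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True
    return first


def ll1_conflicts(grammar):
    """List (rule, shared tokens) wherever two alternatives can start alike."""
    first = first_sets(grammar)
    conflicts = []
    for nt, alternatives in grammar.items():
        seen = set()
        for alt in alternatives:
            f = first_of_sequence(alt, grammar, first)
            if seen & f:
                conflicts.append((nt, sorted(seen & f)))
            seen |= f
    return conflicts


# "number" is a terminal here (a number token). Left-recursive version:
left_recursive = {
    "expression": [["expression", "+", "term"], ["term"]],
    "term": [["number"], ["(", "expression", ")"]],
}
# Same language with the left recursion replaced by a "rest" rule; [] is empty.
rewritten = {
    "expression": [["term", "rest"]],
    "rest": [["+", "term", "rest"], []],
    "term": [["number"], ["(", "expression", ")"]],
}
print(ll1_conflicts(left_recursive))  # [('expression', ['(', 'number'])]
print(ll1_conflicts(rewritten))       # []
```

The left-recursive rule is flagged because both alternatives can start with the same tokens. A recursive descent subroutine for the rewritten `rest` rule is simply the while loop from the evaluator pattern.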

But it turns out that a trivial extension to the parser can also solve your case. Although we will look at only a simple example, I hope it will be enough to convince you that compilers can in fact be written and understood by mortals and to give you some idea of how that can be done. Note also that parentheses can be used for grouping.

Lexing and Context

One key aspect that bothers me with many of the tools is the requirement to have a distinct first lexing phase. Also, as of today, only incrementality and error message generation are part of the upstream version of Menhir, but the rest should come soon.

Obviously, we can describe very complex structures in this way. However, BNF does express the basic structure of the language, and it plays a central role in the design of compilers. This is not completely free, however: sometimes the grammar needs to be re-engineered to carry the relevant information.

The first is completion of the parse tree: this goes beyond simple lexing changes. In practice I find a lot to be desired and end up fighting with the tool more than using it. But you would have to do that anyway with a handwritten parser, and here the parser generator can help you. Indeed, absolutely no work from the grammar writer has been required so far. A symbol that is an actual part of the language being described is enclosed in quotes.

Recursive descent is the simplest way to build a parser, and it doesn’t require complex parser generator tools like Yacc, Bison or ANTLR.

Why I don’t use a Parser Generator

All you need is straightforward hand-written code. Don’t be fooled by its simplicity, though. Until version of Parse::RecDescent, parser modules built with Precompile were dependent on Parse::RecDescent. Future Parse::RecDescent releases with different internal implementations would break pre-existing precompiled parsers.

Write a recursive descent parser generator that takes a description of a grammar as input and outputs the source code for a parser in the same language as the generator.
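Here is a minimal sketch in Python that generates Python. It rests on several assumptions that are not in the task text:

* Rules are written one per line as `name ::= symbols | symbols`.
* Quoted symbols are terminals, matched literally against a list of tokens.
* `|` cannot itself be a terminal.
* Alternatives are tried in order with backtracking, and the first one that succeeds wins. Longer alternatives must therefore come first, and left recursion must be avoided.

Each generated `parse_<rule>` function returns a `(tree, position)` pair, or `None` on failure. All function names are hypothetical.

```python
import re


def parse_grammar(text: str) -> dict[str, list[list[str]]]:
    """Read rules written as `name ::= sym sym | sym ...`, one per line.
    Quoted symbols are terminals; bare names are nonterminals."""
    rules = {}
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        name, _, body = line.partition("::=")
        # Assumes "|" never appears inside a quoted terminal.
        rules[name.strip()] = [re.findall(r'"[^"]*"|\S+', alt)
                               for alt in body.split("|")]
    return rules


def generate_parser(grammar_text: str) -> str:
    """Emit Python source with one parse_<rule>(tokens, pos) function per rule.
    Each alternative sits in a `while True` block: `break` abandons it and
    falls through to the next alternative (ordered choice with backtracking)."""
    out = []
    for name, alternatives in parse_grammar(grammar_text).items():
        out.append(f"def parse_{name}(tokens, pos):")
        for alt in alternatives:
            out.append(f"    while True:  # {name} ::= {' '.join(alt)}")
            out.append("        p, kids = pos, []")
            for sym in alt:
                if sym.startswith('"'):  # terminal: compare with the next token
                    lit = sym[1:-1]
                    out.append(f"        if p >= len(tokens) or tokens[p] != {lit!r}: break")
                    out.append(f"        kids.append({lit!r}); p += 1")
                else:  # nonterminal: call its generated function
                    out.append(f"        r = parse_{sym}(tokens, p)")
                    out.append("        if r is None: break")
                    out.append("        kids.append(r[0]); p = r[1]")
            out.append(f"        return ({name!r}, kids), p")
        out.append("    return None")
        out.append("")
    return "\n".join(out)


GRAMMAR = """
expr   ::= term "+" expr | term
term   ::= factor "*" term | factor
factor ::= "(" expr ")" | "x" | "y"
"""

source = generate_parser(GRAMMAR)
namespace = {}
exec(source, namespace)  # load the generated parser functions
tree, end = namespace["parse_expr"](["x", "+", "y", "*", "x"], 0)
print(end)  # 5
```

The generated text could just as well be written to a file instead of being passed to `exec`. A complete parse also requires that the returned position equal `len(tokens)`.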

Another fun way to write a recursive descent parser is to abstract and parametrize the recursive descent algorithm. This is best done in dynamic languages. I wrote an abstract recursive descent parser in JS that accepts an array of terminal definitions (regexps) and non-terminal definitions (an array of terminal/non-terminal names plus an action callback).
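The parser described above is in JavaScript. The sketch below carries the same idea over to Python, under assumptions of its own:

* Terminals are a dict of regular expressions.
* Each nonterminal is a list of (symbol names, action) pairs.
* Whitespace is skipped before every symbol.
* Alternatives are tried in order with backtracking.

`make_parser` and the example grammar are hypothetical.

```python
import re
from typing import Callable


def make_parser(terminals: dict[str, str],
                rules: dict[str, list[tuple[list[str], Callable]]],
                start: str) -> Callable[[str], object]:
    """Build a parse function from data: terminals map names to regexes,
    rules map nonterminal names to (symbol list, action) alternatives."""
    compiled = {name: re.compile(rx) for name, rx in terminals.items()}

    def parse_symbol(sym, text, pos):
        while pos < len(text) and text[pos].isspace():  # skip whitespace
            pos += 1
        if sym in compiled:  # terminal: match its regex here
            m = compiled[sym].match(text, pos)
            return (m.group(), m.end()) if m else None
        for symbols, action in rules[sym]:  # try each alternative in order
            values, p = [], pos
            for s in symbols:
                r = parse_symbol(s, text, p)
                if r is None:
                    break
                values.append(r[0])
                p = r[1]
            else:  # every symbol matched: run the action on their values
                return action(*values), p
        return None

    def parse(text: str):
        r = parse_symbol(start, text, 0)
        if r is None or text[r[1]:].strip():
            raise SyntaxError("input does not match the grammar")
        return r[0]

    return parse


calc = make_parser(
    terminals={"num": r"\d+", "plus": r"\+", "times": r"\*"},
    rules={
        "sum": [(["product", "plus", "sum"], lambda a, _, b: a + b),
                (["product"], lambda a: a)],
        "product": [(["num", "times", "product"], lambda a, _, b: int(a) * b),
                    (["num"], lambda a: int(a))],
    },
    start="sum",
)
print(calc("2 + 3 * 4"))  # 14
```

Backtracking keeps the interpreter short, but it can re-parse the same input many times. A generator that emits specialized code, or memoization of `(symbol, position)` results, avoids that cost.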

Although predictive parsers are widely used, and are frequently chosen when writing a parser by hand, programmers often prefer to use a table-based parser produced by a parser generator, either for an LL(k) language or using an alternative parser such as LALR or LR. ANTLR, for example, is a recursive descent parser generator. Writing a parser by hand will teach you how a recursive descent parser works, but it is completely impractical to write a full programming language parser by hand.
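For contrast, a table-based LL(1) parser replaces the per-rule subroutines with an explicit stack and a table indexed by (nonterminal, next token). A parser generator would normally compute the table. Here it is written by hand for a toy grammar chosen for illustration: E ::= T E', E' ::= "+" T E' | (empty), T ::= "id" | "(" E ")". `TABLE` and `ll1_parse` are hypothetical names.

```python
# Parse table: (nonterminal, lookahead token) -> right-hand side to expand.
TABLE = {
    ("E", "id"): ["T", "E'"],
    ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],
    ("E'", ")"): [],   # E' derives the empty string before ")" or end of input
    ("E'", "$"): [],
    ("T", "id"): ["id"],
    ("T", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T"}


def ll1_parse(tokens: list[str]) -> bool:
    """Table-driven predictive parse; True if the tokens form an E."""
    tokens = tokens + ["$"]  # end-of-input marker
    stack = ["$", "E"]       # the top of the stack is the last element
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                return False  # no table entry: syntax error
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        elif top == look:
            pos += 1  # matched a terminal (or the final "$")
        else:
            return False
    return pos == len(tokens)


print(ll1_parse(["id", "+", "(", "id", ")"]))  # True
print(ll1_parse(["id", "id"]))                  # False
```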

If you are determined to write a classical recursive descent parser, you should look into some tools that generate the code for you (TinyPG, Coco/R, Irony).
