Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I'm always shocked to find major projects that use parser generators. A hand-written recursive descent parser is easier to write, read, understand, and modify. It's almost trivial to translate a language specification's grammar productions into a recursive descent parser.

So looking at the list I'm dumbfounded that CPython and Ruby use generators.



You won't find me defending the use of yacc, but in general hand written parsers are far more error prone.


My strategy when faced with doing a DSL of some kind is to write an accepter in bison or similar and knock the bugs out of the grammar and basic readability issues. Then go recursive descent with a table-driven precedence-climbing parser for expressions when ready for the real deal.


This is, I think, a very good strategy for many reasons. Not only is it a cheap way to work out tthe design of the language and a way to verify the correctness of your implementation. It's also a good fallback solution: if your (more expensive, expressive) implementation is not ready in time, you can fall back to integrating the generated implementation to meet an important deadline or whatnot.


It's harder to ensure that the grammar doesn't have unintended ambiguities when using a parser generator. That doesn't matter as much when "just" developing another implementation of an existing fully specced language, but for cases like the examples you cite that doesn't exist.

That issue is the primary reason why postgres continues to use bison. The SQL standard introduced potential parsing ambiguities frequently and leaves a large portion of the things necessary for a functioning RDBMS unspecced.


> It's almost trivial to translate a language specification's grammar productions into a recursive descent parser.

It's also potentially quite slow.


In my experience (with both hand-written and generators, eg Antler), that's not the case.

Hand-written recursive descent are simple to write, debug and maintain and extend, while with generators I always had to fight the tool at some point, and that always ended in frustration.


We've just had different experiences then. It's funny you mention ANTLR because it was in particular very slow last time I tried it (just a couple years ago).

As for fighting the tool and simplicity, sure, that's beside my point.


Some grammars can't be parsed recursive descent. At least if I remember parsing class from decades ago they can't.


There are techniques that help parsers parse a far larger class of grammars than their underlying model permits. Some of these are:

- turning it into a state machine (required to deal with nested comments and C++ template syntax)

- building the symbol table while parsing and querying for disambiguation

- treating expressions as a stream of atoms and using a specialised precedence parser (probably required to deal with operator overloading)

Parser generators usually employ these as well.


Not true in practice. C++ is one of the hardest practical languages to parse. And it can be parsed using a recursive descent scanner less parser.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: