2010-09-24

Grammar for parsing lists with optional trailing commas (elements)

This is a standard problem when parsing programming languages. Many languages allow an optional trailing comma after the last element of a list. Programmers can then put a comma after every element and never forget it when they later append a new element.

Consider an anonymous object declaration in C# 3.0:

var a = new { X=1, Y=2 };

However, this looks asymmetric. You may prefer to always write a member AND a comma:

var a = new { X=1, Y=2, };

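This is allowed, and so is the same thing in, for instance, C++ enums (since C++11). It makes the construct surprisingly awkward to describe with LL(1) grammar rules. The obvious rule MemberDeclarator { "," MemberDeclarator } [ "," ] is not LL(1): after a member, a "," can either start another iteration of the loop or be the trailing comma, and a single token of lookahead cannot tell which. It doesn't seem that hard, but it took me a while.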
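To make the conflict concrete, here is a minimal sketch of a one-token-lookahead parser for the naive rule "{" [ MemberDeclarator { "," MemberDeclarator } [ "," ] ] "}". Parser generators that accept such a grammar anyway typically resolve the conflict greedily, in favour of the loop. The tokenizer, the function name parse_naive and the simplified MemberDeclarator (an identifier, "=", then a single word) are all hypothetical and only serve the illustration. Real C# member declarators allow arbitrary expressions.

```python
import re


def tokenize(src):
    # Hypothetical lexer: words (identifiers or numbers) and the symbols { } , =
    return re.findall(r"\w+|[{},=]", src)


def parse_naive(src):
    """One-token-lookahead parser for the naive (non-LL(1)) rule
    "{" [ MemberDeclarator { "," MemberDeclarator } [ "," ] ] "}",
    with MemberDeclarator simplified to: ident "=" word (hypothetical)."""
    toks = tokenize(src) + ["<eof>"]
    pos = 0

    def la():
        # The single token of lookahead
        return toks[pos]

    def take(pred, what):
        nonlocal pos
        tok = toks[pos]
        if not pred(tok):
            raise SyntaxError(f"expected {what}, got {tok!r}")
        pos += 1
        return tok

    def member():
        name = take(str.isidentifier, "member name")
        take(lambda t: t == "=", "'='")
        value = take(lambda t: re.fullmatch(r"\w+", t) is not None, "value")
        return name, value

    take(lambda t: t == "{", "'{'")
    members = []
    if la().isidentifier():
        members.append(member())
        # Greedy loop: every "," is taken as the start of another member,
        # because one token of lookahead cannot tell it from a trailing comma.
        while la() == ",":
            take(lambda t: t == ",", "','")
            members.append(member())
        if la() == ",":  # never true: the loop above consumed every ","
            take(lambda t: t == ",", "','")
    take(lambda t: t == "}", "'}'")
    take(lambda t: t == "<eof>", "end of input")
    return members
```

With this rule, "{ X=1, Y=2 }" parses. "{ X=1, Y=2, }" is rejected: after the last comma the loop demands another member and finds "}" instead.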

Here is a solution as a Coco/R ATG (attributed grammar):

AnonymousObjectInitializer
=
"{"
[ MemberDeclaratorList ]
"}"
.

MemberDeclaratorList
=
MemberDeclarator [ "," [ MemberDeclaratorList ] ]
.
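The recursive rule is LL(1) because every decision needs only the next token. After a member, a "," means "take the comma". After that comma, the inner [ MemberDeclaratorList ] is entered only if the next token can start a MemberDeclarator. Otherwise it must be "}", and the comma was the trailing one. Below is a minimal hand-written recursive-descent sketch of the same two rules, one method per nonterminal. The tokenizer, the Parser class, its method names and the simplified MemberDeclarator (identifier "=" word) are hypothetical. The sketch is not code generated by Coco/R.

```python
import re


def tokenize(src):
    # Hypothetical lexer: words (identifiers or numbers) and the symbols { } , =
    return re.findall(r"\w+|[{},=]", src)


class Parser:
    """Recursive-descent sketch of the ATG rules, using one token of lookahead."""

    EOF = "<eof>"

    def __init__(self, src):
        self.toks = tokenize(src) + [self.EOF]
        self.pos = 0

    def la(self):
        return self.toks[self.pos]

    def expect(self, tok):
        if self.la() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.la()!r}")
        self.pos += 1

    def starts_member(self):
        # FIRST(MemberDeclarator): an identifier in this simplified sketch
        return self.la().isidentifier()

    def member_declarator(self):
        # Hypothetical, simplified: MemberDeclarator = ident "=" word .
        if not self.starts_member():
            raise SyntaxError(f"expected member name, got {self.la()!r}")
        name = self.la()
        self.pos += 1
        self.expect("=")
        value = self.la()
        if not re.fullmatch(r"\w+", value):
            raise SyntaxError(f"expected value, got {value!r}")
        self.pos += 1
        return name, value

    def member_declarator_list(self):
        # MemberDeclaratorList = MemberDeclarator [ "," [ MemberDeclaratorList ] ] .
        members = [self.member_declarator()]
        if self.la() == ",":
            self.expect(",")
            # Recurse only if a member can start here; otherwise the comma
            # was the trailing one and "}" has to follow.
            if self.starts_member():
                members += self.member_declarator_list()
        return members

    def anonymous_object_initializer(self):
        # AnonymousObjectInitializer = "{" [ MemberDeclaratorList ] "}" .
        self.expect("{")
        members = self.member_declarator_list() if self.starts_member() else []
        self.expect("}")
        return members


def parse(src):
    p = Parser(src)
    members = p.anonymous_object_initializer()
    p.expect(Parser.EOF)
    return members


print(parse("{ X=1, Y=2, }"))  # [('X', '1'), ('Y', '2')]
```

The parser accepts the list with and without a trailing comma, as well as "{ }". It still rejects "{ , }" and "{ X=1,, }". One design note: the recursion depth grows with the list length. For very long lists, the tail call in member_declarator_list could be replaced by a loop that checks starts_member() after each comma, which accepts the same language.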
