[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: More comments, and the ANTLR code is too complex

This page is part of the web mail archives of SRFI 110 from before July 7th, 2015. The new archives for SRFI 110 contain all messages, not just those from before July 7th, 2015.

To: "mhw" <mhw@xxxxxxxxxx>, "srfi-110" <srfi-110@xxxxxxxxxxxxxxxxx>
Subject: Re: More comments, and the ANTLR code is too complex
From: "David A. Wheeler" <dwheeler@xxxxxxxxxxxx>
Date: Sat, 27 Jul 2013 00:27:50 -0400 (EDT)
Delivered-to: srfi-110@xxxxxxxxxxxxxxxxx
In-reply-to: <87mwqscrj6.fsf@xxxxxxxxx>
Reply-to: dwheeler@xxxxxxxxxxxx

On Fri, 14 Jun 2013 17:26:53 -0400, Mark H Weaver <mhw@xxxxxxxxxx> wrote:
> However, _if_ turns out that a non-LL(1) grammar would be easier to
> understand, then I think that's what should be used in the actual specification.

A reasonable premise!  Unfortunately - and this may be a limitation on *my* part -
I've so far been unable to find a non-LL(1) grammar that's much easier to
understand than its corresponding LL(1) grammar for this language.

I really *do* want to make you happy, or at least happier, if I can.
Below is a little bit about how I'm trying to do so.

> If you disagree, then consider this: if you were reading the
> specification of a traditional infix language, which of the following
> grammars would you prefer to see when you were _learning_ about the language: ...

I completely agree with you that the LALR(1) form
for infix operators is simpler than LL(1); I've written both.
But sweet-expressions have no infix operators, at least in the traditional sense.

Usually an LL(1) grammar is complicated because of the need for
left factoring and left recursion removal (the Wikipedia article has a nice summary:
http://en.wikipedia.org/wiki/LL_parser ).  The giveaway of such problems
is that many productions will match epsilon ("empty").
However, I've checked, and the current sweet-expression LL(1) grammar
does not have that property; only one key BNF production (post-period)
can match an empty sequence at all.

Since there's no obvious set of transformations to create LL(1)
that can be reversed, I have not found a way to
transform the LL(1) productions directly into a simpler LALR(1) form.

That said... I agree with you that the starting LL(1) grammar was
overly complex.  So, I'm trying to simplify the grammar in LL(1) form,
while continuing to look for ways to transform it into a
non-LL(1) form if that would make it much easier to understand.

I've already taken the following simplifying steps:
- Split up the it_expr rule into a much larger set of smaller rules.
- Pulled the initial indent rule into its own production,
  which greatly simplifies the overall t_expr rule.
- I've replaced "hspace*" with "hs", where "hs : hspace*;".
  Since this happens all over, it has the effect of making rules shorter.

My hope is that by implementing a number of simplifying steps
to make the current LL(1) clearer, we'll meet the goal of having
an easier-to-understand specification.

If anyone has any other specific suggestions, please speak up!

--- David A. Wheeler

References:
- Re: More comments, and the ANTLR code is too complex
  - From: Mark H Weaver

Prev by Date: datum comments of sweet-expressions - now implemented in Scheme
Next by Date: Proposed grammar change: forbid lines with >1 n-expr that end with "."
Previous by thread: Re: More comments, and the ANTLR code is too complex
Next by thread: Re: More comments, and the ANTLR code is too complex
Index(es):
- Date
- Thread