Re: sweet-expressions are not homoiconic

This page is part of the web mail archives of SRFI 110 from before July 7th, 2015. The new archives for SRFI 110 contain all messages, not just those from before July 7th, 2015.

Alexey Radul <axch@xxxxxxx> wrote:

> I don't know whether anyone else on this list uses Emacs' Paredit mode
> http://www.emacswiki.org/emacs/ParEdit to edit their S-expressions,

I haven't used Paredit mode, sorry.

But welcome to the discussion, thanks for joining us!!!

> ... The key feature of s-expressions that enables the existence of Paredit
> mode (besides their extraordinary simplicity) is that every
> grammatical construct down to the token level begins and ends on a
> distinct character from every other.  Therefore, there is always at
> most one non-terminal in the parse tree of my code that begins at the
> cursor ("point", in Emacs-speak), and at most one that ends there. ...
> I haven't read the spec of sweet-expressions very carefully, but it
> seems that they lack this crucial property: The expression
>   factorial{n - 1}
> begins on the same character as its subexpression factorial begins, and
> ends on the same character as its subexpression {n - 1} ends.

You're correct, sweet-expressions do *not* have this property.
In "factorial{n - 1}", the closing "}" actually closes TWO expressions,
because that is really (factorial {n - 1}) which is really (factorial (- n 1)).
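To make that two-step rewrite concrete, here is a rough Python sketch of the
SRFI-105 mapping as I understand it: a "simple" curly-infix list (odd length,
one repeated operator) becomes a prefix list, and a name immediately followed
by {...} becomes a call with one argument.  The function names are made up for
illustration, the input is assumed to be already tokenized, and this is not
the reference implementation.

```python
def curly_infix(items):
    """Map the already-parsed contents of {...} to a prefix datum."""
    if len(items) == 0:
        return []                      # {}     -> ()
    if len(items) == 1:
        return items[0]                # {x}    -> x
    if len(items) == 2:
        return list(items)             # {- x}  -> (- x)
    if len(items) % 2 == 1:
        ops = items[1::2]              # every operator position
        if all(op == ops[0] for op in ops):
            # {a + b + c} -> (+ a b c)
            return [ops[0]] + list(items[0::2])
    # Mixed operators or even length: left for a user-defined $nfx$.
    return ["$nfx$"] + list(items)


def neoteric_curly(head, items):
    """head{...} -> (head {...}); empty braces are not handled here."""
    return [head, curly_infix(items)]


def show(datum):
    """Render nested Python lists as an s-expression string."""
    if isinstance(datum, list):
        return "(" + " ".join(show(d) for d in datum) + ")"
    return str(datum)


print(show(neoteric_curly("factorial", ["n", "-", "1"])))
```

Both the outer list and the inner list in the printed result end at the
position of the single "}" in the source text, which is exactly the overlap
being discussed.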

I suspect that in most cases you could arbitrarily pick one as the "real" one, e.g.,
perhaps the closing "}" matches the leftmost edge (disambiguating the case).
Not being a Paredit mode user, I wouldn't know if that's enough, but in *that*
case there may be a simple solution.

More fundamentally, the same thing happens with dedents.  A line
(e.g., an empty line) may close an arbitrary number of levels.  E.g.:
aa
! bb
! ! cc
! ! ! dd ; Next line is blank.

The ending blank line closes a *set* of expressions; the above is (aa (bb (cc dd))).

I don't see how the "Paredit property" could *possibly* be met by typical
indentation-sensitive syntaxes, because every one I'm aware of allows
multiple dedents in one line.  Forcing things otherwise would be annoying.

I'm not currently convinced that the "Paredit property" is important, but whether
it is or not, sweet-expressions certainly do *not* have this property, and I don't
see how they could be tweaked to have it.

> I do not grok the scholarship around parsing well enough to know
> whether this property is equivalent to being LL(1) (I expect not),

The grammar is LL(1), so being LL(1) does not imply the "Paredit property".
The proof: I've completely defined SRFI-105 using ANTLR with
"options { k = 1; }", which forces ANTLR to generate an LL(1) parser,
and factorial{n - 1} is perfectly valid in SRFI-105.

In fact, the entire formal BNF grammar of sweet-expressions is LL(1).
One caveat: the formal grammar assumes a front-end separately processes
indentation and generates INDENT and DEDENT tokens.  This is how
indentation-sensitive languages are usually implemented
(Python, for example, does this).  The Scheme reference implementation
intentionally mirrors the BNF, making it easy to see that it corresponds
to the formal BNF, and it's a traditional recursive-descent parser.
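As a rough illustration of that front-end idea (a sketch only, in Python, and
not the SRFI's actual rules), the following turns leading indentation into
INDENT and DEDENT tokens.  It assumes that a run of spaces and "!" characters
at the start of a line is the indentation, that a blank line ends the whole
expression, and it does not check for inconsistent dedents or strip comments.
The name indent_tokens is hypothetical.

```python
def indent_tokens(lines):
    """Yield a flat token list with INDENT/DEDENT markers."""
    stack = [0]          # indentation widths of the currently open levels
    tokens = []
    for line in lines:
        body = line.lstrip(" !")
        width = len(line) - len(body)
        if body == "":
            # A blank line closes every open level at once.
            while len(stack) > 1:
                stack.pop()
                tokens.append("DEDENT")
            continue
        if width > stack[-1]:
            stack.append(width)
            tokens.append("INDENT")
        else:
            # One line may close several levels.
            while width < stack[-1]:
                stack.pop()
                tokens.append("DEDENT")
        tokens.append(body)
    # End of input also closes whatever is still open.
    while len(stack) > 1:
        stack.pop()
        tokens.append("DEDENT")
    return tokens


print(indent_tokens(["aa", "! bb", "! ! cc", "! ! ! dd", ""]))
```

For the aa/bb/cc/dd example the blank line produces three DEDENT tokens in a
row, so the parser proper only ever sees ordinary tokens and can stay LL(1).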

It's easy to write a BNF that "looks okay" but actually has problems.
That's why I used a specialty tool (ANTLR) to ensure that
the grammar is unambiguous and easy to parse.

> ... In any case,
> I submit that sweet-expressions would become a much more powerful and
> effective notation if they were to obey the Paredit Property -- or,
> more broadly, if someone were to implement Paredit mode for them.

Advanced editing modes are definitely desirable!!
I think there's a reluctance to spend a lot of time creating editing modes
when the notation itself is still in flux.  I certainly hope to see cool modes
in the future.

I *do* believe that sweet-expressions
can be well-supported by an editing mode.
Heck, people manage to support Python (which is indentation-sensitive)
and C++ (which is notoriously hard to parse) with fancy editing modes.
Compared to them, sweet-expressions should be a snap :-).

> I bring this up here because it would be a shame if a small
> modification to sweet-expressions were to turn out to be the
> difference between Paredit being implementable for them vs not, and if
> the SRFI were to be finalized in the wrong state.

If there's a small modification that would make sweet-expressions
better, I *really* want to hear about it now!!

But I can't see how any indentation-based language could possibly meet this
property and still be pleasant to use.  As far as the factorial{...} example
goes, that's based on SRFI-105, which is already frozen.
But if you have specific ideas on this, I'd love to hear them.

> P.P.S. I was motivated to write this note in part because of the
> recurrent complaint about 10 closing parens being hard to distinguish
> from 12.  In the presence of Paredit mode this is simply a
> non-problem.  Paredit maintains the invariant that one's s-expressions
> are always well-formed (e.g., typing '(' inserts "()"; typing ')'
> inserts nothing but moves the next ")" to the cursor and steps over
> it; etc).

That *invariant* I completely buy in to.
I have a long-standing habit of always typing the closing paren if
I'm using an editor that won't do it for me.  I recommend that in non-Lisps too.
This eliminates the problem of having unclosed items,
so yes, I agree that this is a useful thing to do.
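Going only by the description quoted above (I haven't used Paredit, so this
is a toy model and certainly not how Paredit itself is written), the
invariant could be sketched in Python like this.  The buffer is a plain
string, "point" is an index into it, and both function names are invented.

```python
def type_open(text, point):
    """Typing '(' inserts a balanced pair and leaves point between them."""
    return text[:point] + "()" + text[point:], point + 1


def type_close(text, point):
    """Typing ')' inserts nothing; it steps over the next ')'.

    If only whitespace separates point from that ')', the whitespace is
    removed so the ')' ends up at point before being stepped over.
    """
    nxt = text.find(")", point)
    if nxt == -1:
        return text, point             # nothing to close: do nothing
    if text[point:nxt].strip() == "":
        text = text[:point] + text[nxt:]
        nxt = point
    return text, nxt + 1


text, point = type_open("", 0)         # "()"   with point at 1
text, point = type_open(text, point)   # "(())" with point at 2
text, point = type_close(text, point)  # unchanged, point at 3
print(text, point)
```

Neither operation ever changes the difference between the number of "(" and
")" characters, which is why the buffer stays balanced.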

But that's not the problem I meant.

>  So I literally neither know nor care whether there are 10
> parens at the end of something or 12 -- it's always the right number.

Sorry, I didn't make myself clear.

Closing *all* expressions is easy enough, I agree.  Even more importantly,
tools can do that trivially for you.  The problem is when the code
closes some but not all expressions, and you then have to read it.
In that case, it's not obvious which datum matches which level.
An editor can help you when you're *typing*, but it's far less helpful when *reading*
the code. Some folks have experimented with color, so you can see which
expressions are at which level, but that brings its own problems.

Contrast this with indentation levels.  A *non-computer* person can easily
see which indentation level matches which.  If we can make notations that
clear, then we can focus on the actual problems we're trying to solve
instead of problems with the notation.  Well, that's the hope anyway.

It's not impossible to read s-expressions, of course. I presume everyone
here can read Lisp code well (I can't imagine anyone else reading a
SRFI discussion).  My goal is to make things *better*.

--- David A. Wheeler