
Re: sweet-expressions are not homoiconic

This page is part of the web mail archives of SRFI 110 from before July 7th, 2015. The new archives for SRFI 110 contain all messages, not just those from before July 7th, 2015.



I don't know whether anyone else on this list uses Emacs' Paredit mode
http://www.emacswiki.org/emacs/ParEdit to edit their S-expressions,
but my own experience is that it makes S-expressions *vastly* easier
to edit than any other syntax (whether indentation-sensitive or
otherwise) that I have ever experienced (note that I have not
experienced sweet-expressions).

The key feature of s-expressions that enables the existence of Paredit
mode (besides their extraordinary simplicity) is that every
grammatical construct down to the token level begins and ends on a
distinct character from every other.  Therefore, there is always at
most one non-terminal in the parse tree of my code that begins at the
cursor ("point", in Emacs-speak), and at most one that ends there.  So
commands like "delete the piece of parse tree that starts where my
cursor is" make sense.  Paredit mode defines and provides key bindings
for a large selection of such operations -- and editing at the parse
tree level is much faster and more fun than the word-stream or
line-stream level.

I haven't read the spec of sweet-expressions very carefully, but it
seems that they lack this crucial property: The expression
  factorial{n - 1}
begins on the same character as its subexpression factorial begins, and
ends on the same character as its subexpression {n - 1} ends.
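
The property can be stated as a check over character spans.  What
follows is only a minimal sketch in Python, not anything taken from
Paredit or from the SRFI: the Node type, the has_paredit_property
function, and the hand-written spans are all assumptions made for
illustration, and the decomposition of factorial{n - 1} into nodes is
one plausible reading of it.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    """Hypothetical parse-tree node covering text[start:end]."""
    label: str
    start: int
    end: int


def has_paredit_property(nodes: List[Node]) -> bool:
    """True if no two nodes share a start offset and no two share an end offset."""
    starts = [n.start for n in nodes]
    ends = [n.end for n in nodes]
    return len(set(starts)) == len(starts) and len(set(ends)) == len(ends)


# Spans written out by hand (an assumption, not parser output).
SEXPR_TEXT = "(factorial (- n 1))"
SEXPR_NODES = [
    Node("(factorial (- n 1))", 0, 19),
    Node("factorial", 1, 10),
    Node("(- n 1)", 11, 18),
    Node("-", 12, 13),
    Node("n", 14, 15),
    Node("1", 16, 17),
]

SWEET_TEXT = "factorial{n - 1}"
SWEET_NODES = [
    Node("factorial{n - 1}", 0, 16),  # starts where "factorial" starts
    Node("factorial", 0, 9),
    Node("{n - 1}", 9, 16),           # ends where the whole expression ends
    Node("n", 10, 11),
    Node("-", 12, 13),
    Node("1", 14, 15),
]

print(has_paredit_property(SEXPR_NODES))  # True
print(has_paredit_property(SWEET_NODES))  # False
```

In the parenthesized form every node owns its first and last
character; in the curly-infix form the whole call shares offset 0
with factorial and its last offset with {n - 1}.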

I do not grok the scholarship around parsing well enough to know
whether this property is equivalent to being LL(1) (I expect not), or
whether it corresponds to any other standard definition.  In any case,
I submit that sweet-expressions would become a much more powerful and
effective notation if they were to obey the Paredit Property -- or,
more broadly, if someone were to implement Paredit mode for them.

I bring this up here because it would be a shame if a small
modification to sweet-expressions were to turn out to be the
difference between Paredit being implementable for them vs not, and if
the SRFI were to be finalized in the wrong state.

Best,
~Alexey

P.S. I don't have a clear idea of how to implement Paredit mode for a
notation that lacks what I have named the Paredit Property, but I
don't want to say that it can't be done.  Perhaps there could be some
phantom characters presented by the editor for the purpose of
disambiguating different components of the parse tree?  Or some sort
of sub-character navigation through the buffer?  It seems that
experimentation would be needed to figure out whether such a strategy
can work.

P.P.S. I was motivated to write this note in part because of the
recurrent complaint about 10 closing parens being hard to distinguish
from 12.  In the presence of Paredit mode this is simply a
non-problem.  Paredit maintains the invariant that one's s-expressions
are always well-formed (e.g., typing '(' inserts "()"; typing ')'
inserts nothing but moves the next ")" to the cursor and steps over
it; etc).  So I literally neither know nor care whether there are 10
parens at the end of something or 12 -- it's always the right number.
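
The invariant can be illustrated with a toy buffer.  This is a rough
sketch in Python under stated assumptions, not Paredit's actual
implementation: BalancedBuffer and its methods are hypothetical
names, only '(' and ')' are treated specially, strings and comments
are ignored, and deletion is not modeled at all.

```python
class BalancedBuffer:
    """Toy editing buffer that keeps parentheses balanced while typing."""

    def __init__(self) -> None:
        self.text = ""
        self.cursor = 0

    def _enclosing_close(self) -> int:
        """Index of the ')' closing the list around the cursor, or -1."""
        depth = 0
        for i in range(self.cursor, len(self.text)):
            c = self.text[i]
            if c == "(":
                depth += 1
            elif c == ")":
                if depth == 0:
                    return i
                depth -= 1
        return -1

    def type_char(self, ch: str) -> None:
        if ch == "(":
            # Insert a matched pair and leave the cursor between them.
            self.text = self.text[:self.cursor] + "()" + self.text[self.cursor:]
            self.cursor += 1
        elif ch == ")":
            close = self._enclosing_close()
            if close == -1:
                return  # no enclosing list: insert nothing
            # Pull the ')' back over any whitespace just before it,
            # then step the cursor over it.
            start = close
            while start > self.cursor and self.text[start - 1].isspace():
                start -= 1
            self.text = self.text[:start] + self.text[close:]
            self.cursor = start + 1
        else:
            self.text = self.text[:self.cursor] + ch + self.text[self.cursor:]
            self.cursor += 1


def is_balanced(text: str) -> bool:
    """True if every ')' has an earlier '(' and none are left open."""
    depth = 0
    for c in text:
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


buf = BalancedBuffer()
for key in "(foo (bar)))":   # note the one stray ')' at the end
    buf.type_char(key)
    assert is_balanced(buf.text)
print(buf.text)  # (foo (bar))
```

Typing the stray ')' at the end changes nothing, which is the sense
in which the count of closing parens is always the right number.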

On Sun, May 26, 2013 at 8:00 PM, David A. Wheeler <dwheeler@xxxxxxxxxxxx> wrote:
> John David Stone originally stated on 23 May 2013:
>> Whitespace characters
>> don't look like grouping symbols, as parentheses, brackets, braces, or
>> oriented quotation marks do, because they don't have appropriate shapes and
>> don't come in pairs.  Moreover, they don't visibly nest, so it is unnatural
>> to use them to represent recursively defined syntactic structures.
>
> I think it's obvious I don't agree, but it might be useful to recap how we got here,
> and why I think indentation sensitivity is a GREAT tool for representing Lisp expressions.
>
> Lisps already have a perfectly serviceable visible pair of symbols
> for grouping, namely, parentheses.  The problem is with their overuse.
> Since EVERYTHING is grouped with parentheses, it can be hard for humans
> to tell when you're ending one thing versus another.  E.g., when there are
> 6 closing parentheses, it's hard to tell that it should be 7.
> People can visually match 2, or maybe 3 pairs, but not 10 or 12.
>
> Lisp is called "lots of irritating superfluous parentheses" for a *reason*.
> Most software developers today will *immediately* reject any
> language with such poor readability.  Even Lisp's creator,
> John McCarthy, did not intend for s-expressions to be used directly (!).
> While some people don't like Python's indentation-based syntax, Python
> is WAY more popular than Scheme or Common Lisp.  In short,
> Lisp's syntax greatly inhibits its use where it might be used otherwise.
>
> Some obvious "solutions" do NOT work well enough:
> * We can add another character pair.  R6RS did this by adding [...].
>   You could even add {...} if you wanted to.
>   But in practice I don't think this works well at all; it adds confusion, not help.
>   I think part of the problem is that (), [], and {} aren't visually distinct enough.
>   It's worth noting that R7RS-small drops [...] as a requirement, so clearly
>   there's no groundswell of support for [...] as a synonym for (...).
> * We could define a fixed syntax that is tailored to fixed language semantics.
>   That is the "usual way" this problem is solved in other languages,
>   and when the languages are used in their anticipated domain, this works well.
>   But Lisps are often used for symbol manipulation, where symbols may actually
>   be for a domain-specific language and where you can easily create
>   new meanings (via macros).  So this "usual solution" doesn't work well for Lisps
>   (without giving up some of the reasons for using a Lisp in the first place).
>
> The great thing about the indentation-sensitive approach is that, if carefully defined,
> it is NOT tied to any particular semantics.  Yet it can still represent
> complexly-nested structures *AND* it is clearly visually distinct from parentheses.
> What's more, Lisp developers ALREADY use indentation layout to show nesting,
> and many other languages (including the widely-used Python) already use
> indentation, so it's not such a big change for many.  It's also easy to define
> it so it retains backwards compatibility. Sure, it's a change, but not a massive one.
>
> Quick aside: The GNU folks have
> long stated that guile was "the official extension language for the
> GNU operating system", but relatively few GNU programs use it as an
> extension language. I believe one reason is that
> guile is saddled with Scheme's default syntax; a more readable version
> of guile would be far more compelling.  Guile already supports SRFI-105
> (hooray!); I believe adding sweet-expressions would make guile and
> other Scheme implementations far MORE compelling.
>
>
> John David Stone:
>> At this point, so many markers, special conventions, and multilayer
>> exceptions ...
>
> SRFI-110 only has three marker constructs: \\, $, and <*...*>.
> You can consider abbreviation+whitespace a fourth construct, if you like.
> There are no special conventions and no exceptions.
> You can teach the whole thing (including all of SRFI-105) in less than an hour.
>
> These markers were developed based on real-world experience with the notation.
> I kicked off the readable group around 2006; we've been working for ~7 years
> to find the smallest set of markers that produces a *useful* and *readable* notation.
> But "0 markers" is not a reasonable goal for something practical.  Even wisp,
> which trades *away* readability of code to get a simpler notation, has markers.
> We're hardly the first to observe this; reStructuredText was developed as
> a reaction to StructuredText, and one issue was *specifically* that users had
> to indent very long blocks, and ended up with something like markers.
>
> There may be a better set of markers (symbols and semantics), and as you
> can see in the mailing list, we've had many discussions and debates.
> But this set seems to resolve the problems of indentation-sensitive syntax,
> in a way that's pleasant to read and use.  If you have a better solution, please post.
>
>> have been added to the proposed syntax that it is quite
>> implausible to claim that sweet-expressions are homoiconic.  In the general
>> case, reconstructing the underlying data structure from the
>> sweet-expression that represents it requires the mental application of a
>> non-obvious algorithm of considerable intricacy.
>
> I can do it.  Alan can do it. Other participants here can do it too.
> Therefore, it's homoiconic.
>
> It's not intricate; the algorithm takes maybe an hour to learn.
> That's LESS than most languages, and since you can
> amortize that time over legions of hours reading and
> writing code, it's worth doing.
>
> The BNF looks more intricate only because
> I wanted to be very rigorous in its definition.  That has advantages:
> We can be sure it has desirable properties (e.g., it is LL(1)),
> and can be much more confident that different implementations will
> do the same thing.  That latter point means that learning the algorithm ONCE
> can pay dividends across different implementations, increasing the
> likelihood that it's worth learning.
>
> --- David A. Wheeler
>