
Re: sweet-expressions are not homoiconic



John David Stone originally stated on 23 May 2013:
> Whitespace characters
> don't look like grouping symbols, as parentheses, brackets, braces, or
> oriented quotation marks do, because they don't have appropriate shapes and
> don't come in pairs.  Moreover, they don't visibly nest, so it is unnatural
> to use them to represent recursively defined syntactic structures.

I think it's obvious I don't agree, but it might be useful to recap how we got here,
and why I think indentation sensitivity is a GREAT tool for representing Lisp expressions.

Lisps already have a perfectly serviceable visible pair of symbols
for grouping, namely, parentheses.  The problem is with their overuse.
Since EVERYTHING is grouped with parentheses, it can be hard for humans
to tell when you're ending one thing versus another.  E.g., when there are
6 closing parentheses, it's hard to tell that there should be 7.
People can visually match 2, or maybe 3 pairs, but not 10 or 12.

Lisp is called "lots of irritating superfluous parentheses" for a *reason*.
Most software developers today will *immediately* reject any
language with such poor readability.  Even Lisp's creator,
John McCarthy, did not intend for s-expressions to be used directly (!).
While some people don't like Python's indentation-based syntax, Python
is WAY more popular than Scheme or Common Lisp.  In short,
Lisp's syntax greatly inhibits its use where it might be used otherwise.

Some obvious "solutions" do NOT work well enough:
* We can add another character pair.  R6RS did this by adding [...].
  You could even add {...} if you wanted to.
  But in practice I don't think this works well at all; it adds confusion, not help.
  I think part of the problem is that (), [], and {} aren't visually distinct enough.
  It's worth noting that R7RS-small drops [...] as a requirement, so clearly
  there's no groundswell of support for [...] as a synonym for (...).
* We could define a fixed syntax that is tailored to fixed language semantics.
  That is the "usual way" this problem is solved in other languages,
  and when the languages are used in their anticipated domain, this works well.
  But Lisps are often used for symbol manipulation, where symbols may actually
  be for a domain-specific language and where you can easily create
  new meanings (via macros).  So this "usual solution" doesn't work well for Lisps
  (without giving up some of the reasons for using a Lisp in the first place).

The great thing about the indentation-sensitive approach is that, if carefully defined,
it is NOT tied to any particular semantics.  Yet it can still represent
complexly-nested structures *AND* it is clearly visually distinct from parentheses.
What's more, Lisp developers ALREADY use indentation layout to show nesting,
and many other languages (including the widely-used Python) already use
indentation, so it's not such a big change for many.  It's also easy to define
it so it retains backwards compatibility. Sure, it's a change, but not a massive one.
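
To make the "not tied to any particular semantics" point concrete, here is a
minimal sketch in Python of a toy indentation reader.  It is NOT the SRFI-110
algorithm and supports none of its markers, neoteric forms, or parentheses;
the function names read_indented and to_sexpr are hypothetical, chosen only
for this illustration.  It assumes just two rules: more-indented lines are
children of the line above them, and a line holding a single atom with no
children stands for that atom itself.

```python
def read_indented(text):
    """Turn indented lines into nested lists (a toy reader, not SRFI-110)."""
    # Record (indent width, atoms) for every non-blank line.
    lines = [(len(raw) - len(raw.lstrip(" ")), raw.split())
             for raw in text.splitlines() if raw.strip()]

    def parse(pos, indent):
        # Parse sibling lines at exactly `indent`; return (items, next_pos).
        items = []
        while pos < len(lines) and lines[pos][0] == indent:
            atoms = list(lines[pos][1])
            pos += 1
            # More-indented lines that follow become children of this line.
            if pos < len(lines) and lines[pos][0] > indent:
                children, pos = parse(pos, lines[pos][0])
                atoms.extend(children)
            # A lone atom with no children stands for itself, not a 1-item list.
            items.append(atoms[0] if len(atoms) == 1 else atoms)
        return items, pos

    items, pos = parse(0, lines[0][0] if lines else 0)
    if pos != len(lines):
        # A dedent that matches no enclosing level is rejected, not guessed at.
        raise ValueError("inconsistent indentation")
    return items


def to_sexpr(datum):
    """Render nested lists in the usual parenthesized form."""
    if isinstance(datum, list):
        return "(" + " ".join(to_sexpr(d) for d in datum) + ")"
    return datum


source = """\
define
  fibfast n
  if
    < n 2
    n
    fibup n 2 1 0
"""
print(to_sexpr(read_indented(source)[0]))
```

Nothing in the reader knows what "define" or "if" mean; it only tracks
indentation depth, so the same two rules would apply unchanged to a macro
or a domain-specific language.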

Quick aside: The GNU folks have
long stated that guile was "the official extension language for the
GNU operating system", but relatively few GNU programs use it as an
extension language. I believe one reason is that
guile is saddled with Scheme's default syntax; a more readable version
of guile would be far more compelling.  Guile already supports SRFI-105
(hooray!); I believe adding sweet-expressions would make guile and
other Scheme implementations far MORE compelling.


John David Stone:
> At this point, so many markers, special conventions, and multilayer
> exceptions ...

SRFI-110 only has three marker constructs: \\, $, and <*...*>.
You can consider abbreviation+whitespace a fourth construct, if you like.
There are no special conventions and no exceptions.
You can teach the whole thing (including all of SRFI-105) in less than an hour.

These markers were developed based on real-world experience with the notation.
I kicked off the readable group around 2006; we've been working for ~7 years
to find the smallest set of markers that produces a *useful* and *readable* notation.
But "0 markers" is not a reasonable goal for something practical.  Even wisp,
which trades *away* readability of code to get a simpler notation, has markers.  
We're hardly the first to observe this; reStructuredText was developed as
a reaction to StructuredText, and one issue was *specifically* that users had
to indent very long blocks, and ended up with something like markers.

There may be a better set of markers (symbols and semantics), and as you
can see in the mailing list, we've had many discussions and debates.
But this set seems to resolve the problems of indentation-sensitive syntax,
in a way that's pleasant to read and use.  If you have a better solution, please post.

> have been added to the proposed syntax that it is quite
> implausible to claim that sweet-expressions are homoiconic.  In the general
> case, reconstructing the underlying data structure from the
> sweet-expression that represents it requires the mental application of a
> non-obvious algorithm of considerable intricacy.

I can do it.  Alan can do it. Other participants here can do it too.
Therefore, it's homoiconic.

It's not intricate; the algorithm takes maybe an hour to learn.
That's LESS than most languages, and since you can
amortize that time over legions of hours reading and
writing code, it's worth doing.

The BNF looks more intricate only because
I wanted to be very rigorous in its definition.  That has advantages:
We can be sure it has desirable properties (e.g., it is LL(1)),
and can be much more confident that different implementations will
do the same thing.  That latter point means that learning the algorithm ONCE
can pay dividends across different implementations, increasing the
likelihood that it's worth learning.

--- David A. Wheeler