Title

Sweet-expressions (t-expressions)

Authors

Alan Manuel K. Gloria

Status

This SRFI is currently in “draft” status. To see an explanation of each status that a SRFI can hold, see here. To provide input on this SRFI, please mail to <srfi minus 110 at srfi dot schemers dot org>. See instructions here to subscribe to the list. You can access previous messages via the archive of the mailing list.

This SRFI contains all the required sections, including an abstract, rationale, specification, and reference implementation. It also includes a longer design rationale.

Abstract

This SRFI describes a set of syntax extensions for Scheme, called sweet-expressions (t-expressions), that has the same descriptive power as s-expressions but is designed to be easier for humans to read. The sweet-expression syntax enables the use of syntactically-meaningful indentation to group expressions (similar to Python), and it builds on the infix and traditional function notation defined in SRFI-105 (curly-infix-expressions). Unlike nearly all past efforts to improve s-expression readability, sweet-expressions are general (the notation is independent from any underlying semantic) and homoiconic (the underlying data structure is clear from the syntax). This notation was developed by the “Readable Lisp S-expressions Project” and can be used for both programs and data.

Sweet-expressions can be considered a set of additional abbreviations, just as 'x already abbreviates (quote x). Sweet-expressions and traditionally formatted s-expressions can be freely mixed; this provides backwards compatibility, simplifies transition, and enables developers to maximize readability. Here is an example of a sweet-expression and its equivalent s-expression (note that a sweet-expression reader would accept either format):

sweet-expression

s-expression

define fibfast(n)   ; Typical function notation
  if {n < 2}        ; Indentation, infix {...}
     n              ; Single expr = no new list
     fibup n 2 1 0  ; Simple function calls

(define (fibfast n)
  (if (< n 2)
      n
      (fibup n 2 1 0)))

Related SRFIs
Rationale
Tutorial
Specification
Examples
Design Rationale
Reference implementation
References
Acknowledgments
Copyright

Related SRFIs

SRFI-49 (Indentation-sensitive syntax) (superseded by this SRFI), SRFI-105 (Curly-infix-expressions) (incorporated by this SRFI), SRFI-22 (Running Scheme Scripts on Unix) (some interactions), SRFI-30 (Nested Multi-line comments) (some interactions), and SRFI-62 (S-expression comments) (some interactions).

Rationale

Many software developers find Lisp s-expression notation inconvenient and unpleasant to read. In fact, the large number of parentheses required by traditional Lisp s-expression syntax is the butt of many jokes in the software development community. The Jargon File says that Lisp is “mythically from ‘Lots of Irritating Superfluous Parentheses’”. Linus Torvalds commented about some parentheses-rich C code, “don’t ask me about the extraneous parenthesis. I bet some LISP programmer felt alone and decided to make it a bit more homey.” Larry Wall, the creator of Perl, says that, “Lisp has all the visual appeal of oatmeal with fingernail clippings mixed in. (Other than that, it’s quite a nice language.)”. Shriram Krishnamurthi says, “Racket [(a Scheme implementation)] has an excellent language design, a great implementation, a superb programming environment, and terrific tools. Mainstream adoption will, however, always be curtailed by the syntax. Racket could benefit from [reducing] the layers of parenthetical adipose that [needlessly] engird it.”

Even Lisp advocate Paul Graham says, regarding Lisp syntax, “A more serious problem [in Lisp] is the diffuseness of prefix notation... We can get rid of (or make optional) a lot of parentheses by making indentation significant. That’s how programmers read code anyway: when indentation says one thing and delimiters say another, we go by the indentation. Treating indentation as significant would eliminate this common source of bugs as well as making programs shorter. Sometimes infix syntax is easier to read. This is especially true for math expressions. I’ve used Lisp my whole programming life and I still don’t find prefix math expressions natural... I don’t think we should be religiously opposed to introducing syntax into Lisp, as long as it translates in a well-understood way into underlying s-expressions. There is already a good deal of syntax in Lisp. It’s not necessarily bad to introduce more, as long as no one is forced to use it.”

It has often been said that the parentheses “just disappear” after experience. But as bhurt notes, “I’m always somewhat amazed by the claim that the parens ‘just disappear’, as if this is a good thing. Bugs live in the difference between the code in your head and the code on the screen - and having the parens in the wrong place causes bugs. And autoindenting isn’t the answer - I don’t want the indenting to follow the parens, I want the parens to follow the indenting. The indenting I can see, and can see is correct.”

Many new syntaxes have been invented for various Lisp dialects, including McCarthy’s original M-expression notation for Lisp. However, nearly all of these past notations fail to be general (i.e., the notation is independent of an underlying semantic) or homoiconic (i.e., the underlying data structure is clear from the syntax). We believe a Lisp-based notation needs to be general and homoiconic. For example, Lisp-based languages can trivially create new semantic constructs (e.g., with macros) or be used to process other constructs; a Lisp notation that is not general will typically be unable to immediately use those new constructs. Thus, notations that are not general will always lag behind and lack the “full” power of s-expressions.

Recently, using indentation as the sole grouping construct of a language has become popular (in particular with the advent of the Python programming language). This approach solves the problem of indentation going out of sync with the native grouping construct of the language, and exploits the fact that most programmers indent larger programs and expect reasonable indentation by others. Unfortunately, Python uses special syntax for each of its semantic constructs, and the syntaxes of file input and interactive input differ slightly (so code cut and pasted from a file may be interpreted differently by its REPL).

SRFI-49 defined a promising indentation-sensitive syntax for Scheme. Unfortunately, SRFI-49 had some awkward usage issues, and by itself it lacks support for infix notation (e.g., {a + b}) and prefix formats (e.g., f(x)) that SRFI-105 provides. Sweet-expressions build on and refine SRFI-49 by addressing these issues. Real programs by different authors have been written using sweet-expressions, demonstrating that sweet-expressions are a practical notation. See the design rationale for a detailed discussion on how and why it is designed this way.

Sweet-expressions are general and homoiconic, and thus can be easily used with other constructs such as quasiquoting and macros. In short, if a capability can be accessed using s-expressions, then it can be accessed using sweet-expressions. Unlike Python, the notation is exactly the same in a REPL and a file, so people can switch between a REPL and files without issues. Fundamentally, sweet-expressions define a few additional abbreviations for s-expressions, in much the same way that 'x is an abbreviation for (quote x).

Tutorial

This section provides a basic tutorial on sweet-expressions, which should also make the specification below easier to understand.

Basics

“Sweet-expressions” (aka “t-expressions”) build on neoteric-expressions (aka n-expressions) as defined in SRFI-105. N-expressions are a simple extension of traditional s-expression notation, so valid n-expressions include numbers, strings surrounded by double-quotes, symbols, and lists (whitespace-separated n-expressions surrounded by parentheses). N-expressions add support for infix expressions surrounded by curly braces (aka curly-infix lists), so {a + b} maps to (+ a b). There is no precedence, but you can use braces in braces, e.g., {a + b + {x * y}} maps to (+ a b (* x y)). A curly-infix list with two elements {e1 e2} maps to (e1 e2), and a one-element curly-infix list {e} maps to just that element e. In addition, f(...) maps to (f ...), and f{...} with non-whitespace content maps to (f {...}). For more details, see SRFI-105.
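To make the curly-infix rules above concrete, here is a minimal Scheme sketch of how the elements already read between { and } could be mapped to the s-expression they abbreviate. It assumes nested braces and f(...) forms inside the braces have already been converted, and, following SRFI-105, it maps a list with mixed operators (where there is no precedence to apply) to a list headed by $nfx$. The names curly-infix->sexp, operators, operands, and all-same? are hypothetical and are not defined by either SRFI.

```scheme
;; Sketch (hypothetical helper names): map the elements read between
;; { and } to an s-expression.  Nested {...} and f(...) are assumed to
;; have been converted already.

(define (operators elems)
  ;; Elements at odd positions: (e1 op1 e2 op2 e3) => (op1 op2)
  (if (or (null? elems) (null? (cdr elems)))
      '()
      (cons (cadr elems) (operators (cddr elems)))))

(define (operands elems)
  ;; Elements at even positions: (e1 op1 e2 op2 e3) => (e1 e2 e3)
  (if (null? elems)
      '()
      (cons (car elems)
            (if (null? (cdr elems)) '() (operands (cddr elems))))))

(define (all-same? op lst)
  ;; #t if every element of lst is equal? to op
  (or (null? lst)
      (and (equal? (car lst) op) (all-same? op (cdr lst)))))

(define (curly-infix->sexp elems)
  (let ((n (length elems)))
    (cond ((= n 0) '())                  ; {}          => ()
          ((= n 1) (car elems))          ; {e}         => e
          ((= n 2) elems)                ; {e1 e2}     => (e1 e2)
          ((and (odd? n)                 ; {a + b + c} => (+ a b c)
                (symbol? (cadr elems))
                (all-same? (cadr elems) (operators elems)))
           (cons (cadr elems) (operands elems)))
          (else (cons '$nfx$ elems)))))  ; mixed operators: no precedence
```

For example, (curly-infix->sexp '(a + b + (* x y))) returns (+ a b (* x y)), matching the mapping described above.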

Sweet-expressions add the ability to deduce parentheses from indentation. In sweet-expressions, a line representing a value normally has one or more n-expressions, separated by one or more spaces or tabs. If a line is indented more than the previous line, that line is a child line, and the previous line is a parent to that child. Later lines with the same indentation as the child are also children of that parent, until there is an intervening line with the parent’s indentation or less. A line with only one n-expression, and no child lines, represents itself. Otherwise, the line represents a list; each n-expression on the line is an element of the list, and each of its child lines represents an element of the list (in order). Here are some examples:

sweet-expression

s-expression

a b c(1 2)

(a b (c 1 2))

define gcd(x y)
  if {y = 0}
     x
     gcd y rem(x y)

(define (gcd x y)
  (if (= y 0)
      x
      (gcd y (rem x y))))

define factorial(n)
  if {n <= 1}
     1
     {n * factorial{n - 1}}

(define (factorial n)
  (if (<= n 1)
      1
      (* n (factorial (- n 1)))))
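The child-line rule and the rule that a single n-expression with no children represents itself can be sketched in a few lines of Scheme. This is an illustration under simplifying assumptions, not the reference implementation: each input line is assumed to be pre-read into a pair (indent . items), where indent is a count of leading spaces and items is the list of n-expressions already read from that line; blank lines, comments, the period, and the advanced features are not handled. The name read-block is hypothetical.

```scheme
;; Sketch: build the datum for one sweet-expression from pre-read lines.
;; Each line is (indent . items); indent is a count of leading spaces
;; (an assumption: real indentation is a string compared by prefix).
;; Returns two values: the datum and the lines not yet consumed.
(define (read-block lines)
  (let ((indent (caar lines))
        (items (cdar lines)))
    (let loop ((rest (cdr lines)) (child-indent #f) (children '()))
      (cond
        ((and (pair? rest) (> (caar rest) indent))
         ;; A more-indented line is a child; siblings must line up.
         (if (and child-indent (not (= (caar rest) child-indent)))
             (error "inconsistent indentation" (caar rest)))
         (call-with-values
           (lambda () (read-block rest))
           (lambda (child remaining)
             (loop remaining (caar rest) (cons child children)))))
        (else
         (values (if (and (null? children) (= (length items) 1))
                     (car items)          ; one n-expression, no children
                     (append items (reverse children)))
                 rest))))))
```

Applied to the gcd example, with lines (0 define (gcd x y)), (2 if (= y 0)), (5 x), and (5 gcd y (rem x y)), the sketch returns the same list as the s-expression version.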

A blank line (a line containing only 0+ spaces and tabs) ends an expression once one has begun. This makes sweet-expressions easy to use interactively; just press “Enter Enter” to end an expression. Blank lines are skipped before an expression begins.

You can indent using one or more indent characters: space, tab, and exclamation point (!). Lines after the first line must be consistently indented; that is, the current line’s indentation and the previous line’s indentation must be equal, or one must be a prefix of the other. Indentation processing does not occur inside ( ), [ ], and { }, whether they are prefixed or not; this makes sweet-expressions backwards-compatible with traditional s-expressions, and also provides an easy way to disable indentation processing if it’s inconvenient.

Clarifications

Here are a few clarifications:

An unescaped “;” not in a string still introduces comments that end at the end of the line.
Lines with only a ;-comment (preceded by 0 or more indent characters) are completely ignored - their indentation (if any) is irrelevant, and they do not end an expression.
Special comments are non-whitespace sequences other than ;-comments that do not return a datum; they include datum comments (#;datum) and block comments (#|...|#). If a special comment begins immediately after the indent, the indentation of the special comment is used.
The datum comment marker (#;) at the beginning of a line, followed by whitespace, comments out the next sweet-expression; otherwise it comments out the next neoteric-expression.
A single delimited period (.) still sets the value of the cdr field of a pair. If the period is the only datum on the line, then the next (sibling) line is the cdr value. A period at the beginning of a line, with exactly one datum after it, escapes that datum (just like neoteric-expressions do). If the period is not at the beginning of the line, there must be exactly one following datum on the line (the cdr value).
An expression that starts indented enables “initial-indent” mode. That line is considered a sequence of whitespace-separated neoteric-expressions that are each read separately (this helps backwards compatibility).

Here are some examples:

Sweet-expressions (t-expressions)

s-expressions

aaa bbb
        ; Comment indent ignored
  cc dd

(aaa bbb (cc dd))

ff ; Demo block comments
  #| qq |# t1 t2
  t3 t4
    t5 #| xyz |# t6

(ff (t1 t2) (t3 t4 (t5 t6)))

t7 #;t8(q) t9 ; Demo datum comments
  stuff #; a(b) here
  #; this is all ignored

(t7 t9 (stuff here))

foo ; Empty-value child line is still a child
  #; bar

(foo)

f ; Demo improper list
  a . b

(f (a . b))

f ; Demo vertical improper list
  x y
  .
  z

(f (x y) . z)

; Demo initial indent
  (define x 1) (define y 2)

(define x 1)
(define y 2)

Advanced features

Sweet-expressions also add a few additional abbreviations, sometimes called sweet-expression “advanced features”, that make sweet-expressions even more pleasant to use. These involve the marker “\\” (called GROUP and SPLIT), the marker “$” (SUBLIST), leading traditional abbreviations (quote, comma, backquote, or comma-at) with following whitespace, and the pair of markers “<*” and “*>” (which surround a collecting list). Below is an explanation of each.

The marker \\ is specially interpreted. If no n-expressions precede it on the line, it is called GROUP, and it represents no symbol at all located at that indentation. GROUP is useful for representing lists of lists. If there are any preceding n-expressions on the line, it is called SPLIT, and it is interpreted as the start of a new line at the current line’s indentation. Examples:

Sweet-expressions (t-expressions)

s-expressions

let ; Demo GROUP
  \\
    var1 cos(a)
    var2 sin(a)
  body...

(let
  (
    (var1 (cos a))
    (var2 (sin a)))
  body...)

arc-if ; Emphasize logical relationships
  fuzzy?(x) \\ shave(x)
  dry?(x) \\ pour-water-on(x)
  admire(x)

(arc-if
  (fuzzy? x) (shave x)
  (dry? x) (pour-water-on x)
  (admire x))

myfunction ; Demo SPLIT
  x: \\ original-x
  y: \\ calculate-y original-y

(myfunction
  x: original-x
  y: (calculate-y original-y))

sin 0 \\ cos 0

(sin 0)
(cos 0)
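SPLIT can be thought of as breaking one physical line into several logical lines at the same indentation, each of which then follows the usual rule that a single n-expression represents itself. A minimal sketch, assuming the line’s n-expressions have already been read into a list in which the hypothetical symbol SPLIT stands for each non-initial \\ marker; GROUP, child lines, and comments are not handled, and split-line is a hypothetical name (monify is a local copy of this SRFI’s utility procedure).

```scheme
;; Sketch: break one line's items at each SPLIT marker, then apply the
;; "single n-expression represents itself" rule to each logical line.
;; The symbol SPLIT stands in for a non-initial \\ marker (an assumption).

(define (monify x)                   ; 1-element list => its element
  (if (and (pair? x) (null? (cdr x))) (car x) x))

(define (split-line items)
  (define (flush cur out)            ; close one logical line, if non-empty
    (if (null? cur) out (cons (monify (reverse cur)) out)))
  (let loop ((rest items) (cur '()) (out '()))
    (cond ((null? rest) (reverse (flush cur out)))
          ((eq? (car rest) 'SPLIT) (loop (cdr rest) '() (flush cur out)))
          (else (loop (cdr rest) (cons (car rest) cur) out)))))
```

For instance, (split-line '(sin 0 SPLIT cos 0)) yields the two data (sin 0) and (cos 0), as in the last example above; empty segments are simply dropped in this sketch.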

The marker $ is called SUBLIST. If $ is preceded by any n-expressions on the line, the right-hand-side (including any child lines) is the last element of the list described on that line’s left-hand side. (This was inspired by the similar Haskell operator.) If the left-hand-side has no datums, it is interpreted as an empty list (putting the right-hand-side in a list). Examples:

Sweet-expressions (t-expressions)

s-expressions

a b $ c d

(a b (c d))

e f $ g

(e f g) ; Not (e f (g))

a b $ c d e $ f g

(a b (c d e (f g)))

let
  $ x sqrt(a)
  {2 * x}

(let ((x (sqrt a)))
  (* 2 x))

run $ grep |-v| "xx.*zz" <(oldfile) >(newfile)

(run (grep |-v| "xx.*zz" (< oldfile) (> newfile)))

let ; Demo GROUP + SUBLIST
  \\
    c $ cos a
    s $ sin a
  body...

(let
  (
    (c (cos a))
    (s (sin a)))
  body...)
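For a single line, the SUBLIST rule is a simple recursion on the right-hand side. The following sketch is an illustration under stated assumptions: items holds the n-expressions already read from one line, the symbol $ stands in for the SUBLIST marker (a real reader distinguishes the marker from the symbol |$| while lexing), and child lines are not handled. The name sublist-line is hypothetical; monify is a local copy of this SRFI’s utility procedure.

```scheme
;; Sketch: apply SUBLIST ($) to the items of one line, ignoring child lines.
;; The symbol $ stands in for the SUBLIST marker (an assumption).

(define (monify x)                    ; 1-element list => its element
  (if (and (pair? x) (null? (cdr x))) (car x) x))

(define (sublist-line items)
  (let loop ((lhs '()) (rest items))
    (cond ((null? rest)               ; no $: the line represents itself
           (monify (reverse lhs)))
          ((eq? (car rest) '$)        ; RHS becomes the last element of LHS
           (append (reverse lhs) (list (sublist-line (cdr rest)))))
          (else (loop (cons (car rest) lhs) (cdr rest))))))
```

Because the right-hand side goes through monify, e f $ g becomes (e f g) rather than (e f (g)), and an empty left-hand side ($ x sqrt(a)) yields ((x (sqrt a))), the list-of-one-list used by let.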

A leading traditional abbreviation (quote, comma, backquote, or comma-at) at the beginning of a line (after indentation) and followed by a space, tab, or end-of-line is interpreted as that operator applied to the entire sweet-expression that follows. A datum comment marker (#;) in the same position works the same way: it comments out the entire sweet-expression that follows. For example:

Sweet-expressions (t-expressions)

s-expressions

' a b ; Demo abbreviations
  c d
(quote (a b
  (c d)))

The markers “<*” and “*>” surround a collecting list. This represents a list, but unlike (...), the reader stays in indentation processing mode, the indentation level is temporarily restarted at the left edge, and blank lines do not end a collecting list. Collecting lists are useful for very long lists (such as module definitions) because they shorten indentation and allow blank lines. They are also useful in let expressions with short variable expressions. The examples section has longer examples; here are short examples:

Sweet-expressions (t-expressions)

s-expressions

let <* x sqrt(a) *>
! g {x + 1} {x - 1}

(let ((x (sqrt a)))
  (g (+ x 1) (- x 1)))

let <* x getx() \\ y gety() *>
! {{x * x} + {y * y}}

(let ((x (getx)) (y (gety)))
  (+ (* x x) (* y y)))

module x . <*
define x(a) {2 * a}

define y(b) x{b - 1}
*>

(module x . (
  (define (x a) (* 2 a))

  (define (y b) (x (- b 1)))
))

Note that if you want to use a marker as an ordinary symbol, surround it with curly braces or vertical bars: {$} or |$| always means the symbol $, not the special marker.

Getting an implementation

Your Scheme implementation may already provide these capabilities if you simply enter the #!sweet directive first. If your preferred Scheme implementation does not yet support sweet-expressions, encourage or help them to add it. As an alternative, consider trying out the Readable Lisp S-expressions Project sample implementation and tools, including:

unsweeten, which translates sweet-expressions to s-expressions
sweeten, which translates s-expressions to sweet-expressions (so you can switch to sweet-expressions) and is itself written using sweet-expressions
diff-s-sweet, which reports semantic differences between a file of s-expressions and a file of sweet-expressions. It can even detect those rare s-expression files that would be interpreted differently by a sweet-expression reader.

The next two sections provide a more rigorous specification and many more examples.

Specification

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

A sweet-expression (aka t-expression) is an external representation of a Scheme object, which may include other Scheme objects. A sweet-expression reader converts a sweet-expression into the objects the sweet-expression represents. The following subsections define this conversion by describing the Backus-Naur Form (BNF) conventions, line and indentation handling, lexing advanced features, utility procedures used in the BNF, supporting definitions in BNF format, key productions in BNF format, other requirements, and specifications about related tools.

Backus-Naur Form (BNF) conventions

The BNF productions below define the syntax of sweet-expressions, in particular, the production t_expr defines one sweet-expression. A sweet-expression reader MUST implement the productions below unless otherwise noted. The BNF productions define an LL(1) grammar written using ANTLR version 3. The action rules inside {...} are in Scheme syntax, and value-producing productions omit the text “returns [Object v]”. You can also separately view the full ANTLR BNF definition of sweet-expressions with Java action rules, along with a support Java class Pair.java. The text /*empty*/ matches an empty sequence; it is used to identify an empty branch. The non-terminal same also matches an empty sequence; it is used to emphasize where there is a new line with unchanged indentation. The non-terminal error indicates where an implementation SHOULD report an error if that position is reached.

The production “n_expr” is a neoteric-expression as defined in SRFI-105. Thus {a + b} maps to (+ a b), f(...) maps to (f ...), and f{...} with content other than whitespace maps to (f {...}). The production “n_expr_first” matches a neoteric-expression that either has no leading abbreviations (“'”, “`”, “,”, “,@”, or the syntax-case related abbreviations as defined below), or its leading abbreviation does not have a tab, space, linefeed, or carriage return immediately after it. The productions n_expr and n_expr_first MUST NOT match a comment (e.g., block comments or datum comments), though they may contain comments (e.g., a(#|x|# b)). The productions n_expr and n_expr_first MUST NOT consume trailing whitespace.

Line and indentation handling

A sweet-expression reader MUST support three modes: indentation processing, enclosed, and initial indent. A sweet-expression reader MUST start in indentation processing mode before it begins to read a sweet-expression. The reader MUST temporarily switch to enclosed mode when it is reading inside any unescaped pairs of parentheses, brackets, or curly braces.

Given these definitions (in ANTLR version 3 format):

SPACE    : ' ' ;  // U+0020
TAB      : '\t' ; // U+0009

fragment EOL_CHAR      : '\n' | '\r' ; // U+000A or U+000D
fragment NOT_EOL_CHAR  : (~ (EOL_CHAR)) ;
fragment NOT_EOL_CHARS : NOT_EOL_CHAR* ;
fragment INDENT_CHAR   : SPACE | TAB | '!' ; // An "indent character"
EOL                    : '\r' '\n'? | '\n' ; // End of line

line       : NOT_EOL_CHARS EOL ; // A line = contents followed by EOL
hspace     : SPACE | TAB ;       // A horizontal space
blank_line : hspace* EOL ;       // A "blank line" is a line with only hspaces

An implementation MAY add implementation-defined characters to EOL_CHAR and MAY accept additional sequences as EOL. An implementation MAY also accept a final line without a terminating EOL sequence.

The indentation of a line is the sequence of 0 or more indent characters (INDENT_CHARs) at the beginning of the line. Indentation is represented in the BNF syntax definition below as if the sweet-expression reader preprocessed its input as follows. First, when the sweet-expression reader begins, a stack called the “indentation stack” is initialized to contain exactly one value, the empty string (""). Then, when in indentation processing mode, the line indentation (if any) is read, removed, and possibly replaced by other generated symbols according to the following rules (where “top” is the value of the top of the indentation stack):

If the indentation length is nonzero and the indentation is immediately followed by EOL:
1. If the indentation contains “!”, the line is ignored; an implementation MUST consume the line and reapply these rules to the next line.
2. If the indentation does not contain “!”, it is a blank line; the indentation is considered to have length zero and the rest of these rules are applied.
Otherwise, if “;” immediately follows the indentation (of length zero or more), the line is ignored; an implementation MUST consume the line and reapply these rules to the next line.
Otherwise, if the indentation length is nonzero and top is the empty string, symbol INITIAL_INDENT is generated and the reader changes to initial indent mode. When EOL is reached while in initial indent mode, the reader MUST change back to indentation processing mode.
Otherwise, if the indentation is equal to top, no extra symbol is generated. (The BNF identifies where this will match using the “same” non-terminal, but “same” is empty.)
Otherwise, if the indentation is longer than top, and top is a prefix of indentation, the indentation is pushed onto the indentation stack and the symbol INDENT is generated.
Otherwise, if top is longer than the indentation, and the indentation is a prefix of top, the indentation stack is repeatedly popped until the new top matches the current indentation or the new top is not longer than the indentation; a DEDENT symbol is generated for each pop. The matching top (if any) is not popped. If no match is found, it is an error.
Otherwise, it is an error.
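The “same”, INDENT, and DEDENT rules above amount to a prefix comparison against the top of the indentation stack. The Scheme sketch below covers only those three rules and their error cases; it deliberately leaves out the blank-line, ;-comment, and INITIAL_INDENT rules. The names indent-tokens and string-prefix-of? are hypothetical. The stack is represented as a list of indentation strings with the top first, starting as ("").

```scheme
;; Sketch: apply the "same", INDENT, and DEDENT rules to one line.
;; stack: list of indentation strings, top first; starts as ("").
;; Returns two values: the generated symbols and the new stack.
;; Blank-line, ;-comment, and INITIAL_INDENT rules are NOT handled here.

(define (string-prefix-of? p s)
  ;; #t if string p is a prefix of string s
  (and (<= (string-length p) (string-length s))
       (string=? p (substring s 0 (string-length p)))))

(define (indent-tokens stack indent)
  (let ((top (car stack)))
    (cond
      ((string=? indent top)                    ; same: nothing generated
       (values '() stack))
      ((and (> (string-length indent) (string-length top))
            (string-prefix-of? top indent))     ; deeper: push, INDENT
       (values '(INDENT) (cons indent stack)))
      ((and (> (string-length top) (string-length indent))
            (string-prefix-of? indent top))     ; shallower: pop, DEDENTs
       (let pop-loop ((s stack) (dedents '()))
         (cond ((> (string-length (car s)) (string-length indent))
                (pop-loop (cdr s) (cons 'DEDENT dedents)))
               ((string=? (car s) indent) (values dedents s))
               (else (error "dedent matches no enclosing indentation" indent)))))
      (else (error "inconsistent indentation" indent top)))))
```

Because comparison is by prefix rather than by width, a line indented with one tab can never follow a line indented with one space at the same depth; the sketch reports that as an error, as the last rule requires.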

Lexing advanced features

Markers are a small set of special terminals. Markers MUST be recognized if and only if the reader is in indentation processing mode, the marker is preceded by indentation or hspace, the marker is followed by an hspace or EOL, and the marker starts with the character shown (e.g., neither |$| nor '$ contains a marker). The markers are:

GROUP_SPLIT    : '\\\\' ;        // This is "\\"
SUBLIST        : '$' ;           // RHS (incl. children) is last datum of LHS
COLLECTING     : '<*' ;          // Start collecting list
COLLECTING_END : '*>' ;          // End collecting list
RESERVED_TRIPLE_DOLLAR : '$$$' ; // Reserved for future use

GROUP_SPLIT (\\) is called GROUP if it occurs in the position of the first n-expression in a given line (e.g., immediately after indentation); otherwise it is called SPLIT. When COLLECTING is recognized it pushes an empty string onto the indentation stack as well as generating COLLECTING. When COLLECTING_END is recognized it pops any non-empty strings from the indentation stack (generating a DEDENT for each one), pops the empty string initially placed by COLLECTING, generates EOL, and then finally generates COLLECTING_END.
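As a rough illustration of the marker-recognition conditions and of the GROUP/SPLIT distinction, the sketch below classifies whitespace-bounded tokens on one line. It assumes the whole line is in indentation processing mode and contains no strings, brackets, or comments, so that splitting on spaces and tabs yields exactly the hspace-bounded tokens; a real reader must also track mode and skip enclosed text. The names split-on-hspace and classify-tokens are hypothetical.

```scheme
;; Sketch: find markers on one line.  Assumes no strings, brackets, or
;; comments on the line, so splitting on spaces/tabs yields the
;; hspace-bounded tokens.

(define (split-on-hspace line)
  (let loop ((chars (string->list line)) (cur '()) (out '()))
    (define (flush)                  ; finish the token being built, if any
      (if (null? cur) out (cons (list->string (reverse cur)) out)))
    (cond ((null? chars) (reverse (flush)))
          ((memv (car chars) '(#\space #\tab)) (loop (cdr chars) '() (flush)))
          (else (loop (cdr chars) (cons (car chars) cur) out)))))

(define (classify-tokens tokens)
  ;; Markers become symbols; other tokens stay strings.
  ;; \\ is GROUP in the first n-expression position, SPLIT elsewhere.
  (let loop ((ts tokens) (first? #t) (out '()))
    (if (null? ts)
        (reverse out)
        (let* ((t (car ts))
               (tag (cond ((string=? t "\\\\") (if first? 'GROUP 'SPLIT))
                          ((string=? t "$") 'SUBLIST)
                          ((string=? t "<*") 'COLLECTING)
                          ((string=? t "*>") 'COLLECTING_END)
                          ((string=? t "$$$") 'RESERVED_TRIPLE_DOLLAR)
                          (else t))))
          (loop (cdr ts) #f (cons tag out))))))
```

Note that a token such as |$| comes back unchanged as an ordinary token, matching the rule that it does not contain a marker.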

The following whitespace-terminated terminals MUST be recognized if and only if the reader is in indentation processing mode. The linefeed or carriage return (if any) in them MUST NOT be permanently consumed by the terminal (the space or tab, if any, MAY be consumed):

QUOTEW          : '\'' (' ' | '\t' | '\n' | '\r') ;  // quote
QUASIQUOTEW     : '\`' (' ' | '\t' | '\n' | '\r') ;  // quasiquote
UNQUOTE_SPLICEW : ',@' (' ' | '\t' | '\n' | '\r') ;  // unquote-splicing
UNQUOTEW        : ','  (' ' | '\t' | '\n' | '\r') ;  // unquote
DATUM_COMMENTW  : '#;' (' ' | '\t' | '\n' | '\r') ;  // start datum comment

The syntax-case related abbreviations are “#'”, “#`”, “#,”, and “#,@”, which stand for syntax, quasisyntax, unsyntax, and unsyntax-splicing respectively. If a Scheme system supports both this SRFI and the syntax-case related abbreviations, then the reader SHOULD treat the whitespace-terminated forms of those abbreviations in the same manner. A sweet-expression reader MAY implement additional abbreviations.
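The key condition for these terminals is the character that follows the abbreviation. A hypothetical helper sketch, abbrevw-at, reports which whitespace-terminated abbreviation (if any) starts at a given index of a string; the end of the string is treated like EOL, nothing is consumed (so a following linefeed remains available), and DATUM_COMMENTW and the syntax-case forms are omitted for brevity.

```scheme
;; Sketch: which whitespace-terminated abbreviation (if any) starts at
;; index i of str?  End of string is treated like EOL.  Nothing is
;; consumed here, so a following linefeed stays available to the caller.

(define abbrevw-table
  '((",@" . unquote-splicing)
    ("'"  . quote)
    ("`"  . quasiquote)
    (","  . unquote)))

(define (abbrevw-at str i)
  (let loop ((entries abbrevw-table))
    (if (null? entries)
        #f
        (let* ((text (caar entries))
               (end (+ i (string-length text))))
          (if (and (<= end (string-length str))
                   (string=? text (substring str i end))
                   (or (= end (string-length str))
                       (memv (string-ref str end)
                             '(#\space #\tab #\newline #\return))))
              (cdar entries)
              (loop (cdr entries)))))))
```

So "' a b" starts with QUOTEW, while "'a" does not (it is an ordinary n-expression abbreviation), and ",@x" matches neither ",@" nor ",".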

Utility procedures used in the BNF

The BNF depends on the following utility procedures. The first ones are variants of traditional list processing procedures that also handle a special unique value “empty_value” (used to indicate when no value at all is returned). The “monify” procedure enables lines with a single n-expression and no child lines to represent themselves and not be wrapped into a list:

  ; One possible definition of the unique "empty" value (an assumption;
  ; any freshly allocated object compared with eq? would do).
  (define empty_value (list 'empty_value))

  (define (isemptyvaluep x) (eq? x empty_value))

  (define (not_period_and_not_empty x)
    (and (not (isperiodp x)) (not (isemptyvaluep x))))

  (define (conse x y) ; cons, but handle "empty" values
    (cond
      ((eq? y empty_value) x)
      ((eq? x empty_value) y)
      (#t (cons x y))))

  (define (appende x y) ; append, but handle "empty" values
    (cond
      ((eq? y empty_value) x)
      ((eq? x empty_value) y)
      (#t (append x y))))

  (define (list1e x) ; list, but handle "empty" values
    (if (eq? x empty_value)
        '()
        (list x)))

  (define (list2e x y) ; list, but handle "empty" values
    (if (eq? x empty_value)
        y
        (if (eq? y empty_value)
            x
            (list x y))))

  ; If x is a 1-element list, return (car x), else return x
  (define (monify x)
    (cond
      ((not (pair? x)) x)
      ((null? (cdr x)) (car x))
      (#t              x)))

The BNF body production uses an isperiodp(x) function, which returns true iff x is the datum “.” written as a bare period (that is, its external representation begins with a period rather than a vertical bar). This is used so that “a . b” is recognized as the pair (a . b), while “a |.| b” is the 3-element list “(a |.| b)”.

Supporting BNF definitions

Here are supporting definitions in BNF format:

PERIOD   : '.';

// Comments. LCOMMENT=line comment, scomment=special comment.
// SRFI_22_COMMENT and SHARP_BANG_FILE support is RECOMMENDED.
LCOMMENT      : ';' NOT_EOL_CHARS ; // Line comment - doesn't include EOL
BLOCK_COMMENT : '#|' (options {greedy=false;} : (BLOCK_COMMENT | .))* '|#' ;
DATUM_COMMENT : '#;' ;
SRFI_22_COMMENT : '#!' SPACE NOT_EOL_CHARS ;
SHARP_BANG_FILE : '#!' ('/' | '.') (options {greedy=false;} : .)* '!#' ;
// These match #!fold-case, #!no-fold-case, #!sweet, and #!curly-infix:
SHARP_BANG_DIRECTIVE : '#!' ('a'..'z'|'A'..'Z'|'_')
                       ('a'..'z'|'A'..'Z'|'_'|'0'..'9'|'-')* ;

hs      : (options {greedy=true;} : hspace)* ;

// A "separator_initial_indent" separates n-expressions in initial indent.
// An implementation MAY implement this as "(hspace | '!')+"
// (e.g., to avoid retaining state when returning a value).
separator_initial_indent: hspace+ ;

// Production "abbrevw" is an abbreviation with a following whitespace:
abbrevw
  : QUOTEW          {'quote}
  | QUASIQUOTEW     {'quasiquote}
  | UNQUOTE_SPLICEW {'unquote-splicing}
  | UNQUOTEW        {'unquote} ;

// Production "scomment" (special comment) defines comments other than ";":
sharp_bang_comment : SRFI_22_COMMENT | SHARP_BANG_FILE | SHARP_BANG_DIRECTIVE ;
scomment : BLOCK_COMMENT | DATUM_COMMENT hs n_expr | sharp_bang_comment ;

comment_eol : LCOMMENT? EOL;

skippable
  : scomment hs
  | DATUM_COMMENTW hs (ignored=n_expr hs | /*empty*/ error ) ;

Key BNF productions

Here are the key BNF productions for sweet-expressions:

// Production "collecting_content" returns a collecting list's contents.
// Precondition: After collecting start and horizontal spaces.
// Postcondition: Consumed the matching COLLECTING_END.
// FF = formfeed (\f aka \u000c), VT = vertical tab (\v aka \u000b)

collecting_content
  : it_expr more=collecting_content {(conse $it_expr $more)}
  | comment_eol    retry1=collecting_content {$retry1}
  | (FF | VT)+ EOL retry2=collecting_content {$retry2}
  | COLLECTING_END {'()} ;

collecting_list
  : COLLECTING hs cc=collecting_content hs {$cc} ;

// Process line after ". hspace+" sequence.  Does not go past current line.

post_period
  : skippable retry=post_period {$retry}
    | pn=n_expr hs skippable* (n_expr error)? {$pn}
    | cl=collecting_list skippable* (n_expr error)? {$cl}
    | /*empty*/ {'.} ;

// Production "line_exprs" reads the 1+ n-expressions on one line; it will
// return the list of n-expressions on the line.  If there is one n-expression
// on the line, it returns a list of exactly one item.
// Precondition: At beginning of line after indent
// Postcondition: At unconsumed EOL

line_exprs
  : PERIOD /* Leading ".": escape following datum like an n-expression. */
      (hspace+ pp=post_period {(list $pp)}
       | /*empty*/    {(list '.)} )
  | cl=collecting_list
      (rr=rest_of_line    {(cons $cl $rr)}
       | /*empty*/        {(list $cl)} )
  | basic=n_expr_first /* Only match n_expr_first */
      ((hspace+ (br=rest_of_line  {(cons $basic $br)}
                 | /*empty*/      {(list $basic)} ))
       | /*empty*/                {(list $basic)} ) ;

// Production "rest_of_line" reads the rest of the expressions on a line,
// after the first expression of the line.
// Precondition: At beginning of non-first expression on line (past hspace)
// Postcondition: At unconsumed EOL

rest_of_line
  : PERIOD hspace+ pp=post_period {$pp} /* Improper list */
  | skippable (retry=rest_of_line {$retry} | /*empty*/ {'()})
  | cl=collecting_list
    (rr=rest_of_line     {(cons $cl $rr)}
     | /*empty*/         {(list $cl)} )
  | basic=n_expr
      ((hspace+ (br=rest_of_line {(cons $basic $br)}
                 | /*empty*/     {(list $basic)} ))
       | /*empty*/               {(list $basic)} ) ;

// Production "body" handles the sequence of 1+ child lines in an it_expr
// (e.g., after "line_expr"), each of which is itself an it_expr.
// It returns the list of expressions in the body.

body
  : i=it_expr
     (same
       ( {(isperiodp $i)}? =>   f=it_expr DEDENT {$f} // Improper list
       | {(isemptyvaluep $i)}? => retry=body     {$retry}
       | {(not_period_and_not_empty $i)}? => nxt=body {(conse $i $nxt)} )
     | DEDENT {(list1e $i)} ) ;

// Production "normal_it_expr" is an it_expr without a special prefix.

normal_it_expr
  : line_exprs (
     GROUP_SPLIT hs {(monify $line_exprs)} // split
     | SUBLIST hs sub_i=it_expr {(appende $line_exprs (list1e $sub_i))}
     | comment_eol // Normal case, handle child lines if any:
       (INDENT children=body {(appende $line_exprs $children)}
        | /*empty*/          {(monify $line_exprs)} /* No child lines */ )) ;

// These are it_expr's with a special prefix like \\ or $:

datum_comment_line
  : DATUM_COMMENTW hs
    (is_i=it_expr | comment_eol INDENT body ) {empty_value} ;

group_line
  : (GROUP_SPLIT | scomment) hs /* Initial; Interpret as group */
      (group_i=it_expr {$group_i} /* Ignore initial GROUP/scomment */
       | comment_eol
         (INDENT g_body=body {$g_body} /* Normal GROUP use */
          | /*empty*/ {empty_value} )) ;

sublist_line // "$" first on line
  : SUBLIST hs is_i=it_expr {(list1e $is_i)} ;

abbrevw_line
  : abbrevw hs
      (comment_eol INDENT ab=body {(appende (list $abbrevw) $ab)}
       | ai=it_expr               {(list2e $abbrevw $ai)} ) ;

// Production "it_expr" (indenting sweet-expression)
// is the main production for sweet-expressions in the usual case.
// Precondition: At beginning of line after indent, NOT at an EOL char
// Postcondition: it-expr ended by consuming EOL + examining indent

it_expr
  : normal_it_expr     {$normal_it_expr}
  | datum_comment_line {$datum_comment_line}
  | group_line         {$group_line}
  | sublist_line       {$sublist_line}
  | abbrevw_line       {$abbrevw_line} ;

// Production "initial_indent_expr" is for an expression starting with indent.

initial_indent_expr
  : (INITIAL_INDENT | separator_initial_indent) (scomment hs)*
    (n_expr {$n_expr}
     | comment_eol {empty_value} ) ;

// Production "t_expr_real" handles special cases, else it invokes it_expr.

t_expr_real
  : comment_eol    retry1=t_expr_real {$retry1} // Skip initial blank lines
  | (FF | VT)+ EOL retry2=t_expr_real {$retry2} // Skip initial FF|VT lines
  | EOF                           {(generate_eof)} // End of file
  | initial_indent_expr           {$initial_indent_expr}
  | i=it_expr                     {$i} /* Normal case */ ;

// Production "t_expr" is the top-level production for sweet-expressions.

t_expr
  : te=t_expr_real
      {(if (isemptyvaluep $te) (t_expr) $te)} ; // Retry if empty_value.

Note that per these definitions, a blank line normally terminates a sweet-expression once it has begun, but a blank line does not terminate a collecting list.
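To make the semantics of the body production more concrete, here is a minimal Python sketch. Python is used only for illustration. The names EMPTY, PERIOD, NIL, and build_body are hypothetical and are not part of this SRFI or its sample implementation.

The sketch makes three assumptions:
- Each child line has already been parsed.
- Lists are modelled as cons cells: 2-tuples, with () as the empty list.
- A trailing empty value contributes nothing to the list.

```python
# Hypothetical names throughout; this is not the SRFI's sample implementation.
# Lists are modeled as cons cells: a pair is a 2-tuple (car, cdr), and () is '().
EMPTY = object()  # stands in for empty_value (e.g., a child line holding only a datum comment)
PERIOD = "."      # a child line consisting only of "."
NIL = ()          # the empty list


def build_body(children):
    """Fold already-parsed child values into a cons chain, as "body" does."""
    if not children:
        raise ValueError("a body needs at least one child line")
    first, rest = children[0], children[1:]
    if first is EMPTY:
        # Empty values are skipped; a trailing one just ends the list.
        return build_body(rest) if rest else NIL
    if first == PERIOD and rest:
        # "." followed by exactly one more child: that child becomes the final cdr.
        if len(rest) != 1:
            raise ValueError("exactly one child line may follow '.'")
        return rest[0]
    # Ordinary child: cons it onto the rest of the body.
    return (first, build_body(rest) if rest else NIL)
```

For example, the child lines a, ., and b fold to ("a", "b"), which is the cons-cell form of (a . b).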

Other requirements

An implementation of this SRFI MUST accept a line beginning with the un-indented directive #!sweet followed by a newline in its standard datum readers (e.g., read and, if applicable, the default implementation REPL). An implementation MAY signal an error if this directive is not at the beginning of a line or cannot terminate all sweet-expressions (e.g., because it is inside parentheses or a collecting list). After reading this directive, the reader MUST accept sweet-expressions in subsequent datums read from the same port, until some other conflicting directive is given. Once a sweet-expression reader is enabled, the #!sweet directive MUST be accepted and ignored.

A #!curly-infix SHOULD cause the current port to switch to SRFI-105 semantics (e.g., sweet-expression processing is disabled). A #!no-sweet SHOULD cause the current port to disable sweet-expression processing and MAY also disable curly-infix expression processing. An implementation MAY signal an error if the directives #!curly-infix or #!no-sweet are not at the beginning of a line or cannot terminate all sweet-expressions.
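The directive rules above amount to a small per-port state machine. The Python sketch below shows one hypothetical way to track it. The class ReaderMode and its mode names are illustrative assumptions. It also makes the optional choice that #!no-sweet leaves curly-infix processing enabled.

```python
class ReaderMode:
    """Hypothetical per-port reader state driven by #! directives."""

    def __init__(self, mode="s-expression"):
        # Illustrative mode names (assumptions, not SRFI terms):
        # "s-expression", "curly-infix", or "sweet".
        self.mode = mode

    def apply_directive(self, directive):
        if directive == "#!sweet":
            # Enables sweet-expressions; once enabled, repeats are accepted and ignored.
            self.mode = "sweet"
        elif directive == "#!curly-infix":
            # SRFI-105 semantics: sweet-expression processing is disabled.
            self.mode = "curly-infix"
        elif directive == "#!no-sweet":
            # SHOULD disable sweet-expressions and MAY also disable curly-infix;
            # this sketch chooses to keep curly-infix enabled.
            if self.mode == "sweet":
                self.mode = "curly-infix"
        # Other directives (such as #!fold-case) leave the mode unchanged here.
        return self.mode
```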

A sweet-expression reader MUST support SRFI-30 (Nested Multi-line comments) block comments (#| ... |#) and SRFI-62 (S-expression comments) datum comments (#;datum). It is RECOMMENDED that the sweet-expression reader support SRFI-22 (Running Scheme Scripts on Unix) (where #! followed by space ignores text to the end of the line), #! followed by a letter as a directive (such as #!fold-case) delimited by a whitespace character or end-of-file, and the formats #!/ ... !# and #!. ... !# as multi-line non-nesting comments.

A sweet-expression reader MAY implement datum labels with syntax #number=datum. If the first character after the equal sign is not whitespace, such a reader SHOULD read it as a neoteric-expression. If the equal sign is followed by whitespace, a datum reader MAY reject it; the reader MAY also consider the datum an it_expr, and thus as a label for a sweet-expression (the sample implementation does not do this).

A well-formatted s-expression is an expression interpreted identically by both traditional s-expressions and by sweet-expressions. A well-formatted file is a file interpreted identically by both traditional s-expressions and sweet-expressions. (In practice, it appears that most real s-expression files in Scheme are well-formatted.) It is RECOMMENDED that files in traditional s-expression notation be well-formatted so that they can be directly read using a sweet-expression reader.

Implementations of this SRFI MAY implement sweet-expressions in their datum readers by default, even when the #!sweet directive is not (yet) received. Portable Scheme applications SHOULD include the #!sweet directive before using sweet-expressions, typically near the top of a file. Portable applications SHOULD NOT use this directive as the very first characters of a file because they might be misinterpreted on some platforms as an executable script header; preceding this directive with a newline avoids this problem.

Implementations MAY provide the procedures sweet-read as a sweet-expression reader and/or neoteric-read as a neoteric-expression reader. If provided, these procedures SHOULD support an optional port parameter.

Implementations SHOULD enable a sweet-expression reader when reading a file whose name ends in “.sscm” (Sweet Scheme). Application authors SHOULD use the filename extension “.sscm” when writing portable Scheme programs using sweet-expressions.

Implementations MUST provide the procedures curly-write and neoteric-write as writers that can write c-expressions and n-expressions respectively. If provided, these procedures MUST at least take one parameter (the object to write) and an optional second parameter (the port). Implementations that provide R7RS semantics (write with cycle detection, write-shared that identifies shared structures, and write-simple with no guarantee of cycle detection or shared structure identification) SHOULD include appropriate variants of these. That is, curly-write and neoteric-write that perform cycle detection, curly-write-shared and neoteric-write-shared that identify shared structures, and curly-write-simple and neoteric-write-simple that guarantee neither.
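The following Python sketch suggests what curly-write and neoteric-write might produce for simple data. It is not the sample implementation. Its assumptions:
- Symbols are Python strings, numbers are ints, and proper lists are Python lists.
- It uses a hypothetical fixed set of infix operators.
- It ignores cycle detection and shared structure entirely.

```python
# Data model (an assumption for this sketch): symbols are Python str,
# numbers are int, and proper lists are Python lists.
INFIX_OPERATORS = {"+", "-", "*", "/", "<", ">", "<=", ">=", "="}


def is_infix_candidate(x):
    # Loosely follows represent-as-infix? from the examples: an operator in
    # head position and a short list. Requiring at least two operands is an
    # extra assumption, so that {a op b} is always well-formed.
    return (isinstance(x, list) and 3 <= len(x) <= 6
            and isinstance(x[0], str) and x[0] in INFIX_OPERATORS)


def curly_write(x):
    """Return a c-expression string: infix lists use {...}, others (...)."""
    if is_infix_candidate(x):
        return "{" + (" " + x[0] + " ").join(curly_write(a) for a in x[1:]) + "}"
    if isinstance(x, list):
        return "(" + " ".join(curly_write(e) for e in x) + ")"
    return str(x)


def neoteric_write(x):
    """Like curly_write, but a list headed by a symbol is written as f(args)."""
    if is_infix_candidate(x):
        return "{" + (" " + x[0] + " ").join(neoteric_write(a) for a in x[1:]) + "}"
    if isinstance(x, list):
        if x and isinstance(x[0], str):
            return x[0] + "(" + " ".join(neoteric_write(e) for e in x[1:]) + ")"
        return "(" + " ".join(neoteric_write(e) for e in x) + ")"
    return str(x)
```

With this sketch, neoteric_write(["<=", ["length", "x"], 6]) yields {length(x) <= 6}, the same notation used in the represent-as-infix? example.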

Note that, by definition, this SRFI modifies lexical syntax.

Related tools

Implementations MAY provide a tool, called an “unsweetener”, that reads sweet-expressions and writes out s-expressions. An unsweetener SHOULD specially treat lines that begin with a semicolon when they are not currently reading an expression (e.g., no expression has been read, or the last expression read has been completed with a blank line). Such a tool SHOULD (when outside an expression) copy exactly any line beginning with semicolon followed by a whitespace or semicolon. Such a tool SHOULD (when outside an expression) also copy lines beginning with “;#” or “;!” without the leading semicolon, and copy lines beginning with “;_” without either of those first two characters. Application authors SHOULD follow a semicolon in the first column with a whitespace character or semicolon if they mean for it to be a comment.
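The unsweetener's comment rules can be summarized as a single line-level function. This is a hypothetical Python sketch, and the name unsweeten_comment_line is illustrative. It assumes the caller has already determined that no expression is currently being read. It returns None for lines these rules do not cover.

```python
def unsweeten_comment_line(line):
    """Return the text to emit for a ';' line read outside any expression,
    or None if the comment-copying rules do not apply to it."""
    if len(line) < 2 or line[0] != ";":
        return None
    second = line[1]
    if second.isspace() or second == ";":
        return line            # Ordinary comment: copy exactly
    if second in "#!":
        return line[1:]        # ";#..." or ";!...": drop only the semicolon
    if second == "_":
        return line[2:]        # ";_...": drop the first two characters
    return None                # e.g. ";x": not covered by these rules
```

For example, a line ";#!curly-infix" would be emitted as "#!curly-infix", which lets a sweet-expression file carry directives intended only for the unsweetened output.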

A program editor MAY consider highlighting lines with only 0+ hspaces (since they separate expressions) and lines beginning at the left column (since these start new expressions). We RECOMMEND that program editors highlight expressions that use initial indent mode, to reduce the risk of accidental use of this mode.

Examples

Here are some examples and their mappings. Note that a sweet-expression reader would accept either form in all cases, since a sweet-expression reader is for the most part a traditional s-expression reader with support for some additional abbreviations.

Sweet-expressions (t-expressions)	s-expressions
define fibfast(n) ; Typical function notation if {n < 2} ; Indentation, infix {...} n ; Single expr = no new list fibup n 2 1 0 ; Simple function calls	(define (fibfast n) (if (< n 2) n (fibup n 2 1 0)))
define fibup(max count n-1 n-2) if {max = count} {n-1 + n-2} fibup max {count + 1} {n-1 + n-2} n-1	(define (fibup max count n-1 n-2) (if (= max count) (+ n-1 n-2) (fibup max (+ count 1) (+ n-1 n-2) n-1)))
define gcd(x y) if {y = 0} x gcd y rem(x y)	(define (gcd x y) (if (= y 0) x (gcd y (rem x y))))
define represent-as-infix?(x) and pair? x is-infix-operator? car(x) list? x {length(x) <= 6}	(define (represent-as-infix? x) (and (pair? x) (is-infix-operator? (car x)) (list? x) (<= (length x) 6)))
define line-tail(x) cond null?(x) '() pair?(x) append '(#\space) exposed-unit car(x) line-tail cdr(x) #t append LISTSP.SP exposed-unit(x)	(define (line-tail x) (cond ((null? x) (quote ())) ((pair? x) (append '(#\space) (exposed-unit (car x)) (line-tail (cdr x)))) (#t (append LISTSP.SP (exposed-unit x)))))
g factorial(7) my-pi() #f() -i -(cos(0))	(g (factorial 7) (my-pi) (#f) 0-i (- (cos 0)))
define extract(c i) $ cond vector?(c) $ vector-ref c i string?(c) $ string-ref c i pair?(c) $ list-ref c i else $ error "Not a collection"	(define (extract c i) (cond ((vector? c) (vector-ref c i)) ((string? c) (string-ref c i)) ((pair? c) (list-ref c i)) (else (error "Not a collection"))))
define merge(< as bs) $ cond null?(as) $ bs null?(bs) $ as {car(as) < car(bs)} $ cons car as merge < cdr(as) bs else $ cons car bs merge < as cdr(bs)	(define (merge < as bs) (cond ((null? as) bs) ((null? bs) as) ((< (car as) (car bs)) (cons (car as) (merge < (cdr as) bs))) (else (cons (car bs) (merge < as (cdr bs))))))
let <* x $ cos $ f c *> ! dostuff x	(let ((x (cos (f c)))) (dostuff x))
let <* x $ {oldx - 1} \\ y $ {oldy - 1} *> ! {{x * x} + {y * y}}	(let ((x (- oldx 1)) (y (- oldy 1))) (+ (* x x) (* y y)))
; Torture test a |.| b {$} c d . .	; Presumes |...| supported (a |.| b |$| c d . |.|)
$a \\b ; Not markers (not delimited)	($a \\b)
; Demo BEGIN with an indent f(a) g(x)	(f a) (g x)
struct: pt ((x : Real) (y : Real)) {distance : (pt pt -> Real)} define distance(p1 p2) sqrt{sqr{pt-x(p2) - pt-x(p1)} + sqr{pt-y(p2) - pt-y(p1)}}	(struct: pt ((x : Real) (y : Real))) (: distance (pt pt -> Real)) (define (distance p1 p2) (sqrt (+ (sqr (- (pt-x p2) (pt-x p1))) (sqr (- (pt-y p2) (pt-y p1))))))
define-library example grid export make rows cols ref each rename(put! set!) import (scheme base) <* begin define make(n m) let (grid(make-vector(n))) do <* i 0 {i + 1} *> ! {i = n} grid ! let <* v make-vector(m #false) *> ! vector-set! grid i v define rows(grid) vector-length(grid) define cols(grid) vector-length(vector-ref(grid 0)) define ref(grid n m) and {-1 < n < rows(grid)} {-1 < m < cols(grid)} vector-ref vector-ref(grid n) m define put!(grid n m v) vector-set! vector-ref(grid n) m v *>	(define-library (example grid) (export make rows cols ref each (rename put! set!)) (import (scheme base)) (begin (define (make n m) (let ((grid (make-vector n))) (do ((i 0 (+ i 1))) ((= i n) grid) (let ((v (make-vector m #false))) (vector-set! grid i v))))) (define (rows grid) (vector-length grid)) (define (cols grid) (vector-length (vector-ref grid 0))) (define (ref grid n m) (and (< -1 n (rows grid)) (< -1 m (cols grid)) (vector-ref (vector-ref grid n) m))) (define (put! grid n m v) (vector-set! (vector-ref grid n) m v))))
define foo(x) . <* define bar(y) ! y define baz(z) ! z *>	(define (foo x) . ( (define (bar y) y) (define (baz z) z) ))
define init(win area) let $ style $ get-style win set! back-pen $ black style set! fore-pen $ white style let \\ config $ make-c area expose $ make-e area set! now expose dostuff config expose	(define (init win area) (let ((style (get-style win))) (set! back-pen (black style)) (set! fore-pen (white style)) (let ( (config (make-c area)) (expose (make-e area))) (set! now expose) (dostuff config expose))))

Design Rationale

We have separated the design rationale from the overall rationale, as was previously done by SRFI-26 and SRFI-105, because it is easier to understand the design rationale after reading the specification. It is long because we wish to describe, in some detail, why things are done the way they are, including some helpful comparisons to other efforts.

Basic approach

The following subsections describe the overall basic approach that sweet-expressions take to improve s-expression readability.

General and homoiconic formats

There have been a huge number of past efforts to create readable formats for Lisp-based languages, going all the way back to the original M-expression syntax that Lisp’s creator expected to be used when programming. Generally, they’ve been unsuccessful, or they end up creating a completely different language that lacks the advantages of Lisp-based languages.

After examining a huge number of them, David A. Wheeler noticed a pattern: Past “readable” Lisp notations typically failed to be general or homoiconic:

A general format is independent of any specific underlying semantic. Most readability efforts focused on creating special syntax for each language construct of an underlying language. But since Lisp-based languages can trivially create new semantic constructs (via macros), and are often used to process fragments of other languages, these did not work well. It was often difficult to keep updating the parser to match the underlying system, so the parser was always less capable than using s-expressions... leading to abandonment of the specialized parser. One example of this process, among many, is the IACL2 (Infix ACL2) interface of ACL2. Sometimes the parser was continuously maintained, but this led to the development of a completely new language that was less suitable for self-analysis of program fragments and similar tasks (and thus no longer a suitable “Lisp”). In short, any new Lisp notation should be general.
A homoiconic format is a surface format in which the human reader can easily determine what the underlying representation is. It is very difficult to take advantage of Lisp capabilities, such as macros, without a homoiconic format. Yet many past readability efforts made it difficult to determine exactly what structures were being created by the notation. Typical infix notations with precedence were especially common examples of this problem - they would quietly create multiple lists without obvious indications that this was happening. Top Down Operator Precedence by Douglas Crockford (2007-02-21), for example, discusses Vaughan Pratt’s “Top Down Operator Precedence” and shows how important homoiconicity is. He stated that “parsing techniques are not greatly valued in the LISP community, which celebrates the Spartan denial of syntax. There have been many attempts since LISP’s creation to give the language a rich ALGOL-like syntax, including Pratt’s CGOL, LISP 2, MLISP, Dylan, Interlisp’s Clisp, and McCarthy’s original M-expressions. All failed to find acceptance. That community found the correspondence between programs and data to be much more valuable than expressive syntax. But the mainstream programming community likes its syntax, so LISP has never been accepted by the mainstream.” As discussed below, “The Evolution of Lisp” by Guy Steele and Richard Gabriel also stresses the importance of homoiconic notations in Lisp-based languages.

See http://www.dwheeler.com/readable/readable-s-expressions.html for a longer discussion on past efforts. In any case, now that this pattern has been identified, new notations can be devised that are general and homoiconic - avoiding the problems of past efforts.

Sweet-expressions were specifically designed to be general and homoiconic, and thus have the possibility of succeeding where past efforts have failed.

Is it impossible to improve on s-expression notation?

Some Lisp developers act as if Lisp notation descended from the gods, and thus is impossible to improve. The authors do not agree, and instead believe that Lisp notation can be improved beyond the notation created in the 1950s. The following is a summary of a retort to those who believe Lisp notation cannot be improved, based on the claims in the Common Lisp FAQ and “The Evolution of Lisp” by Guy Steele and Richard Gabriel. Below are quotes from those who argue against improvement of s-expression notation, and our replies.

The Common Lisp FAQ says that people “wonder why Lisp can’t use a more ‘normal’ syntax. It’s not because Lispers have never thought of the idea - indeed, Lisp was originally intended to have a syntax much like FORTRAN...”.

This is an argument for our position, not for theirs. In other words, even Lisp’s creator (John McCarthy) understood that directly using s-expressions for Lisp programs was undesirable. No one argues that John McCarthy did not understand Lisp. Since even Lisp’s creator thought traditional Lisp notation was poor, this is strong evidence that traditional s-expression notation has problems.

“The Evolution of Lisp” by Guy Steele and Richard Gabriel (HOPL2 edition) says that, “The idea of introducing Algol-like syntax into Lisp keeps popping up and has seldom failed to create enormous controversy between those who find the universal use of S-expressions a technical advantage (and don’t mind the admitted relative clumsiness of S-expressions for numerical expressions) and those who are certain that algebraic syntax is more concise, more convenient, or even more natural...”.

Note that even these authors, who are advocates for s-expression notation, admit that s-expressions are clumsy for numerical expressions. We agree that slavishly copying Algol is not a good idea. However, sweet-expressions do not try to create an “Algol-like” syntax; sweet-expressions are entirely general and not tied to a particular semantic at all.

That paper continues, “We conjecture that Algol-style syntax has not really caught on in the Lisp community as a whole for two reasons. First, there are not enough special symbols to go around. When your domain of discourse is limited to numbers or characters, there are only so many operations of interest, and it is not difficult to assign one special character to each and be done with it. But Lisp has a much richer domain of discourse, and a Lisp programmer often approaches an application as yet another exercise in language design; the style typically involves designing new data structures and new functions to operate on them - perhaps dozens or hundreds - and it’s just too hard to invent that many distinct symbols (though the APL community certainly has tried). Ultimately one must always fall back on a general function-call notation; it’s just that Lisp programmers don’t wait until they fail.”

This is a weak argument. Practically all languages allow compound symbols made from multiple characters, such as >=; there is no shortage of symbols. Also, nearly all programming languages have a function-call notation, but only Lisp-based languages choose s-expressions to notate it, so saying “we need function call notation” does not excuse s-expressions. You do not need legions of special syntactic constructs; sweet-expressions allow developers to express anything that can be expressed with s-expressions, without being tied to a particular semantic or requiring a massive set of special symbols.

“Second, and perhaps more important, Algol-style syntax makes programs look less like the data structures used to represent them. In a culture where the ability to manipulate representations of programs is a central paradigm, a notation that distances the appearance of a program from the appearance of its representation as data is not likely to be warmly received (and this was, and is, one of the principal objections to the inclusion of loop in Common Lisp).”

Here Steele and Gabriel are extremely insightful. Today we would say that s-expressions are “homoiconic”. Homoiconic notations are extremely rare, and this property (homoiconicity) is an important reason that Lisps are still used decades after their development. Steele and Gabriel are absolutely right; there have been many efforts to create readable Lisp formats, and they all failed (at least in part) because they did not create formats that accurately represented the programs as data structures. A key and distinguishing advantage of a Lisp-like language is that you can treat code as data, and data as code. Any notation that makes this difficult means that you lose many of Lisp’s unique advantages. Homoiconicity is critical if you’re going to treat a program as data. To do so, you must be able to easily determine, from the input text, the program’s underlying structure. If you can easily determine this, you can do amazing manipulations.

But what Gabriel and Steele failed to appreciate in their paper is that it’s possible to have a notation that is general, homoiconic, and easier to read. Now that we understand why past efforts failed, we can devise notations that are general and homoiconic - and succeed!

Many people have noted that there are tools to help deal with s-expressions, but this misses the point. If the notation is so bad that you need tools to deal with it, it would be better to fix the notation. The resulting notation could be easier to read, and you could focus your tools on solving problems that were not self-inflicted. In particular, “stopping to see the parentheses” is a sign of a serious problem - the placement of parentheses fundamentally affects interpretation, and serious bugs can hide there.

Others who have used Lisp for years, such as Paul Graham, see s-expressions as long-winded, and advocate for the use of “abbreviations” that can map down to an underlying s-expression notation. Sweet-expressions take this approach.

Why should indentation be syntactically meaningful?

Current Lisp syntax is widely mocked in the software development community; as noted above, Lisp is sometimes called “lots of irritating superfluous parentheses”. Even Lisp’s creator, John McCarthy, did not intend for s-expressions to be used directly by developers.

The problem is not the existence of grouping symbols; nearly all programming languages have grouping symbols. For example, languages with C-like syntax often use {...} to surround statement groups, and (...) are widely used to group infix operators in expressions. The problem with Lisp s-expression notation is that parentheses (and possibly other paired characters) are the only way to group constructs, and that almost every construct requires a grouping construct. As a result, Lisp programs have a large number of parentheses, far more than many developers prefer.

On Lisp’s Readability and Parenthesis Stacking shows one of the many examples of endless closing parentheses and brackets to close an expression, and the confusion that happens when indentation does not match the parentheses. bhurt’s response to that article is telling: “I’m always somewhat amazed by the claim that the parens ‘just disappear’, as if this is a good thing. Bugs live in the difference between the code in your head and the code on the screen - and having the parens in the wrong place causes bugs. And autoindenting isn’t the answer - I don’t want the indenting to follow the parens, I want the parens to follow the indenting. The indenting I can see, and can see is correct.”

An IDE can help keep the indentation consistent with the parentheses, and editing modes are certainly helpful when reading and modifying Lisp programs. But needing IDEs to use a language is considered by some a language smell. After all, a good IDE can support a good notation too. If you need special tools to work around problems with the notation, then the notation itself is a problem.

Some obvious “solutions” do not work well enough:

We can add another character pair. Scheme R6RS did this by adding [...]. Some implementations also allow {...}. But in practice this does not seem to work well; relatively few people use these alternatives, and no one claims that the result is as readable as other languages (e.g., Python). Scheme R7RS-small dropped support for [...], with a strong vote against them (out of 15 voters, 11 voted against supporting [...]). Part of the problem may be that (), [], and {} aren’t visually distinct enough to be helpful, but regardless of the reason, there is clearly no groundswell supporting them.
We could define a fixed syntax that is tailored to fixed language semantics. That is the “usual way” this problem is solved in other languages, and when these languages are used in their anticipated domain, this works well. But Lisps are often used for symbol manipulation, where symbols may actually be for a domain-specific language and where you can easily create new meanings (via macros). So this “usual solution” doesn’t work well for Lisps (without giving up some of the reasons for using a Lisp in the first place).

Thankfully, making indentation syntactically meaningful can be general; that is, it does not need to be tied to any particular semantic. A notation where indentation is syntactically meaningful can also be homoiconic. As noted earlier, a Lisp notation needs to be general and homoiconic. Yet indentation can represent complexly-nested structures and do it in a way that is visually distinct from parentheses. An indentation-sensitive approach is also easy to define in a way that retains backwards compatibility.

Making indentation syntactically meaningful eliminates many parentheses, making it much simpler for humans to determine which datums are in which expression. Lisp programs already use indentation to show structure; indeed, developers in nearly all languages already use indentation to show structure even when it is ignored by the underlying system. Currently, tools (like editors and pretty-printers) are used to try to keep the indentation (used by humans) and the parentheses (used by the computer) in sync. By making the indentation (which humans depend on) actually used by the computer as well, the two are automatically kept in sync. Syntactically-meaningful indentation complements parentheses nicely; we find that people naturally use indentation to show the larger structure of an expression, and then use a few parentheses to show its “leaves”.

In this SRFI we call this approach “syntactically-meaningful indentation”. A related term is the “off-side rule” (from “The Next 700 Programming Languages” by P. J. Landin, Communications of the ACM, Volume 9 / Number 3, March 1966, describing the ISWIM (If you See What I Mean) language family). The original definition of the off-side rule, per Landin’s paper, is a very specific form of syntactically-meaningful indentation: “[the] southeast quadrant that just contains the phrase’s first symbol must contain the entire phrase, except possibly for bracketed subsegments.” The term “off-side rule” is often used today with a broader meaning (possibly including any syntactic use of indentation). To avoid confusion, we’ll use the clearer term “syntactically-meaningful indentation”.

Using syntactically-meaningful indentation is a proven idea. Programming languages that use syntactically-meaningful indentation include Python (the eighth most popular programming language in the May 2013 TIOBE index), Haskell, Occam, F#, ABC, and Cobra (a variant of Python with strong compile-time typechecking). Indeed, Python core developer Raymond Hettinger gave a list of what he believed were the “winning language features” of Python in his PyCon 2013 keynote, and the first item he listed was the required indentation of Python. Raymond said that this syntactically meaningful indentation “contributes to the ‘clean, uncluttered’ appearance of the code... [and] is a net positive... Python ‘never lies with its visual appearance’, which is a winning feature”. Notations for data and documents that use syntactically-meaningful indentation include YAML, as well as the lists and quotations of Markdown and reStructuredText. The preprocessors HAML (to generate HTML), Sass (to generate CSS), and CoffeeScript (to generate JavaScript) also use syntactically-meaningful indentation. The Wikipedia article on the off-side rule lists additional notations with syntactically-meaningful indentation.

The StructuredText markup notation also has syntactically-meaningful indentation, and potentially serves as a cautionary tale that mechanisms other than indentation are needed in a practical notation. As noted in Problems With StructuredText, the “original StructuredText and StructuredTextNG require that section structure be indicated through indentation, as ‘inspired by Python’. For certain structures with a very limited, local extent (such as lists, block quotes, and literal blocks), indentation naturally indicates structure or hierarchy. For sections (which may have a very large extent), structure via indentation is unnecessary, unnatural and ambiguous.” However, the authors of that commentary did not abandon syntactically-meaningful indentation. Instead, when they created reStructuredText, they retained syntactically-meaningful indentation but added a different special notation for sections. Sweet-expressions also use syntactically-meaningful indentation, but their collecting list mechanism enables users to avoid indentations with very long extents.

Another historical problem with indentation as syntactically relevant is that some transports drop leading space and tab characters. As discussed in the indentation characters section, we have solved this by adding “!” as an indentation character.

“In praise of mandatory indentation...” notes that it can be helpful to have mandatory indentation:

It hurts me to say that something so shallow as requiring a few extra spaces can have a bigger effect than, say, Hindley-Milner type inference. - Chris Okasaki

There’s a lot of past work on indentation to represent s-expressions. Examples include:

Paul Graham (developer of Arc) is known to be an advocate of indentation for this purpose. As noted above, Kragen Sitaker’s notes on Graham and Arc discuss how indentation can really help (in this notation, functions with no parameters need to be surrounded by parentheses, to distinguish them from atoms - “oh well”). Graham’s RTML is implemented using Lisp, but uses indentation instead of parentheses to define structure. RTML is a proprietary programming language that was used at least by Yahoo!’s Yahoo! Store and Yahoo! Site hosting products (though Yahoo may have transitioned away from it). See Paul Graham’s comments about the RTML language design and this introduction to RTML by Yahoo.
Darius Bacon’s “indent” file includes his own implementation of a Python/Haskell-like syntax for Scheme using indentation in place of parentheses, and in that file he also includes Paul D. Fernhout’s implementation of an indentation approach. Bacon’s syntax for indenting uses colons in a way that is limiting (it interferes with other uses of the colon in various Lisp-like languages).
Lispin discusses a way to get S-expressions with indentation.
Scheme SRFI-49, I-expressions - which are discussed next.

What is the relationship between sweet-expressions and SRFI-49 (I-expressions)?

The sweet-expression indentation system is based on Scheme SRFI-49 (“surfi-49”), aka I-expressions, by Egil Möller. The basic rules of SRFI-49 (I-expression) indentation are kept in sweet-expressions; these are:

An indented line is a parameter of its parent.
Later terms on a line are parameters of the first term.
A line with exactly one term, and no child lines, is simply that term; multiple terms are wrapped into a list.
A line beginning with an abbreviation (such as '), followed by space or tab, abbreviates the rest of the expression.
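The first three rules are enough to build a toy indentation parser. The Python sketch below is a hypothetical illustration; parse_indented does not come from SRFI-49 or this SRFI. Its simplifications:
- It splits terms on whitespace and counts only spaces as indentation.
- It skips blank lines instead of treating them as expression terminators.
- It does not report inconsistent dedents.
- It omits abbreviations (rule 4).

```python
def parse_indented(lines):
    """Apply basic rules 1-3 to lines of whitespace-separated terms.

    Simplifications (assumptions of this sketch): spaces-only indentation,
    no parentheses, infix, or abbreviations, and blank lines are skipped.
    """
    rows = []
    for line in lines:
        text = line.lstrip(" ")
        if text.strip():
            rows.append((len(line) - len(text), text.split()))
    pos = 0

    def parse_one():
        nonlocal pos
        indent, terms = rows[pos]
        pos += 1
        children = []
        # Rule 1: each following line indented deeper is a child (a parameter).
        while pos < len(rows) and rows[pos][0] > indent:
            children.append(parse_one())
        # Rule 3: exactly one term and no child lines is simply that term.
        if len(terms) == 1 and not children:
            return terms[0]
        # Rule 2: later terms (and then child lines) follow the first term.
        return terms + children

    exprs = []
    while pos < len(rows):
        exprs.append(parse_one())
    return exprs
```

Here the four lines "if", "  < n 2", "  n", and "  fibup n 2 1 0" become the nested list corresponding to (if (< n 2) n (fibup n 2 1 0)).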

These basic rules seem fairly intuitive and do not take long to learn. We’re grateful to the SRFI-49 author for his work, and at first, we just used SRFI-49 directly.

However, SRFI-49 turned out to have problems in practice when we tried to use it seriously. For example, in SRFI-49, leading blank lines could produce the empty list () instead of being ignored, limiting the use of blank lines and leading to easy-to-create errors. As specified, a SRFI-49 expression would never complete until after the next expression’s first line was entered, making interactive use extremely unpleasant. Lines with just spaces and tabs would be considered different from blank lines, creating another opportunity for difficult-to-find errors. The symbol group was given a special meaning, which is inconsistent with the rest of Lisp (where only punctuation has special syntactic meanings). The mechanism for escaping the group symbol was confusing. There were also a number of defects in both its specification and implementation.

Thus, based on experience and experimentation, we made several changes to it. First, we fixed the problems listed above. We also added support for other capabilities, namely infix notation and formats like f(x) (see neoteric-expressions as defined in SRFI-105). We also found that certain constructs were somewhat ugly when indentation is required, so we added sublist, split, and collecting list capabilities.

The SRFI-49 BNF is simpler, but it is simpler in part because some whitespace processing requirements are not made clear. The BNF in this specification, in contrast, makes comment and whitespace processing explicit (e.g., using the hspace non-terminal). Our goal was to make comment and whitespace processing requirements unambiguous.

The very existence of SRFI-49 shows that others believe there is value in using syntactically-significant indentation. We are building on the experience of others to create what we hope is a useful and refined notation.

Why are sweet-expressions separate from curly-infix and neoteric-expressions as defined in SRFI-105?

Some Scheme users and implementers may not want indentation-sensitive syntax, or may not want to accept any change that could change the interpretation of a legal (though poorly-formatted) s-expression. For those users and implementers, SRFI-105 adds infix support and neoteric-expressions such as f(x), but only within curly braces {...}, which are not defined by the Scheme specification anyway. SRFI-105 makes it easier to describe the “leaves” of an s-expression tree.

In contrast, sweet-expressions extend SRFI-105 by making it easier to describe the larger structure of an s-expression. It does this by treating indentation (which is usually present anyway) as syntactically relevant. Sweet-expressions also allow neoteric-expressions outside any curly braces. By making sweet-expressions a separate tier, people can adopt curly-infix if they don’t want indentation to have a syntactic meaning or want to ensure that f(x) is interpreted as the two separate datums f and (x).

Writing out results

This SRFI includes a requirement to implement curly-write and neoteric-write. It could be argued that these really belong in SRFI-105. However, SRFI-105 was designed to require little implementation effort; it often requires only adding a few lines to the reader. In contrast, implementing a writer that picks good formats, and handles cycles or shared structures, is more complex. Since implementing a sweet-expression reader already takes more code, and its users are likely to want a writer for c-expressions and n-expressions, these procedures have been placed here.

This SRFI does not require a sweet-write procedure, for the simple reason that this procedure is essentially a pretty-printer... and thus is probably better specified separately. Those who are interested in tools to write sweet-expressions should look at the code available in the readable project.

In these notations there is more than one way to present expressions, but this is not a change. Even with traditional s-expressions, no Lisp guarantees that what it writes out is the same sequence of characters that was read. For example, (quote x) when read might be written back as 'x, while on other implementations, reading 'y might be printed as (quote y). Similarly, if you enter (a . (b . ())), many Lisps will write that back as “(a b)”. As always, you should implement your Lisp expression writer so that it presents a format convenient to both human and machine readers.

The specification does not require any specific formatting for c-expressions or n-expressions when written out; this is left as a quality-of-implementation issue. Nevertheless, there is an expectation that these procedures will usefully take advantage of the additional notational capabilities. The reference implementation demonstrates one way this can be done. For example, the reference implementation chooses the unprefixed infix notation for a list if the first element is a symbol, the symbol is all punctuation characters or a special symbol (and, or, or xor), and there are 3-6 elements in the (proper) list.
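The infix heuristic described above could be sketched as follows. This is a hypothetical Python model, not the reference implementation: lists of strings stand in for Scheme lists of symbols, the `PUNCTUATION` character set is an assumption, and every non-infix list is written in ordinary prefix form rather than as a neoteric-expression.

```python
# Assumption: the characters that count as "punctuation" in an operator name.
PUNCTUATION = set("+-*/<>=!$%&^~?:@")
SPECIAL = {"and", "or", "xor"}


def is_infix_operator(symbol):
    """True for all-punctuation symbols and the special words and/or/xor."""
    return symbol in SPECIAL or (symbol != "" and all(ch in PUNCTUATION for ch in symbol))


def curly_write(x):
    """Render nested lists as c-expressions, using {...} only for infix candidates."""
    if isinstance(x, list):
        parts = [curly_write(e) for e in x]
        op = x[0] if x else None
        # Infix only when the operator qualifies and the list has 3-6 elements.
        if isinstance(op, str) and is_infix_operator(op) and 3 <= len(x) <= 6:
            # The operator is repeated between operands: {a + b + c}.
            return "{" + f" {parts[0]} ".join(parts[1:]) + "}"
        return "(" + " ".join(parts) + ")"
    return str(x)
```

For example, `curly_write(["f", ["*", "x", 2], "y"])` yields `(f {x * 2} y)`, while a two-element list such as `["-", "x"]` stays in prefix form as `(- x)`.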

Backwards compatibility (well-formatted s-expressions)

Backwards compatibility with traditional Lisp notation is helpful. A reader that can also read traditional s-expressions, formatted conventionally, is much easier to switch to.

The sweet-expression notation is fully backwards-compatible with well-formatted Lisp s-expressions. In practice, most s-expressions used in real programs are well-formatted. Thus, a user can enable sweet-expressions and continue to read and process traditionally-formatted s-expressions as well. If an s-expression is so badly formatted that it would be interpreted differently, that s-expression can be processed by a traditional s-expression pretty-printer and have the problem resolved.

Differences in interpretation can arise from only two sources: neoteric-expressions are active outside of {...} (unlike in SRFI-105), and indentation is processed.

Neoteric-expressions are compatible for “normal” formatting. The key issue is that neoteric-expressions change the meaning of an opening parenthesis, bracket, or brace after a character other than whitespace or another opening character. For example, a(b) becomes the single expression “(a b)” in sweet-expressions, not the two expressions “a” followed later by “(b)”. There are millions of lines of Lisp code that would never see the difference. So if you wrote “a(b)” expecting it to be “a (b)”, you will need to insert the space before the opening parenthesis. We believe such s-expressions are poorly (and misleadingly) formatted in the first place; you should write “a (b)” if you intend for these to be two separate datums.
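Before enabling sweet-expressions on an existing code base, one might scan it for the “a(b)” pattern. The sketch below is a naive, hypothetical Python checker (`find_neoteric_adjacency` is not part of any existing tool). It tracks strings and ; comments, ignores block comments, and handles only the simplest character literals; it also treats quote-like prefixes and # as non-datum characters so that '(a b) and #(1 2) are not flagged.

```python
OPENERS = "([{"
# Characters after which an opener does NOT continue a neoteric-expression.
# Assumption: quote-like prefixes, '#', and '\' (as in #\( ) are treated as
# non-datum characters to avoid obvious false positives.
NON_DATUM = set(" \t\n([{'`,@#\\")


def find_neoteric_adjacency(source):
    """Return 1-based (line, column) pairs of openers that directly follow a datum."""
    hits = []
    in_string = False
    for lineno, line in enumerate(source.splitlines(), start=1):
        prev = "\n"  # the start of a line behaves like whitespace
        i = 0
        while i < len(line):
            ch = line[i]
            if in_string:
                if ch == "\\":
                    i += 1  # skip the escaped character
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == ";":
                break  # the rest of the line is a comment
            elif ch in OPENERS and prev not in NON_DATUM:
                hits.append((lineno, i + 1))
            prev = ch
            i += 1
    return hits
```

A hit such as `(1, 2)` for `a(b)` marks a place where a space may need to be inserted to keep the traditional meaning.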

Sweet-expressions add indentation processing, but since indentation is disabled inside (...), and initial indentation also disables indentation processing, ordinary Lisp expressions immediately disable indentation processing and typically don’t cause issues. In rare circumstances they can be interpreted differently:

If you have a top-level expression with more than one datum on a line and the line doesn’t begin with space/tab, they will be interpreted differently. Thus, at the topmost level, “(a) (b)” on one line is interpreted as two datums “(a)” followed by “(b)” in traditional Lisp, but this is a single “((a) (b))” in sweet-expressions. Note that this interpretation is also disabled by any indentation, so just inserting an initial space on those rare lines where this occurs ensures compatibility for this case.
Sweet-expressions also count “!” at the beginning of a line as an indent character while indentation processing is enabled. This rarely causes any issue, since once you use an open parenthesis to start an expression this meaning of “!” is disabled, and practically all non-trivial s-expressions begin with an open parenthesis. In addition, the first character on a line other than space, tab, or “!” also disables this interpretation on that line. Generally, to have an issue you’d need a symbol whose name starts with “!” (such symbols are extremely unusual), and then use it directly at the top level to retrieve its value (this would also be extremely unusual).
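The two rare cases above can also be scanned for mechanically. This hypothetical Python sketch flags lines that start in column 0 outside any open parenthesis and either begin with “!” or close more than one parenthesized datum. It assumes no strings, character literals, or block comments, and it misses cases such as “(a) b” where the extra datum is an atom.

```python
def flag_risky_lines(source):
    """Flag top-level lines that a sweet-expression reader may read differently.

    Naive assumptions: no strings, character literals, or block comments.
    """
    risky = []
    depth = 0  # bracket depth carried over from previous lines
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split(";", 1)[0]  # drop a trailing line comment
        # Indentation processing only applies outside brackets, in column 0.
        starts_top_level = depth == 0 and code[:1] not in ("", " ", "\t")
        if starts_top_level and code.startswith("!"):
            risky.append((lineno, "starts with '!'"))
        datums_closed = 0
        for ch in code:
            if ch in "([{":
                depth += 1
            elif ch in ")]}":
                depth -= 1
                if depth == 0:
                    datums_closed += 1
        if starts_top_level and datums_closed > 1:
            risky.append((lineno, "several datums at top level"))
    return risky
```

Flagged lines can be fixed by inserting a leading space, which disables indentation processing for that line.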

Ease of implementation

The notation has been designed to be relatively easy to implement. In addition, the BNF specification is specifically written so that it can be easily implemented using a recursive descent parser that corresponds to the given rules. In particular, the BNF is LL(1). The BNF rules are given in a form so that it would be easy to implement a parser that does not consume characters unless necessary and does not require multi-character unread-char (this makes it easy to reuse an underlying read procedure).

Unlike the SRFI-49 BNF, this BNF makes comment and whitespace processing explicit, to make comment and whitespace processing requirements clear.

Our experience implementing this notation suggests that our ease-of-implementation goal has been met.

Simplicity

We have strived to provide powerful capabilities with a relatively small number of constructs. We combined s-expressions, the infix and traditional function call notation of SRFI-105, and an indentation processing and abbreviation approach based on SRFI-49. We then added a few special abbreviations to make common constructs especially easy to notate (\\, $, <*...*>, and initial abbreviations followed by whitespace). That’s all.

Since sweet-expressions are essentially a superset of s-expression notation, they are necessarily “more complex” than s-expressions. But all notations are a trade-off; if a notation is often used, it may be useful to add additional syntax to make it easier to read and write. It is clear that many developers do not find traditional s-expression notation adequately readable, and Lisp developers must routinely read and write many programs and data structures in some notation. Thus, we believe it is a reasonable trade-off to add additional syntax to make these expressions more readable.

Some people have argued for more complex structures than this, and others have argued for less; we have tried to strike a balance.

We have chosen to develop a rigorous specification using an exacting BNF. The specification could be much shorter if we were not rigorous, but this rigor was in response to lessons learned with SRFI-49. The SRFI-49 specification was simpler but left a number of issues underspecified, and thus it easily led to different interpretations. The key BNF definitions for sweet-expressions use only 14 productions in 104 non-comment non-blank lines. As a comparison, defining Scheme simple datums (such as identifiers and numbers) using the same notation uses far more productions (52) and a few more lines (109 non-comment non-blank lines).

We have written real programs using this notation, to validate that it is reasonably easy to understand and is practical in real use. In the process of using this notation we developed SPLIT, SUBLIST, and collecting list constructs to deal with real-world constructs. It is possible to work without them, but we believe without them the notation would be less pleasant to use.

Whitespace, indentation, and comment handling

The following subsections describe the specific sweet-expression constructs related to whitespace, indentation, and comment handling, including why they are defined the way they are.

Blank lines

A blank line, in this specification, is a line with only 0+ spaces and/or tabs. They are also called “empty lines”, but this specification uses the term “blank line” to reduce confusion (the word “empty” has many other meanings). The issues of how to deal with blank lines, and how to end expressions, involve various trade-offs between use in a REPL and use in a file.

In sweet-expressions, a blank line always terminates a datum, once an expression has started; if (another) expression has not started, blank lines are skipped. That means that in a REPL, once you’ve entered a complete expression, “Enter Enter” will always end it. The “blank lines at the beginning are skipped” rule eliminates a usability problem with the original SRFI-49 (I-expression) spec, in which two sequential blank lines before an expression surprisingly returned (). This was a serious usability problem. The sample SRFI-49 implementation did end expressions on a blank line - the problem was that the spec didn’t clearly capture this.

Various other alternatives for ending an expression were considered and discarded:

A top-level expression could be determined simply by noting that the next expression began on the left column. This would work well in files. However, this would be hideous to use in a REPL, because it would mean that the results of an expression would only be evaluated after the first (and possibly only) line of the next expression was entered. This is absurdly confusing and unacceptable. Early Pascal I/O had similar problems.
It would be possible to have blank lines end an expression only in interactive use. In particular, Python does this; Python has different rules for interactive use and files. However, this means that you cannot cut-and-paste text from files into the REPL interpreter and use them directly. David A. Wheeler believes it’s important to have exactly the same syntax in both cases in a Lisp-based system, because in Lisp-based systems, switching between the REPL and files is extremely common.
A special text marker that means “done” could be used (e.g., “.” on a line by itself). However, users would often forget the end marker, making them less desirable in both REPL and file use. It would also make interactive use much less pleasant, since users then have to repeatedly type the special “end-of-expression” marker after each expression. As Beni Cherniavsky-Paskin observed on the readable-discuss mailing list (2013-01-16), “I absolutely hate SQL prompts that don’t execute until I add a ;”.
An expression could end at two blank lines instead of one. This would allow easy insertion of single blank lines. But it would be easy to make mistakes, since the difference between one blank line and two is easy to overlook, and it would require pressing Enter three times in typical REPL use (which would be less pleasant). It would also use up a lot of vertical real estate in a REPL, and vertical space is relatively precious.
Another solution, already in sweet-expressions, is quickly executing one-line commands by typing an indent character first. But users will often not know exactly how long an expression will be until it is done, so this does not help enough.

The chosen solution, where a blank line (“Enter Enter”) always ends an expression, works reasonably well and keeps the notation consistent. Pressing Enter twice is quite easy in a REPL (since the user’s finger is already on Enter to press it the first time). Of course, people sometimes want to have something like a blank line in the middle of an s-expression, to vertically separate parts of a larger expression. Sweet-expressions have alternatives that work quite well for this purpose:

The recommended solution is to use comment-only lines using “;” (indented or not). These are completely ignored and not even considered blank lines. Thus, you can use comment-only lines for the purpose of separating sections in a single datum. The indentation of comment-only lines is intentionally ignored; that way, you don’t have to worry about making sure that comment indentation matches its surroundings. We’ve found that in practice this works very well.
You can also use a line with at least one exclamation point, and nothing else other than possibly whitespace; such lines are also ignored.
If you also want to disable indentation processing, placing the expression inside parentheses causes blank lines to be ignored.
In very long expressions (e.g., for a set of definitions in a library), a collecting list can typically be used.

Since a line with only spaces or tabs may look exactly identical to a blank line, we decided to clearly state that a line with only spaces and tabs is a blank line (and is treated the same way). This eliminates some nasty usability problems that could arise if a “blank” line was interpreted differently depending on whether or not it had invisible whitespace in it. A silent error like this could be hard to debug.
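The line-level rules in this subsection can be modelled as follows. `split_datums` is a hypothetical Python helper that only classifies lines: it assumes the input holds top-level sweet-expressions and ignores parentheses, strings, and indentation (in the full notation, indentation also affects where a datum ends).

```python
def split_datums(lines):
    """Group physical lines into datums using only the blank-line rules."""
    datums, current = [], []
    for line in lines:
        text = line.rstrip("\r\n")
        if text.strip(" \t") == "":
            # Blank, including spaces/tabs only: ends a datum that has started;
            # skipped if no datum has started yet.
            if current:
                datums.append(current)
                current = []
        elif text.lstrip(" \t!").startswith(";"):
            continue  # comment-only line: ignored entirely, not blank
        elif set(text) <= set(" \t!"):
            continue  # only indentation characters, including a "!": ignored
        else:
            current.append(text)
    if current:
        datums.append(current)
    return datums
```

Note that the comment-only and “!”-only lines inside the first datum of the usage below do not end it, while the tab-only line does.

```python
split_datums(["", "   ", "define f(x)", "  ; a comment", "  !", "  g x", "\t", "", "h"])
# [['define f(x)', '  g x'], ['h']]
```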

Trailing horizontal spaces are ignored

It is not possible to see trailing horizontal space on most screens and printouts. Thus, the BNF is defined so that in normal cases trailing horizontal space is ignored (except in special cases such as being inside a string constant).

Indentation characters (! as indent)

Some like to use spaces to indent; others like tabs. Python allows either, and SRFI-49 allows either as well. Sweet-expressions continues this tradition, and is defined so that people can use what they like. The only rule is that in sweet-expressions users must be consistent; if a line is indented with eight spaces, the next line cannot be indented with a tab.

One challenge with tabs is that systems vary how they are displayed. On some systems, a tab moves to the next 8th character position, but this is not universal. In sweet-expressions, tabs (as with any indentation characters) must be used consistently. This simple rule completely eliminates the problems caused by variances in how they are displayed.
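A reader can enforce the consistency rule by comparing indentation strings rather than counting columns, so tab width never matters. This Python sketch illustrates that approach under stated assumptions: the helper names are hypothetical, and space, tab, and “!” are treated as indentation characters.

```python
INDENT_CHARS = " \t!"


def leading_indent(line):
    """Return the run of indentation characters (space, tab, '!') at line start."""
    n = 0
    while n < len(line) and line[n] in INDENT_CHARS:
        n += 1
    return line[:n]


def compare_indent(previous, current):
    """Classify the current indentation relative to the previous one.

    One indentation must be a prefix of the other; otherwise the two lines
    mix characters inconsistently (e.g. eight spaces vs. one tab).
    """
    if current == previous:
        return "same"
    if current.startswith(previous):
        return "deeper"
    if previous.startswith(current):
        return "shallower"
    raise ValueError(f"inconsistent indentation: {previous!r} vs {current!r}")
```

Because only prefixes are compared, eight spaces followed by a tab-indented line raises an error instead of silently depending on how wide a tab is displayed.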

One objection that people raise about mandatory indentation is that horizontal whitespace can get lost in many transports (HTML readers, etc.). In addition, sometimes there are indented groups that you’d like to highlight; traditional whitespace indentation provides no opportunity to highlight indented groups specially. When discussing syntax, users on the readable-discuss mailing list started to use characters (initially period+space) to show where indentation occurred so that they wouldn’t get lost or to highlight them. Eventually, the idea was hit upon that perhaps sweet-expressions needed to support a non-whitespace character for indentation. This is highly unorthodox, but at a stroke it eliminates the complaints some have about syntactically-important indentation (because it is lost by some transports), and it also provides an easy way to highlight particular indented groups.

At first, we tried to use period, or period+space, as the indent, as this was vaguely similar to its use in some tables of contents. But period has too many other traditional meanings in Lisp-like languages, including beginning a number (.9), beginning a symbol (...), and as a special marker that specifies the cdr of a list (dotted-pair notation). Implementation of period as an indent character is much easier if there is a way to perform two-character lookahead (e.g., with an unread-char function), but unread-char is not standard in Scheme R5RS, and Common Lisp does not mandate support for two-character lookahead. Eventually the “!” was selected instead; it practically never begins a line, and if you need it, {!...} will work. The exclamation point is much easier to implement as an indent character, and it is also a great character for highlighting indented groups.

The rule for lines with only indentation, and at least one “!”, is the result of much discussion. First, let’s review the rules for indentation-only lines in other cases. A line with ONLY spaces and tabs is considered to be the same as a blank line, for the straightforward reason that in many circumstances you can’t see the difference. Treating “no character” lines and “tabs+spaces only” lines differently could lead to subtle, hard-to-understand bugs.

This then leads to the question, what would a line mean if it has only indentation characters and includes a “!” ? Originally it meant the same thing (“blank line”), because it was a line with only indent characters. However, that interpretation seemed odd, because you can see a difference compared to a line with no characters. That semantic was confusing, so Wheeler changed this to being illegal.

On May 1, 2013, Beni Cherniavsky-Paskin questioned this semantic on the SRFI-110 discussion list, saying, “Once you prefix a block with !, it seems to me there is no reason to additionally require a comment ... to express ignored vertical whitespace. Isn’t this the cleanest thing possible:”

```
  define long-func(x)
  let ((foo bar(x)))
  ! do stuff
  ! ...
  !
  ! more stuff
  ! ...
```

On May 2, 2013, David Vanderson said he was for allowing “!” in an indentation-character-only line, but noted that there were good reasons to not enforce indentation matching. He gave this code example:

```
define long-func(x)
let outer ((foo bar(x)))
! let inner ((y z))
! ! do stuff
! ! ...
!
! ! more stuff
! ! ...
```

David Vanderson then asked, “In this case, is ‘more stuff’ a child of ‘inner’ or ‘outer’ ? Even in the first example, having a space after the ! on the blank line would throw off the indentation, right? I think I’ve convinced myself that indentation should NOT be enforced.”

No one identified a hole in David Vanderson’s argument. As a result, lines with only indentation characters, and at least one “!”, are now completely ignored.

Disabling indentation processing with paired characters

Indentation processing is disabled inside (...), [ ... ], and { ... }. This was also true of SRFI-49, and of Python, and has wonderful side-effects:

Indent parsing becomes very safe to use with existing code. Pre-existing code will almost certainly start each expression with an opening parenthesis, disabling the indentation processing it wasn’t expecting.
It makes it easy to disable indentation processing whenever it is inconvenient. For example, it supports dealing with text that is very close to running off the right-hand side, or is complex to express with indentation.
It is similar to what other indentation-sensitive languages do, such as Python.
It is a very easy rule to explain, remember, and reason about.

This means that infix processing by curly-infix disables indentation processing; in practice this doesn’t seem to be a problem.

Disabling indentation processing with an initial indent

Initial indentation also disables indentation processing, which improves backward compatibility and makes it easy to disable indentation processing where convenient.

This initial indent mode improves backward compatibility because a program that uses odd formatting with a different meaning for sweet-expressions is more likely to have initial indents. Even when this is not true, it’s trivially easy to add an initial indent to oddly-formatted old files. This provides a trivial escape, making it easy to support old files. Then, even if you have ancient code with odd formatting, it is more likely to still “just work” if there is any initial indentation... and if necessary, it is easy to add. We’d like this reader to be a drop-in replacement for read(), so minimizing incompatibilities is important.

There is a risk that this indentation will be accidental (e.g., a user might enter a blank line in the middle of a routine and then start the next line indented). However, this is less likely to happen interactively (users can typically see something happened immediately), and editors can easily detect and show where surprising indentation is occurring (e.g., through highlighting), so this risk appears to be minimal.

The specification description might seem to imply that a reader must track the initial indent state after it returns, but this is not the case. If a reader can avoid consuming any whitespace after an initial indent and a neoteric-expression, it can simply return and use that whitespace to re-trigger the initial indent state. This approach will not work if the reader performs all lexical analysis before parsing (as ANTLR does), but in that case, the lexer can simply keep track of the current mode (as shown in the BNF).

Disabling on initial indent also deals with a subtle problem in implementation. We would create significant reader implementation problems if we tried to accept expressions that began with arbitrary indentation on the first line (using that indentation as the starting point). Typically readers return a whole value once that value has been determined, and in many cases it’s tricky to store state (such as that new indentation value) for an arbitrary port. By disabling indentation processing, we eliminate the need to store such state, as well as giving users a useful tool.

Since this latter point isn’t obvious, here’s a little more detailed explanation. Obviously, to make indentation syntactically meaningful, you need to know where an expression indents, and where it ends. If you read in a line, and it has the same indentation level, that should end the previous expression. If its indentation is less, it should close out all the lines with deeper or equal indentation. But we’re trying to minimize the changes to the underlying language, and in particular, we don’t want to change the “read” interface and we’re not assuming arbitrary amounts of unread-char. Scheme R5RS, for example, doesn’t have a standard unread-char at all. Now imagine that the implementation tries to support arbitrary indentation for the initial line of an expression (instead of requiring that expressions normally start at the left edge). Let’s say you are trying to read the following:

```
! ! foo
! ! ! bar
! ! eggs
! ! cheese
```

You might expect this to return three datums: (foo bar), eggs, and cheese. It won’t, in a typical implementation; here’s why:

In the first read(), it reads foo, bar, and it consumes the indentation of “eggs” so that it can determine that the line with eggs is at the same level as foo. It returns (foo bar).
In the second read(), it reads eggs with NO indentation, because the indentation was previously consumed by the first read() so it could determine when it was finished. It then reads the indentation of cheese, which has an indentation more than zero, and thus appears to be more deeply indented than eggs. It returns (eggs cheese), and we’ve consumed it all... but perhaps not with the expected semantics.
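The failure can be reproduced with a small simulation. The hypothetical `Port` class below offers one-character peek but no unread, and `naive_read` is a deliberately flawed reader that accepts arbitrary initial indentation; each line holds a single word to keep the model small.

```python
class Port:
    """A character port with one-character peek but no unread."""

    def __init__(self, text):
        self.text, self.pos = text, 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def read(self):
        ch = self.peek()
        self.pos += len(ch)  # at EOF, ch is "" and the position is unchanged
        return ch


def read_indent(port):
    out = ""
    while port.peek() in ("!", " ", "\t"):
        out += port.read()
    return out


def read_word(port):
    out = ""
    while port.peek() not in ("", "\n"):
        out += port.read()
    port.read()  # consume the newline
    return out


def naive_read(port):
    """Flawed reader: the indentation of the first line sets the base level."""
    base = read_indent(port)
    head = read_word(port)
    children = []
    while port.peek() != "":
        indent = read_indent(port)  # consumed, and cannot be given back
        if len(indent) <= len(base):  # same or shallower: this datum is done
            break
        children.append(read_word(port))
    return [head] + children if children else head


port = Port("! ! foo\n! ! ! bar\n! ! eggs\n! ! cheese\n")
results = []
while port.peek() != "":
    results.append(naive_read(port))
print(results)  # [['foo', 'bar'], ['eggs', 'cheese']]
```

The first call consumes the indentation in front of “eggs” to decide that it is finished, so the second call sees “eggs” at column 0 and wrongly treats “cheese” as its child.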

Some solutions:

If you have unlimited unread-char, there is no problem: just unconsume characters once you’ve found the end. But many Lisps don’t have that.
Read could store indentation state associated with the port. But the user could call other routines, and a naive implementation would read the wrong values. You’d have to re-wrap the entire I/O system if you really wanted to be able to undo the indentation reliably. That creates a complicated implementation that is likely to be unreliable, and it’s lousy for performance.

So for all the reasons above, initial indent disables indentation processing for that line.

Why are the indentations of block comments and datum comments significant?

A line that starts with a ; after the indent is completely ignored, including the indent of that line. In contrast, a line that starts with a #; datum comment or a #| ... |# block comment after a possible indent is considered to be indented at the position where the comment starts. This means that in sweet-expressions, ; line comments have a subtly different semantic meaning from datum or block comments.

These are the reasons for this difference between line comments and datum or block comments:

For block comments, it would be possible to write a comment that includes a newline, then some more comment text, then the |# terminator for block comments, followed by ordinary datums. We could have declared that block comments that include newlines would have the comment-only lines deleted, and block comments would have each character replaced with a space. For example:

```
Original                                      Could’ve mapped to (but doesn’t!)
foo #|comment #1|# bar #|comment #2|# quux    foo bar quux
foo #| block comment |# bar quux              foo bar quux
```

But what if Chinese, Japanese, or Korean double-width characters are found? The sensible approach would be to require that double-width characters be replaced with two spaces rather than one, but this requires implementations to know those characters and replace them differently. It was judged to be a significant implementation overhead, for what is essentially an edge case, for a style that we felt utterly defeats the clarity of indentation. Instead, we mandate that block comments are simply deleted outright.

Outright deleting comments makes the meaning of the sequence “indent, block/datum comment, space, datum” misleading. For example:
```
foo
    bar
    #| ...
|#  quux
```
A simple “outright delete” would yield:
```
foo
    bar
      quux
```
This is arguably a misleading translation.
Further, our expected use case for block comments would look like this:
```
define foo(x)
  #|
   | First, bar the x.
   | Then quux it so that x is no longer xuuq-able
   |#
  bar x
  quux x #| Need to quux here
          | to prevent conflicting with
          | the bar table
          |#
```
Again, a simple “outright delete” would yield a blank line right after the “define foo(x)” line. Instead, what we mandate is that, if a block or datum comment immediately follows indentation, it is deleted outright, and replaced with GROUP/SPLIT (\\). Block or datum comments that do not follow indentation are simply deleted without being replaced with anything:
Original:
```
define foo(x)
  #|
   | standalone comment
   |#
  #| pre-comment |# bar #| in-comment |# quux
```
Maps to:
```
define foo(x)
  \\
  \\ bar quux
```
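The single-line part of this mapping could be sketched as below. This hypothetical Python helper assumes block comments that open and close on the same line, no nesting, and no string literals; multi-line comments such as the standalone comment above are not handled. In the Python source, `"\\\\"` denotes the two-character GROUP/SPLIT marker \\.

```python
import re

# Assumption: non-nested block comments that start and end on the same line.
BLOCK_COMMENT = re.compile(r"#\|.*?\|#")


def map_block_comments(line):
    """Replace a block comment right after the indentation with \\; delete others."""
    indent_len = len(line) - len(line.lstrip(" \t!"))
    indent, rest = line[:indent_len], line[indent_len:]
    lead = ""
    m = BLOCK_COMMENT.match(rest)
    if m:
        lead = "\\\\"  # the two characters \\ (GROUP/SPLIT)
        rest = rest[m.end():]
    # Every other block comment on the line is deleted outright.
    rest = BLOCK_COMMENT.sub("", rest)
    words = rest.split()  # also collapses the whitespace left behind
    return indent + " ".join(([lead] if lead else []) + words)
```

For example, `map_block_comments("  #| pre-comment |# bar #| in-comment |# quux")` produces the line `  \\ bar quux`, while a comment in the middle of a line simply disappears.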

Although the reasons above pertain mostly to block comments, datum comments (#;) are considered essentially identical to block comments if “#;” is not followed by whitespace.

We could have mandated a different behavior between datum and block comments. But it is helpful to review the reason for the existence of datum comments. There are two major use cases:

To just comment out a single, short item from a list.
```
(foo bar #;quux meow)
```

To easily remove the last item of a multi-line list, where that item is itself several lines:

```
(define (foo x)
  (if (not (foo-able? x))
      (error "Cannot foo the " x)
      (begin
        (en-bar x)
        ; quuxing is currently buggy
        #;(quux
          (barred-form x)
          (co-barred-form x)
          (de-xuuqed x)))))
```

In the second case, a multi-line list would typically be commented out with ; line comments, but in standard s-expression syntax all the closing parentheses are “piled on” to the last line. Commenting out those lines with ; would therefore also comment out the closing parentheses of begin, if, and define.

But with sweet-expressions, there are no explicit closing parentheses. In sweet-expression form, using line comments suffices:

```
define foo(x)
  if not(foo-able?(x))
     error "Cannot foo the " x
     begin
       en-bar x
       ; quuxing is currently buggy
       ;;quux
       ;;  barred-form x
       ;;  co-barred-form x
       ;;  de-xuuqed x
```

Thus, the expected use of datum comments in sweet-expressions is limited to the first case, i.e., commenting out a single short item.

Since this first case is handled well enough if datum comments behave exactly like block comments (deleted outright, or replaced with \\ when they appear at the start of a line after the indentation), it was considered simpler to use the same behavior for both.

Child lines producing an empty value are still child lines

Under the current semantics:

```
foo
==> foo

foo
! bar
==> (foo bar)
```

But what should be done when there are child lines, and all of them produce an empty value (from datum comments or block comments)? For example:

```
foo
! #; bar
```

We have two conflicting rules: (1) a child line means that the parent should be wrapped in a list, but (2) comments usually are ignored.

The current specification resolves this by picking rule #1 as being more important, thus, this is (foo). If a line has a child line, then it’s wrapped into a list... even if the child lines together produce an empty value.
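The resulting rule can be captured in a few lines. In this hypothetical Python sketch, `EMPTY` is an assumed sentinel for a child line whose content was entirely commented out; only a line with no child lines at all can be returned unwrapped.

```python
EMPTY = object()  # sentinel: a child line whose only content was a comment


def build(terms, children):
    """Combine a line's terms with the values of its child lines."""
    if not children:
        # "monify": a single term with no child lines is just that term
        return terms[0] if len(terms) == 1 else list(terms)
    # Any child line at all, even one that produced nothing, forces a list.
    return list(terms) + [c for c in children if c is not EMPTY]
```

With this rule, `build(["foo"], [EMPTY])` gives `["foo"]`, the analogue of (foo), rather than the bare symbol foo.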

The rationale is that if you comment out later lines that are parameters of a procedure, you probably still want to treat it as a procedure. E.g., given:

```
foo
! complicated-expression 4 7
! ! otherstuff 8
```

Let’s imagine that you comment out foo’s parameters:

```
foo
! #; complicated-expression 4 7
! ! otherstuff 8
```

In most cases, you probably still want “foo” to be called, so “(foo)” is probably the correct way to interpret this. Thus, it makes sense to simply say that “monify” (returning a lone term as itself instead of wrapping it in a list) is only applied when there are no child lines. It’s an easier rule to explain, too. And once you accept that notion, it’s important to be consistent about it all the way through.

If you REALLY want foo to become a single value, you can use line comments or a block comment so that there are no child lines:

foo #|
; complicated-expression 4 7
; ! otherstuff 8 |#

foo #|
! #; complicated-expression 4 7
! ! otherstuff 8 |#
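The rule can be stated compactly in code. The following is a minimal Python sketch, not the sample implementation: symbols are modeled as Python strings and Scheme lists as Python lists, `monify` borrows the name used above, and `line_value` and the convention of recording an empty child line as `None` are hypothetical.

```python
from typing import List, Optional, Union

# Toy data model: symbols are str, Scheme lists are Python lists.
Datum = Union[str, list]


def monify(items: List[Datum]) -> Datum:
    """Unwrap a one-element list into its sole element; otherwise keep the list."""
    return items[0] if len(items) == 1 else items


def line_value(items: List[Datum],
               children: Optional[List[Optional[Datum]]] = None) -> Datum:
    """Combine a line's own items with the values of its child lines.

    `children` holds one entry per child line; an entry is None when that
    child line produced no value (e.g., it was entirely a datum comment).
    """
    if not children:
        # No child lines at all: a singleton line represents itself.
        return monify(items)
    # At least one child line exists, so the result is always a list,
    # even if every child line turned out to be empty.
    return items + [c for c in children if c is not None]
```

Note that what distinguishes `(foo)` from `foo` is whether any child lines exist, not whether they contribute any values.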

End-of-line (EOL) handling

This SRFI only requires support for the end-of-line sequences linefeed (LF), carriage return (CR), and CRLF. Earlier versions also supported reversed LFCR, IBM’s NEL (U+0085), Unicode line-separator (LS, U+2028), and Unicode paragraph-separator (PS, U+2029), but these have been dropped, because the only end-of-line markers actually used in practice are LF, CR, and CRLF. For example, these are the only end-of-line markers included in Scheme R7RS draft 9.

John Cowan posted on 2013-02-28 that, “NEL is used only on EBCDIC systems, and conversion to ASCII usually changes it to LF rather than U+0085. LS was Unicode’s attempt to kill CR/LF/CR+LF, which failed completely...” The same problem applies to PS, which is not used in practice.

CR by itself is obsolescent. However, CR by itself has been used on many historical systems, so we expect that there are older files which still use it. Lone CR is permitted by various Scheme specifications, and it is easy to implement too. Thus, we believe it is worth including in the specification.

Reversed LFCR almost never happens in practice, and attempting to detect it triggers a bug in many versions of Guile, an implementation of Scheme. In many versions of Guile, peek-char consumes (instead of just peeking at) an end-of-file (EOF) marker (bug 12216). Thus, after seeing an LF, peeking to see if there is a CR would consume any EOF after an LF, making it awkward to end interactive use on systems that use just LF for end-of-line.
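A minimal Python sketch of end-of-line handling under these rules, assuming a hypothetical `CharReader` with one-character lookahead (a real reader would sit on a Scheme port). After an LF the reader stops without peeking further, so a reversed LFCR is read as an LF followed by a separate lone CR; a peek is only needed after CR, to tell CR from CRLF.

```python
class CharReader:
    """Toy character reader over a string, with one-character lookahead."""

    def __init__(self, text: str):
        self.text = text
        self.pos = 0

    def peek(self) -> str:
        # Returns "" at end of input, without consuming anything.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def read(self) -> str:
        ch = self.peek()
        if ch:
            self.pos += 1
        return ch


def consume_eol(r: CharReader) -> bool:
    """Consume one LF, CR, or CRLF if present; return True if one was consumed."""
    ch = r.peek()
    if ch == "\n":
        r.read()  # LF: done, deliberately no peek for a following CR
        return True
    if ch == "\r":
        r.read()
        if r.peek() == "\n":  # CRLF
            r.read()
        return True
    return False


def split_lines(text: str) -> list:
    """Split text into lines using only the LF, CR, and CRLF sequences."""
    r = CharReader(text)
    lines, current = [], []
    while r.peek():
        if consume_eol(r):
            lines.append("".join(current))
            current = []
        else:
            current.append(r.read())
    if current:  # final line without an end-of-line sequence
        lines.append("".join(current))
    return lines
```

Python's built-in `str.splitlines` is deliberately not used here, because it also splits on NEL (U+0085), LS (U+2028), PS (U+2029), and several other characters that this SRFI does not require.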

End-of-file (EOF) handling

Non-empty files must end with an end-of-line sequence, before any end-of-file (EOF) marker, to be portable sweet-expression files. This limitation greatly simplifies the specification and implementation of a sweet-expression reader, without limiting the data that sweet-expressions can represent. In practice, text editors normally create such files anyway, so this is not a serious limitation.

This requirement is not unique to sweet-expressions. For example, several versions of the C language standard say “A source file that is not empty shall end in a new-line character, which shall not be immediately preceded by a backslash character” (section 2.1.1.2 of the ANSI C 1989 standard, section 5.1.1.2 of the ISO C 1999 standard, and section 5.1.1.2 of the ISO/IEC C 2011 standard ISO/IEC 9899:2011).

Sweet-expression reader implementations are free to warn about files that fail to meet this requirement. Sweet-expression reader implementations are also free to support files that do not meet this limitation. The sample reader accepts, in most cases, non-empty files that end without a preceding end-of-line sequence.
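Because the requirement only concerns the last character of a file, a reader that chooses to warn can check it cheaply. A minimal Python sketch; the function names and message text are illustrative, not taken from the sample reader.

```python
def has_portable_ending(text: str) -> bool:
    """Return True if `text` is empty or ends with LF, CR, or CRLF.

    CRLF ends with LF, so inspecting the last character covers all three
    end-of-line sequences this SRFI requires.
    """
    return text == "" or text[-1] in ("\n", "\r")


def ending_warning(text: str, name: str = "<input>") -> str:
    """Return a warning message for a non-portable file, or "" if none is needed."""
    if has_portable_ending(text):
        return ""
    return f"{name}: non-empty file does not end with an end-of-line sequence"
```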

Special semicolon values for an unsweetener

As described in the specification, a tool (called an “unsweetener”) that reads sweet-expressions and writes out s-expressions SHOULD specially treat certain lines that begin with semicolons.

The initial-semicolon rules for “;” followed by space or semicolon are given so that some comments - particularly the ones about major new components - are likely to be included in a translation from sweet-expressions to s-expressions (namely, any comments that precede an expression). This can greatly simplify examining the generated s-expression. The rules about “;#”, “;!”, and “;_” make it easier to write shell scripts and similar constructs with embedded sweet-expressions; these lines can invoke some Scheme interpreter, possibly via a shell.

These rules apply only to lines outside of any sweet-expression. This is intentional, because it makes it easy to implement an unsweetener on top of an existing sweet-expression reader. The top-level unsweetener tool can simply check whether a line begins with a semicolon and, if it does, handle it specially; if the line starts with an end-of-line, the tool can just copy it, and if it starts with any other character, the tool can call the sweet-expression reader to handle it. There is no requirement to copy block comments, or comments inside a sweet-expression datum, because this would be much more complicated to do: handling block comments is non-trivial functionality that a sweet-expression reader must perform, and there is no standard way to return comments inside a datum. Semicolon comments immediately after a datum need not be copied or processed specially, because a sweet-expression reader has to consume them to see if it has reached the end of the datum. A Scheme implementation with unlimited unread could do more with relative ease, but since many Scheme implementations do not have unlimited unread, these limitations make implementation of such tools much simpler.

These rules are based on the unsweeten tool.
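The top-level dispatch described above can be sketched in Python. This is a hypothetical model, not the unsweeten tool itself: `read_datum` stands in for a sweet-expression reader that returns the translated s-expression text and the index of the first unconsumed line, and `handle_semicolon` is a placeholder for the special semicolon rules in the specification, which are not reproduced here.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical reader interface: given all input lines and the index of the
# first line of a datum, return the s-expression text and the index of the
# first line the reader did not consume (it must always advance).
DatumReader = Callable[[List[str], int], Tuple[str, int]]


def unsweeten(lines: List[str],
              read_datum: DatumReader,
              handle_semicolon: Callable[[str], str]) -> Iterator[str]:
    """Top-level loop that decides what to do from a line's first character only."""
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith(";"):
            # Outside any datum: apply the special semicolon rules.
            yield handle_semicolon(line)
            i += 1
        elif line == "":
            # The line starts with an end-of-line: copy it unchanged.
            yield line
            i += 1
        else:
            # Anything else starts a sweet-expression; hand it to the reader.
            text, i = read_datum(lines, i)
            yield text
```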

Other specific sweet-expression constructs

The following subsections describe other specific sweet-expression constructs, including why they are defined the way they are.

Singleton expression represents itself

Just like SRFI-49, a single neoteric expression on a line with no child lines (a singleton) represents itself. In contrast, if there are multiple expressions (on a line or via a child), they are wrapped into a list. An alternative would be to always wrap a line into a list, and then require some special marker to indicate that a singleton should not be wrapped in a list.

We believe the SRFI-49 approach is the better alternative. It is common to wind down to some specific value, such as a specific number, constant construct, or variable value, to be returned. It seems far simpler to state the value to be returned than to require a special marker indicating that a value just means itself. Singleton values on a line representing themselves are far more common than a single item in a list, so it makes sense to make the more common case easier to represent. Finally, we believe this approach produces a more familiar result. In many cases a list with only one element is a procedure call passing no arguments; in this case, the format “name()” is extremely familiar because it is the same format used in many other programming languages. In rare cases where it truly is a list with one element, but not a procedure call, the alternative format “(element-value)” also seems quite clear.

The #!sweet directive

The directive #!sweet is intended to be used before any sweet-expressions in Scheme. This improves backwards compatibility; readers can by default read only traditional s-expressions, and only change when they receive #!sweet. Readers are allowed, but not required, to accept sweet-expressions before this directive.

The directive #!sweet was chosen as an analogy to similar Scheme directives, such as #!fold-case and #!no-fold-case (R6RS and R7RS), #!r6rs (R6RS), and #!curly-infix (SRFI-105). Note that this enabling directive is specifically for Scheme; other Lisp notations (e.g., Common Lisp) may use different mechanisms to enable sweet-expression processing.

Implementations are only required to support these directives if they are at the beginning of a line and they can end a sweet-expression (e.g., they are not in the middle of a pair of parentheses or a collecting list). It is not hard to parse the directive (indeed, the BNF describes how to do so), but many implementations may have difficulty switching modes in the middle of processing a sweet-expression, and there is no strong reason to require this functionality.

A list expression such as (srfi 110) was intentionally not used. If it were used, a reader would not be able to easily distinguish between (1) a list to read and (2) a command to change modes. Also, not all Scheme systems support ways to invoke SRFIs, or even a module system, and there are many module systems in use. A special directive avoids these issues.

On 2013-03-07 Jos Koot reported that this should work well with Racket, a popular Scheme implementation. Racket’s documents say: “#! is an alias for #lang followed by a space when #! is followed by alphanumeric ASCII, +, -, or _. Use of this alias is discouraged except as needed to construct programs that conform to certain grammars, such as that of R6RS [Sperber07].” Since #!sweet is indeed defined by a grammar, this is consistent. Jos Koot continues, “I see no problem here for an implementation in Racket.”

Grouping and splitting (\\)

SRFI-49 had a mechanism for defining lists of lists, using the symbol “group”. This was a valuable contribution, since there needs to be some way to show lists of lists.

But after use, it was determined that having an alphabetic symbol being used to indicate a special abbreviation was a mistake. All other syntactically-special abbreviations in Lisp are written using punctuation; having one that was not was confusing. This symbol is still called the GROUP symbol, and happens at the start of a line (after indentation)... it is just now respelled as \\.

For example, this GROUP symbol makes it easy to handle multiple variables in a let expression:

let*
  \\
    variable1 my(value1)
    variable2 my(value2)
  do-stuff1 variable1
  do-stuff2 variable1 variable2

A different problem is that sometimes you’d like to have a set of parameters, where they are at the “same level” but writing them as indented parameters takes up too much vertical space. An obvious example is keywords in various Lisps; having to write this is painful:

foo
  keyword1:
  parameter1
  keyword2:
  parameter2
  ....

David A. Wheeler created an early splicing proposal. After much discussion, to solve the latter problem, the SPLIT symbol was created, so that you could do:

foo
  keyword1: \\ parameter1
  keyword2: \\ parameter2
  ....

Or, equivalently:

foo
  keyword1:
  \\   parameter1
  keyword2:
  \\   parameter2

At first the symbol \ was used for SPLIT, but this would cause serious problems on Lisps that supported slashification. After long discussion, the symbol \\ was decided on for both; although the number of characters in the underlying symbol could vary (depending on whether or not slashification was used), this was irrelevant and seemed to work everywhere. By using the same symbol for both GROUP and SPLIT, we reduced the number of different symbols that users needed to escape.

We dropped the SRFI-49 method for escaping the symbol by repeating it (group group); the {} escape mechanism is more regular, and makes it far more obvious that some special escape is going on.
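A minimal Python sketch of how SPLIT divides a single child line, assuming the line has already been tokenized into items (strings for symbols, lists for nested data) and that an escaped {\\} has already become an ordinary symbol. It does not model GROUP (a \\ alone on a line with indented children) or SUBLIST, and `split_line` is a hypothetical name.

```python
from typing import List, Union

Datum = Union[str, list]
SPLIT = "\\\\"  # the two-character marker written \\ in source text


def split_line(tokens: List[Datum]) -> List[Datum]:
    """Split a line's items at each \\ marker, as if each piece were its own line.

    Each piece follows the singleton rule (one item stands for itself), and
    empty pieces, such as the one before a leading \\ that is followed by
    more items on the same line, contribute nothing.
    """
    pieces: List[Datum] = []
    current: List[Datum] = []
    for tok in tokens + [SPLIT]:  # trailing sentinel flushes the last piece
        if tok == SPLIT:
            if len(current) == 1:
                pieces.append(current[0])
            elif current:
                pieces.append(current)
            current = []
        else:
            current.append(tok)
    return pieces
```

For instance, a parent line foo whose child line is keyword1: \\ parameter1 receives keyword1: and parameter1 as two separate children, giving (foo keyword1: parameter1). A leading \\ followed by items yields an empty first piece that is simply dropped, which matches the behavior described in the next subsection.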

Why does initial \\ mean nothing if there are datums afterwards on the same line?

Since “let” occurs in many programs, it would have been possible to define \\ to allow this:

let
! \\ var1 $ bar x
! !  var2 $ quux x
! nitz var1 var2

After long discussion, we decided against this. There are other ways of handling constructs like multi-variable let; also, if the first variable later needs a more complex expression, this form cannot be so easily extended with indentation. Instead, we decided on defining “\\” as an empty symbol, making that expression exactly the same as:

let
! var1 $ bar x
! !  var2 $ quux x
! nitz var1 var2
; =>
;   (let (var1 (bar x (var2 (quux x))))
;      (nitz var1 var2))

We did this intentionally. It turns out that there are situations where you want \\ as an empty symbol, even when text follows it on the line. An example is Arc’s if-then-else, where items logically come in pairs but are at the same level in the list structure. For example:

if
! condition1()
! \\ action1()
! condition2()
! \\ action2()
! \\ otherwise-action()

For a more Scheme-centric viewpoint, some Scheme implementations use keyword objects. For example, in Guile, module declarations look like:

define-module
! \\ amkg cat meow
! #:use-module
! \\ amkg dog woof
! #:export
! \\ (meow hiss)

As noted earlier, there are other ways of handling constructs like multi-variable let. You can use an empty GROUP symbol to achieve the same effect (at the cost of one more line). Also, the collecting list notation (<*...*>) handles short let variable assignments more gracefully. Thus, there was no strong reason to choose the first semantic, while there were many good reasons to choose the semantic actually chosen.

Traditional abbreviations

As with SRFI-49, a leading traditional abbreviation (quote, comma, backquote, or comma-at) right after any indent, and followed by space or tab, is that operator applied to the sweet-expression starting at the same line. For example, a complex indented structure can be quoted simply by prefixing a single quote and space. This makes it easy to add abbreviations to complex indented structures. An abbreviation alone on a line (after indentation), followed by an indented expression, applies that abbreviation to the expression; this seems to be what “users expect”, and supporting it eliminates a potential source of confusion.
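A toy Python sketch of the difference between an abbreviation followed by a space and one attached directly to a symbol. It assumes a single line of whitespace-separated symbols with no nesting and no child lines, and it does not model an abbreviation alone on a line; `read_simple_line` and `attach_prefix` are hypothetical names.

```python
from typing import Union

Datum = Union[str, list]

# Traditional abbreviations and the symbols they stand for.
ABBREVIATIONS = {
    "'": "quote",
    "`": "quasiquote",
    ",": "unquote",
    ",@": "unquote-splicing",
}


def attach_prefix(token: str) -> Datum:
    """An abbreviation written directly against a symbol applies to that symbol only."""
    for prefix in (",@", "'", "`", ","):  # check ",@" before ","
        if token.startswith(prefix) and len(token) > len(prefix):
            return [ABBREVIATIONS[prefix], attach_prefix(token[len(prefix):])]
    return token


def read_simple_line(text: str) -> Datum:
    """Toy reader for one line of whitespace-separated symbols."""
    tokens = text.split()
    if tokens and tokens[0] in ABBREVIATIONS:
        # Abbreviation followed by whitespace: applies to the rest of the line.
        rest = read_simple_line(" ".join(tokens[1:]))
        return [ABBREVIATIONS[tokens[0]], rest]
    items = [attach_prefix(t) for t in tokens]
    return items[0] if len(items) == 1 else items
```

Thus ' foo bar reads as (quote (foo bar)), while 'foo bar reads as ((quote foo) bar).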

Sublist ($)

On 2012-07-18, Alan Manuel Gloria noted that certain constructs were common and annoying to express, e.g., first(second(third(fourth))), and based on Haskell experience, suggested being able to write them as first $ second $ third(fourth). David A. Wheeler later found that, in the Quicklisp Common Lisp archives, there is an “infix dollar reader” by SUZUKI Shingo specifically to implement an infix “$” in Common Lisp.

This is another example (like GROUP/SPLIT) of a construct that is incredibly useful when you need it. It is simply an abbreviation for a very common practice. It’s not all that unusual to have a few processing or cleanup functions that take a single argument, with all the “real work” nested inside them. Without sublist this would require several levels of indentation, but with sublist it is easily handled.

This may be easiest to see with an example. The Scheme shell (scsh) has functions like “run” that take one parameter (another list) and interpret it specially. With sublist, this is easily expressed. You can even try this out by using the “unsweeten” tool developed by the readable project and entering this (which allows sweet-expressions to be used directly to control the Scheme shell scsh):

  unsweeten | scsh

For example, here’s a sweet-expression that could be typed into this and then executed using scsh:

  run $ grep -v "xx.*zz" <(oldfile) >(newfile)

Oh, and a brief aside: There are some complications when combining symbols beginning with “-”, especially with older Scheme specifications. These issues have nothing to do with sweet-expressions, but we thought you should know about them since they impact using Scheme as a shell (and sweet-expressions do make such uses easier):

One specific problem is that “-i” is the negation of the square root of -1 (a complex number) in later versions of Scheme, so the specific option “-i” is awkward to refer to portably.
The sample implementation, R6RS, and R7RS all support escaping symbol identifiers with |...|, so |-v| would work and comply with the later standards. However, R5RS does not require support for |...|, and scsh version 0.6.7 does not provide support for |...|. (In scsh, -i is a symbol, not a number.)
R5RS and R6RS do not require support for any symbols that directly start with “-”, other than “-” itself (as they are not in the set of defined <initial>). If you want full compliance with the R6RS Scheme standard, you should escape any multi-character symbol beginning with “-” by surrounding it with |...|. The R7RS specification does allow symbols beginning with “-” to be written directly in most cases (e.g., if followed by a letter or a dash), with the exception of “-i”, so in most cases options do not require escaping to be portable in R7RS. In fact, many Schemes in practice do support such symbols, even if they do not claim to implement R7RS. Note that the sample implementation permits symbols beginning with “-”; if the sequence of characters is not a number, it is considered a symbol.

SUBLIST also makes certain idioms possible. For instance, some functions need to change their behavior based on the type of the inputs. Here’s an example, a definition that could take advantage of SRFI-105’s $bracket-apply$ :

define c[i]
  cond
    vector?(c)
      vector-ref c i
    string?(c)
      string-ref c i
    pair?(c)
      list-ref c i
    else
      error "Not a collection"

This perfectly valid sweet-expression shows two common occurrences in Scheme programming: a function that immediately begins with cond, and cond clauses with relatively short tests and tail sequences. The above formatting has several lines with a single n-expression (e.g., “cond”, “else”, “string?(c)”), and many other lines that are fairly short.

Vertical space is precious. Using SUBLIST, we can choose to compress the code to:


sweet-expression

s-expression

define c[i] $ cond
  vector?(c) $ vector-ref c i
  string?(c) $ string-ref c i
  pair?(c)   $ list-ref c i
  else       $ error "Not a collection"

(define ($bracket-apply$ c i) (cond
  ((vector? c) (vector-ref c i))
  ((string? c) (string-ref c i))
  ((pair? c)   (list-ref c i))
  (else        (error "Not a collection"))))

Arguably, this can be done by putting the cond branches in explicit parentheses. However, the idiom supported by SUBLIST is more general than explicit parentheses can be, because SUBLIST does not disable indentation processing. In particular, this idiomatic formatting of cond using SUBLIST makes possible the following code:


sweet-expression

s-expression

define merge(< as bs) $ cond
  null?(as)           $ bs
  null?(bs)           $ as
  {car(as) < car(bs)} $ cons
                         car as
                         merge < cdr(as) bs
  else                $ cons
                         car bs
                         merge < as cdr(bs)

(define (merge < as bs) (cond
  ((null? as)            bs)
  ((null? bs)            as)
  ((< (car as) (car bs)) (cons
                           (car as)
                           (merge < (cdr as) bs)))
  (else                  (cons
                           (car bs)
                           (merge < as (cdr bs))))))

Without SUBLIST, the more complex branches of the cond would have to be formatted differently from the simpler branches (unless you are willing to waste a line to write just “as”), or would be expressed in deeply-nested parentheses, defeating the purpose of using sweet-expressions.

Another idiom that SUBLIST enables is a straightforward let* format where variable names are followed by $. This eliminates the need to parenthesize the outermost layer of the variable value expressions:

sweet-expression

s-expression

let*
  \\
    asline $ line-in-collecting car(m)
    length-asline $ length asline
  body...

(let*
  (
    (asline (line-in-collecting (car m)))
    (length-asline (length asline)))
  body...)

After discussion, SUBLIST was accepted on 2012-07-23.

Why is `a $ b` equivalent to `(a b)` rather than `(a (b))`?

When initially learning SUBLIST, some people assume that “a $ b” should map to “(a (b))”. However, the specification specifically does not yield this semantic; “a $ b” maps to “(a b)”. At first, some people think that this is an inconsistency.

However, this is actually more consistent and produces better results. SUBLIST ($) does not imply that the succeeding text should be a list; instead, it denotes that the succeeding text is the last argument of the current line.

More concretely, consider this code:

a
  b
    c
      d

The sub-list starting with b is the last (and only) argument of a, the sub-list starting with c is the last (and only) argument of b, and so on.

SUBLIST allows us to compress this text into a shorter form:

a $ b
  c
    d

We can repeat this:

a $ b $ c
  d

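However, if a $ b were (a (b)), we would need to stop at this point. The indented form below maps to (a b), not (a (b)), so rewriting the remaining “c” line and its child “d” as c $ d would change the result:

Original:
a
  b

Maps to:
(a b)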

Since outside of SUBLIST, we consistently map a singleton datum as that datum by itself, SUBLIST also consistently maps a singleton datum as that datum by itself.

By selecting this behavior, the example above can be expressed as:

Original:
a
  b
    c
      d

Equivalent to:
a $ b $ c $ d

Maps to:
(a (b (c d)))
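The rule can be written as a short recursive function. A minimal Python sketch, assuming a single line already split into whitespace-separated tokens, with no child lines, n-expressions, or other markers; `sublist_line` is a hypothetical name, and `monify` models the operation of the same name in this document. It also covers the leading-$ case used later for let.

```python
from typing import List, Union

Datum = Union[str, list]


def monify(items: List[Datum]) -> Datum:
    """A one-item line stands for that item; otherwise the line is a list."""
    return items[0] if len(items) == 1 else items


def sublist_line(tokens: List[str]) -> Datum:
    """Value of a single line of tokens that may contain $ (SUBLIST)."""
    if tokens and tokens[0] == "$":
        # A line that starts with $ always yields a one-element list.
        return [sublist_line(tokens[1:])]
    if "$" in tokens:
        i = tokens.index("$")
        # Everything after the first $ is read as a line of its own (so a
        # singleton stays a singleton) and becomes the LAST element here.
        return tokens[:i] + [sublist_line(tokens[i + 1:])]
    return monify(tokens)
```

With this definition, a $ b $ c $ d produces (a (b (c d))), the same as the fully indented form; if the recursive result were always wrapped in a list, the chain would instead end in (c (d)).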

This consistency is desirable; let’s review the merge example from the previous question:

define merge(< as bs) $ cond
  null?(as)           $ bs
  null?(bs)           $ as
  {car(as) < car(bs)} $ cons
                         car as
                         merge < (cdr as) bs
  else                $ cons
                         car bs
                         merge < as (cdr bs)

We can adopt a coding style where the condition and the branch code in a cond expression are consistently separated by a SUBLIST character. This consistency would be impossible if SUBLIST always created a list, even when the right-hand side is a single datum.

Why specifically use `$` for SUBLIST, and `\\` for the two behaviors GROUP and SPLIT?

On 2013-05-27, John David Stone posted on the SRFI-110 mailing list that, “there is nothing about the \\ and $ symbols that represents the corresponding syntactic structures, and indeed \\ is used in two completely different ways, depending on context. It even has two different names, GROUP and SPLIT, as if to emphasize the impossibility of making the same symbol serve as an icon for two unlike syntactic operations. ”

As it happens, both symbols were chosen by the same author (Alan Manuel K. Gloria), and he had already posted the rationales for using $ for SUBLIST, as well as for using \\ for GROUP and SPLIT, on the readable-discuss mailing list, where this notation and the SRFI-105 notation were initially developed.

For SUBLIST, consider “zooming in” on the $ symbol, which looks a little like this:

  (|   <-- open parenthesis
   |)  <-- close parenthesis

This visual pun is largely due to the common behavior that a $ b c will magically insert a pair of open and close parentheses. Of course, a $ b will not insert parentheses, but the reason for this was explained in the previous section.

For GROUP/SPLIT, the main reason for merging the two “unlike” behaviors GROUP and SPLIT into a single symbol is because of the following reversible transformation:

SPLIT form:
foo \\ bar

GROUP form:
foo
\\ bar

This reversible transformation is useful for keywords. While the Scheme standard does not have keywords, they are idiomatic in some implementations — notably Guile — and are idiomatic in other Lisp-like languages — notably Common Lisp. Here is an example of defining a class in Guile’s GOOPS:

define-class <player> (<game-object>)
  owned
    #:allocation \\ #:virtual
    #:slot-ref
    \\  lambda (pl)
          player-owned pl
    #:slot-set!
    \\  lambda (_ __)
          error "attempt to change 'owned slot of <player>." _ __

The exact symbol \\ was chosen because there were four expected use cases, summarized below:

Use Case	Example	Why `\\`
Lists of lists	let \\ ! foo $ compute 'foo ! bar $ compute 'bar { foo + bar }	The `\\` points to the lower right, where the list elements are.
Denoting keyword values	keyworded-function :keyword \\ compute 'keyword-value	Think of the `\\` as a “rounded edge” directing an invisible, bent line from the bottom of the keyword that turns 90 degrees to point to its attached value. :keyword \| \\-> compute 'keyword-value
Separate elements on a single line	display "Hello, " \\ display player \\ newline()	The `\\` is almost like a vertical bar separating the elements.
Alternative for denoting a line to ignore	define foo(x) \\ define bar(y) compute 'bar x y \\ compute 'foo bar x	The `\\` is almost like a vertical bar pointing downwards. Arguably the comment `;` is superior for this usage, but the option exists for whatever rhetorical reason users might find.

Collecting lists (<* ... *>)

Each sweet-expression is ended by a blank line, which is usually what you want. There is one circumstance where that behavior is awkward: a long sequence of definitions within an initial statement. We have developed a solution, collecting lists, which are also useful for let-like statements with 1-2 variables.

An accidental blank line between two internal definitions will end the initial statement:

define-library
  example grid
  export make rows cols ref each rename(put! set!)
  import scheme(base)
  begin
    define make(n m)
      let (grid(make-vector(n)))
        do (i(0 {i + 1}))
        ! {i = n} grid
        ! let (v(make-vector(m #false))) vector-set!(grid i v)
    define rows(grid) vector-length(grid)
    define cols(grid) vector-length(vector-ref(grid 0))

; The above blank line prematurely ends define-library, and since
; there is an initial indent, these will be interpreted quite differently:
    define ref(grid n m)
      and
        {-1 < n < rows(grid)}
        {-1 < m < cols(grid)}
        vector-ref vector-ref(grid n) m
    define put!(grid n m v) vector-set!(vector-ref(grid n) m v)

You can work around this for short sequences by removing the blank lines or replacing them with one of:

a ; comment (optionally indented) — the recommended approach
a correctly-indented GROUP (\\) symbol
a correctly-indented special comment (#|...|# or #;...)

For longer sequences (say, much longer than a screen), use collecting lists (<* ... *>). The <* and *> represent opening and closing parentheses, but restart indentation processing at the beginning, and collect any sweet-expressions inside. In a collecting list, horizontal spaces after the initial <* are consumed, and then sweet-expressions are read. These t-expressions must not begin with an indent (though you can indent lines with only ;-comments).

Here is an example of using collecting lists for the library structure above:

define-library
  example grid
  export make rows cols ref each rename(put! set!)
  import scheme(base)
  <* begin

define make(n m)
  let (grid(make-vector(n)))
    do (i(0 {i + 1}))
    ! {i = n} grid
    ! let (v(make-vector(m #false))) vector-set!(grid i v)

define rows(grid) vector-length(grid)
define cols(grid) vector-length(vector-ref(grid 0))

define ref(grid n m)
  and
    {-1 < n < rows(grid)}
    {-1 < m < cols(grid)}
    vector-ref vector-ref(grid n) m

define put!(grid n m v) vector-set!(vector-ref(grid n) m v)
*>

Why a new construct?

Wholesale changes to sweet-expressions do not seem warranted for this special case, because there are reasons that sweet-expressions are defined the way they are. It is fundamental that a child line is indented from its parent, since that is the point of indentation. Opening a parenthesis intentionally disables indentation processing; this is what developers typically expect (note that both Python and SRFI-49 do this), and it also makes sweet-expressions very backwards-compatible with traditional s-expressions. Ending a definition at a blank line is very convenient for interactive use, and interactive and file notation should be identical (since people often switch between them).

Note: Python works around this by having different semantics for files vs. interactive use.

The collecting list symbols are carefully chosen. The characters < and > are natural character pairs that are available in ASCII. What is more, they are not delimiters, so any underlying Scheme reader will not immediately stop on reading them (making it easier to reuse an underlying Scheme reader when implementing a sweet-expression reader). The “*” is more arbitrary, but the collecting list markers need to be multiple characters to distinguish them from the less-than and greater-than procedures, and this seemed to be a fairly distinctive token that is rarely used in existing code.

In some cases, you might want to use a collecting list around a long construct, but not actually create a new list. This occurs, for example, in a library module system with an implicit begin. This is not a problem; just use a collecting list after a period (.). This will attach the collecting list to the end of the list being defined, instead of creating a completely subordinate list. After all, since “(a b . (c d))” is just “(a b c d)”, when indentation processing is active the line “a b . <* c d *>” is also just “(a b c d)”. Here is an example:

define-library (example grid) . <*

export make rows cols ref each rename(put! set!)
import scheme(base)

define make(n m)
  let (grid(make-vector(n)))
    do (i(0 {i + 1}))
    ! {i = n} grid
    ! let (v(make-vector(m #false))) vector-set!(grid i v)

define rows(grid) vector-length(grid)
define cols(grid) vector-length(vector-ref(grid 0))

define ref(grid n m)
  and
    {-1 < n < rows(grid)}
    {-1 < m < cols(grid)}
    vector-ref vector-ref(grid n) m

define put!(grid n m v) vector-set!(vector-ref(grid n) m v)
*>
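The period trick works because, in the underlying data, a list is a chain of pairs: (a b . (c d)) uses the list (c d) as its final cdr, which is exactly the structure of (a b c d). A minimal Python model, with a hypothetical `Pair` class standing in for Scheme pairs and `None` standing in for the empty list:

```python
from typing import Any


class Pair:
    """A minimal cons cell; None plays the role of the empty list '()."""

    def __init__(self, car: Any, cdr: Any):
        self.car, self.cdr = car, cdr

    def __eq__(self, other: Any) -> bool:
        return (isinstance(other, Pair)
                and self.car == other.car
                and self.cdr == other.cdr)


def scheme_list(*items: Any, tail: Any = None) -> Any:
    """Build (item ... . tail); with the default tail this is a proper list."""
    result = tail
    for item in reversed(items):
        result = Pair(item, result)
    return result


# (a b . (c d)): the list (c d) becomes the final cdr ...
dotted = scheme_list("a", "b", tail=scheme_list("c", "d"))
# ... which is exactly the same structure as (a b c d).
flat = scheme_list("a", "b", "c", "d")
```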

Collecting lists can also be used in a let-style statement with one or two variables with short initial values. The sweet-expression notation cleanly handles cases where let-expression variables have complex values (e.g., using \\), but for simple cases (1-2 variables having short initial values) it can take up more vertical space than traditional formatting. Using a leading “$” takes up somewhat less vertical space, but it still takes up an additional line for a trivial case, it does not work the same way for let expressions with 2 variables, and David A. Wheeler thinks it is a rather unclear construction. In particular, you cannot use “$ x 5 $ y 7” for a two-variable let statement; that would map to ((x 5 (y 7))), not ((x 5) (y 7)). You can also use parenthetical notation directly, but this is relatively ugly and it is annoying to need to do this for a common case. A similar argument applies to do-expressions, and these are not at all unusual in Scheme code:

let  ; Using \\ takes up a lot of vertical space in simple cases
  \\
    x 5
  {x + x}

let
  \\
    x 5
    y 7
  {x + x}

let  ; Less vertical space, but works for 1 variable only
  $ x 5
  {x + 5}

; The two-variable format can be surprising and does not let the
; programmer emphasize the special nature of the variable assignments
; (compared to the later expressions in a let statement).
let
  x(5) y(7)
  {x + 5}

let (x(5)) ; Use parentheses
  {x + x}
let (x(5) y(7))
  {x + x}

Here are some examples of collecting lists for the let-variable cases:

let <* x 5 *>
  {x + x}
; ==> (let ((x 5)) (+ x x))

let <* x 5 \\ y 7 *>
  {x + x}
; ==> (let ((x 5) (y 7)) (+ x x))

Reserved marker ($$$)

It seems prudent to have a symbol available for future expansion. Thus, the marker $$$ is reserved for future use. This means that $$$ must be escaped (e.g., {$$$} or |$$$|) if it is used in an indentation-processing context.

Line Continuation

Sweet-expressions do not, at this time, include a line continuation mechanism.

One could be easily added. One approach would be to consider “.” at the beginning of a line, with expressions after it, as meaning “this continues the previous line”. Another approach would be to interpret “\\” at the end of a line (after at least one n-expression) as a line continuation; we even have a draft BNF construction that would do that:

normal_it_expr
  : line_exprs (
     GROUP_SPLIT hs
      (options {greedy=true;} :
       comment_eol same more=it_expr {$v = appende($line_exprs.v, $more.v);}
       | /*empty*/ {$v = monify($line_exprs.v);} )
    ...

However, we’ve tried to minimize the number of mechanisms in the notation, and there didn’t seem to be a strong use case for line continuations for Scheme. If the sub-components have structure you can just use indentation as intended, and if it’s just a long list of items, using parentheses to surround a list works just fine. There were also concerns by some that “\\” at the end of a line would be confusing. It would be easy to add “\\” as a line continuation in some future version if it proves to be necessary, and implementations could choose to add it as an extension.

Comparisons to other notations

The following subsections compare sweet-expressions to a few of the many alternative notations that exist (including some alternatives created during its construction).

Comparison to M-expressions

M-expressions (or meta-expressions) are a notation developed by John McCarthy, and were intended to be the primary notation for developing software in Lisp. As later explained by John McCarthy in “History of Lisp” (1979-02-12), “The project of defining M-expressions precisely and compiling them or at least translating them into S-expressions was neither finalized nor explicitly abandoned. It just receded into the indefinite future, and a new generation of programmers appeared who preferred internal notation to any FORTRAN-like or ALGOL-like notation that could be devised.”

Documents such as the LISP 1.5 Programmer’s Manual do hint at the intended syntax of M-expressions. Function names were written in lower case letters (to distinguish them from atoms, which were only upper case), followed by a pair of square brackets. Inside the square brackets were semicolon-separated arguments. Thus, the M-expression cons[A; (B C)] represented the s-expression (cons A (B C)); if computed it would produce (A B C). M-expressions included some other features, for example:

The special infix operator “=” could be used to define new functions, and thus was a synonym for Scheme’s “define”. An example of its expected use was:
```
    third[x]=car[cdr[cdr[x]]]
```
A conditional expression of the form [p1 → e1 ; p2 → e2 ; ... pn → en] evaluated each p from left to right; the e corresponding to the first true p was returned. This presumably could map to (cond (p1 e1) (p2 e2) ... (pn en)).
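As a rough illustration of the correspondence just described, here is a toy M-expression-to-s-expression translator in Python. It is a hypothetical sketch, not a historical implementation. It handles only application (f[a; b]), literal lists copied through unchanged, and the “=” definition form, and it maps atoms exactly as the cons[A; (B C)] example above does. The conditional form is deliberately left out. The names TOKEN, tokenize and translate are our own.

```python
import re

# Tokens: identifiers, plus the single-character punctuation used here.
TOKEN = re.compile(r"\s*([A-Za-z][A-Za-z0-9]*|[\[\];=()])")


def tokenize(text):
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def translate(text):
    """Translate a (very restricted) M-expression into s-expression text."""
    tokens = tokenize(text)
    pos = 0

    def expr():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":  # literal list such as (B C): copied through as data
            items = []
            while tokens[pos] != ")":
                items.append(expr())
            pos += 1
            return "(" + " ".join(items) + ")"
        if pos < len(tokens) and tokens[pos] == "[":  # application f[a; b]
            pos += 1
            args = []
            while tokens[pos] != "]":
                args.append(expr())
                if tokens[pos] == ";":
                    pos += 1
            pos += 1
            return "(" + " ".join([tok] + args) + ")"
        return tok  # plain atom

    result = expr()
    if pos < len(tokens) and tokens[pos] == "=":  # definition: f[x]=body
        pos += 1
        result = f"(define {result} {expr()})"
    return result


print(translate("cons[A; (B C)]"))             # prints (cons A (B C))
print(translate("third[x]=car[cdr[cdr[x]]]"))  # prints (define (third x) (car (cdr (cdr x))))
```

The translator must know, for every construct, how to rewrite it into an s-expression. That per-construct knowledge is exactly what made M-expressions hard to extend to new syntactic forms.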

The fundamental problem with M-expressions was that they were not general. When a new syntactic structure was created (e.g., with a macro), the new construct could easily be accessed using s-expressions, but not with M-expressions. Also, M-expressions were never widely implemented; if you wanted to actually use a Lisp-based language, you had to use s-expressions.

Sweet-expressions avoid these problems of M-expressions. The sweet-expression notation is not tied to any particular semantic, and it has been implemented multiple times.

Comparison to Honu

Honu, as described in Honu: Syntactic Extension for Algebraic Notation through Enforestation, is “a new language that fuses traditional algebraic notation (e.g., infix binary operators) with Scheme-style language extensibility. A key element of Honu’s design is an enforestation parsing step, which converts a flat stream of tokens into an S-expression-like tree, in addition to the initial ‘read’ phase of parsing and interleaved with the ‘macro-expand’ phase. We present the design of Honu, explain its parsing and macro-extension algorithm, and show example syntactic extensions.”

In particular, the Honu authors state that their “immediate goal is to produce a syntax that is more natural for many programmers than Lisp notation - most notably, using infix notation for operators - but that is similarly easy for programmers to extend. Honu adds a precedence-based parsing step to a Lisp-like parsing pipeline to support infix operators and syntax unconstrained by parentheses. Since the job of this step is to turn a relatively flat sequence of terms into a Lisp-like syntax tree, we call it enforestation. Enforestation is not merely a preprocessing of program text; it is integrated into the macro-expansion machinery so that it obeys and leverages binding information to support hygiene, macro-generating macros, and local macro binding - facilities that have proven important for building expressive and composable language extensions in Lisp, Scheme, and Racket.” An example of its syntax, per its paper, is:

function quadratic(a, b, c) {
  var discriminant = sqr(b) - 4 * a * c
  if ( discriminant < 0) {
    []
  } else if (discriminant == 0) {
    [-b / (2 * a)]
  } else {
    [-b / (2 * a), b / (2 * a)]
  }
}

At the surface, perhaps the most obvious difference is that Honu uses {} for major structures, in a way that looks somewhat similar to C, instead of using indentation. This means that, as with Scheme and C, users must rely on tools to keep the visual indentation consistent with the {} that are actually used to nest constructs, which creates the risk that the two will go out of sync and mislead human readers. Another obvious difference is that Honu supports user-defined precedence levels; as noted in SRFI-105, this causes trouble in dealing with operators if the precedence is defined differently in different code sections, and also makes it more difficult for human readers to determine where lists begin and end.

There are some surface similarities as well. Honu does support a more traditional-looking function call notation, of the form “quadratic(a, b, c)”. Sweet-expressions accept a similar function call format, though without the commas (which we found annoying in practice, as they were extraneous and interfered with the comma (unquote) operator). Both Honu and sweet-expressions accept infix notation, which is used nearly universally in other languages, though with some minor differences in syntax (in part due to Honu’s use of precedence).

But Honu’s major approach is fundamentally different; the syntax is actually embedded in the language, making it difficult to separate the two: “To handle infix syntax, the Honu parser relies on an enforestation phase that converts a relatively flat sequence of terms into a more Scheme-like tree of nested expressions. Enforestation handles operator precedence and the relatively delimiter-free nature of Honu syntax, and it is macro-extensible. After a layer of enforestation, Scheme-like macro expansion takes over to handle binding, scope, and cooperation among syntactic forms. Enforestation and expansion are interleaved, which allows the enforestation process to be sensitive to bindings.” Honu’s approach enables new syntaxes and meanings to be installed, which its authors presumably expect to be a good thing, but this approach also has significant downsides.

Honu’s approach appears to impede generality. For example, {...} is defined as starting “a new sequence of expressions that evaluates to the last expression in the block.” Note that this definition is more than simply the definition of a list in terms of syntax; the notion of how to calculate it seems to be embedded in the syntax. Honu’s approach seems to be at odds with the idea that a notation should be independent of the evaluation approach.

Honu’s approach certainly sacrifices homoiconicity. The whole Honu process invokes macros that can transform the results. What’s more, these macros can be defined later. As a result, it is not possible to know what a syntactic construct means without knowing all the transformation definitions active at the time the construct was read. The precedence definitions for infix operators are an example of this problem, but this turns out to be systemic in Honu. In short, Honu’s approach is at odds with the idea that a human reader should be able to read just the surface syntax, without knowing anything about what macros are active, and still know exactly what the underlying structure will be.

Another complication with Honu is that it is not backwards-compatible with existing Lisp constructs. In Honu, the “(expression)” production “performs the traditional role of parenthesizing an expression to prevent surrounding operators with higher precedences from grouping with the constituent parts of the expression”. It seems that internally, the base Honu reader does read it in as a single-item list. But the subsequent enforestation step removes any extra layers of parentheses. This semantic is similar to many other languages, but it means that a Honu reader cannot double as a Scheme reader. In contrast, most users could silently switch to a sweet-expression reader and have no idea that a change had occurred, since normally-formatted Scheme expressions will continue to work unchanged. This means it is much easier to transition to sweet-expressions.

Honu’s approach ties together desugaring and macro-expansion; the text “foo(bar, quux)” is two datums, “foo” and “(bar |,| quux)”, and the enforestation step (which doubles as the macro-expansion step) converts it to “(foo bar quux)” at the Racket level. Honu’s macros are not actually the same type as the hosting Racket implementation’s macros. A honu-block Racket macro calls the enforest routine, which then calls Honu-level macros.

Fundamentally, the Honu approach sacrifices both generality and homoiconicity to achieve readability. In addition, its use of {...} creates the risk that visual indentation will be inconsistent with the actual expression structure. We applaud Honu’s goal of readability, but do not believe its sacrifices are necessary to achieve that goal.

Comparison to Q2

An interesting experimental notation, “Q2”, was developed by Per Bothner; see http://per.bothner.com/blog/2010/Q2-extensible-syntax/.

Q2 has somewhat similar goals to the “readable” project, though with a different approach. The big difference is that David A. Wheeler decided it was important to have a general notation for any s-expression. Here is a brief additional comparison:

Sweet-expressions have infix, though not built-in precedence (precedence can be implemented by defining $nfx$).
Both have “juxtaposition for function application”
Q2 has “Naming a zero-argument function applies it”, but this is awkward; indeed, “The exact rule for a distinguishing between a variable reference and a zero-argument function application isn’t decided yet.” In sweet-expressions, a zero-argument function is called by adding () after or around its name, e.g., pi() or (pi).
“Flexible token format” - both require operators to be delimited.
“Use indentation for grouping” - both use indentation for grouping
“Block expressions yield multiple values” - In sweet-expressions, you use the usual Scheme procedures, including values, instead of special syntax.
REPL: In sweet-expressions, you usually end an expression by pressing Enter twice (once to end the line, once more for a blank line). Q2 doesn’t require this, but Wheeler worries that you then have to be careful, or an expression may end at a point where it syntactically need not.

Comparison to P4P

P4P: A Syntax Proposal by Shriram Krishnamurthi describes an alternative, more readable format for the Racket implementation of Scheme. There are some similarities, but many differences.

P4P supports functional name-prefixing such as f(x), just as sweet-expressions do. However, function parameters are separated by commas (an extra character not typical in Lisp code, and in our experiments something of a pain, since parameters are very common). P4P does not support infix notation at all, even though practically all non-Lisp languages support it.

P4P has a very different view of indentation, compared to sweet-expressions. In P4P, indentation does not control semantics. Instead, “the semantics controls indentation: that is, each construct has indentation rules, and the parser enforces them. However, changing the indentation of a term either leaves the program’s meaning unchanged or results in a syntax error; it cannot change the meaning of the program.”

This means that P4P has a large number of special-case syntactic constructs. For example, defvar: and deffun: specially use “=”, if: has intermediate keywords, and so on. While this looks nice when you stay within its set, it encounters the same problem that McCarthy had with M-expressions: There are always new constructs, including ones in meta-languages (not the underlying Scheme implementation) and macros. The P4P author notes that, “it would be easy to add new constructs such as provide:, test:, defconst: (to distinguish from defvar:), and so on”, but this misses the point; the task of defining constructs inhibits the use of those constructs, and may be impractical if there are syntactic differences at different language levels. For example, imagine processing lists where “deffun” has a different definition than the underlying language; this is trivial with s-expressions and sweet-expressions, but not practical using P4P.

The P4P author notes that, “the parser can be run in a mode where indentation-checking is simply turned off... This can be beneficial when dealing with program-generated code.” However, now the developer must deal with enabling various modes, and this mode is needed not just for program-generated code, but for code that has mixtures of various languages. Rather than having multiple modes, a single mode that works everywhere seems more useful to the developers of the sweet-expression notation.

In short, P4P fails to be general; it is tied to specific semantics. Previous readability efforts, such as M-expressions, failed, and we believe that one reason was that those notations were not general. We applaud the goals of P4P, but do not think it represents the best way forward.

Nevertheless, P4P is additional evidence that people are interested in improving the readability of Lisp, and that indentation can help do so.

Comparison to Z

The “Z” language by Chris Done (not related to the Z specification language) has been discussed on Reddit, and was reported to the readable-discuss mailing list by Ben Booth on 2013-01-02. It’s an indentation-based Lisp-like language, although its indentation rules differ somewhat from those of sweet-expressions.

In Z, each term in a whitespace-separated sequence is applied to the result of the terms that follow it, so:

  foo bar mu zot

would parse (in s-expression form) as (foo (bar (mu zot))). As its documentation states, “To pass additional arguments to a function, the arguments are put on the next line and indented to the column of the first argument”

This is an interesting approach, but David A. Wheeler agrees with 1337hephaestus_sc2 on Reddit: “The main idea seems clever, but also too clever.”

Here are a few issues with Z syntax compared to sweet-expressions:

When you have multi-parameter functions, this syntax quickly forces you to grow vertically. This is exactly the opposite of the real estate actually available. Screens are wide and short, and even traditional paper is wider than it is tall when measured in characters (typically 80 characters across, ~66 lines down).
Edits in one line could quietly change the meaning of other lines, in non-obvious ways. If you edit a line with children, you have to make sure that the lines that follow are moved as well. An IDE can do this, but it’s concerning if an IDE is a practical necessity to edit files. Here is an example of this meaning change; if you started with:
```
   fee fie foe fum
               foo bar
```
this would be (fee (fie (foe fum (foo bar)))), but merely changing “fie” to “faction” would produce
```
   fee faction foe fum
               foo bar
```
which would be interpreted as (fee (faction (foe fum) (foo bar))).
It may be especially easy to make a mistake with this notation in a Lisp. Writing “cons a b” would seem reasonable enough, but would be interpreted as (cons (a b)).
The notation seems to assume that all characters have the same (or at least predictable) width, an assumption that is much more difficult to ensure in a multi-lingual world with multiple encodings, variable-width fonts, and a much richer set of characters.
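The following Python sketch models the Z rule as described above, under our own simplifying assumptions. A head line's tokens nest to the right. Each continuation line is a plain line aligned with one of the head line's arguments, and it adds an extra argument to the call just before that argument. The names tokens_with_columns, parse_z and to_sexpr are hypothetical and not taken from the Z implementation. The sketch reproduces the fie/faction example and shows how renaming one token changes which call receives foo bar.

```python
def tokens_with_columns(line):
    """Split a line on spaces, recording the 0-based column of each token."""
    result, col = [], 0
    for part in line.split(" "):
        if part:
            result.append((col, part))
        col += len(part) + 1
    return result


def parse_z(lines):
    """Parse one Z-style head line plus continuation lines into nested lists."""
    head = tokens_with_columns(lines[0])
    if len(head) == 1 and len(lines) == 1:
        return head[0][1]
    # lists[i] is the call headed by token i; the last token stays an atom.
    lists = [[tok] for _, tok in head[:-1]]
    for i, call in enumerate(lists):
        call.append(lists[i + 1] if i + 1 < len(lists) else head[-1][1])
    for line in lines[1:]:
        col = tokens_with_columns(line)[0][0]
        # A line aligned with token k (k >= 1) adds an argument to token k-1's call.
        k = next((i for i, (c, _) in enumerate(head) if i >= 1 and c == col), None)
        if k is None:
            raise ValueError(f"continuation not aligned with an argument: {line!r}")
        lists[k - 1].append(parse_z([line.strip()]))
    return lists[0]


def to_sexpr(expr):
    """Render nested Python lists as s-expression text."""
    if isinstance(expr, str):
        return expr
    return "(" + " ".join(to_sexpr(e) for e in expr) + ")"


before = ["   fee fie foe fum", " " * 15 + "foo bar"]
after = ["   fee faction foe fum", " " * 15 + "foo bar"]
print(to_sexpr(parse_z(before)))        # (fee (fie (foe fum (foo bar))))
print(to_sexpr(parse_z(after)))         # (fee (faction (foe fum) (foo bar)))
print(to_sexpr(parse_z(["cons a b"])))  # (cons (a b))
```

The continuation line is identical in both runs. Only the width of a token on the line above changed, and the parser must count the characters of that token to find out which call foo bar belongs to.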

Comparison to Genyris

Genyris is another indentation-based Lisp. “All Genyris expressions are parsed and stored as linked-lists. A single line is converted into a single list. Sub-expressions are denoted in two ways, either within parentheses on a single line, or by an indented line. For example the following line contains two sub-expressions:”

Alpha (Beta Charlie) (Delta)

“Sub-expressions made using parentheses must remain within a single line, they are not [normally] permitted to wrap. Indented lines are deemed to be sub-expressions of the superior, less indented, lines above. The above expression can be written in indented form as follows:”

Alpha
    Beta Charlie
    Delta

Thus, it is similar to the main rule of t-expressions, except that Genyris wraps “ALL sublines in lists, even if they consist of a single element.” As Beni Cherniavsky-Paskin notes, “It can get away with that simpler rule because all data objects are callable and eval to [themselves]... In fact it’s much cleverer, though that’s irrelevant for us. All objects are actually macros (“lazy functions” in the manual’s terminology). What objects do if called with arguments - e.g. ("foo" arg1 arg2) - is evaluate those arguments in a dynamic-binding env enriched by the object’s methods, and return the last value. Dynamic scope only affects names starting with a dot, other names use lexical scoping. All this forms a clever implementation of method calling:”

"ball" (.replace "l" "na")
"banana"

On 2013-05-23 Bill Birch (the creator of Genyris) posted on the SRFI-110 mailing list, clarifying some points. He said:

“I have made some syntactic decisions which I regard as restricting programmers to write in a better style. For example not allowing lists to wrap encourages... smaller functions. That’s OK for source code, however when loading data files one should not restrict the structure. One difficulty with a syntax that defaults to lists is that something special needs to be done for atoms.

[For] example (a b c (d e f) xx) is problematic since xx is subordinate but is not a list. So in Genyris I was forced to add a leading ‘continuation’ character ~. Which gives me:
  quote
:   a b c
:      d e f
:      ~xx
:
(a b c (d e f) xx) # PairSource

There is another (obvious) way to wrap lists in Genyris just place a = at the end of the line. [For] example:
  list 1 2 3 4 5 =
:    6 7 8
:
(1 2 3 4 5 6 7 8) # Pair

In practice I don’t often use line continuation in code.”

David A. Wheeler believes the Genyris notation is interesting but is less readable for general-purpose s-expressions. In particular, this approach makes it more difficult (and less readable) to notate simple atoms, which are very common in typical Lisp code. This may be less important in Genyris, which is a significantly different language. Indeed, the author of Genyris stated, “if you use indented syntax expect the language itself to change!” But while the Genyris notation may work well for the Genyris system, David A. Wheeler believes a different notation would be better suited for other cases. The Genyris approach does (again) demonstrate that there is interest in using syntactically-relevant indentation in a Lisp-like language.
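The practical difference between Genyris's rule and the main t-expression rule can be seen with the Alpha/Beta/Delta example above. The toy Python reader below is our own sketch, not code from Genyris or from this SRFI's reference implementation. It handles only whitespace-separated atoms, with no parentheses, strings, or dedent consistency checks, and it reads just the first top-level expression. The hypothetical wrap_single flag switches between the two rules.

```python
def read_indented(text, wrap_single):
    """Read the first indentation-structured expression in `text`.

    Lines indented more deeply than a line become its children. With
    wrap_single=True every line becomes a list (the Genyris rule); with
    wrap_single=False a line holding a single datum and no children stays
    that datum (the t-expression rule).
    """
    lines = [(len(line) - len(line.lstrip(" ")), line.split())
             for line in text.splitlines() if line.strip()]

    def parse(i):
        indent, items = lines[i]
        expr = list(items)
        i += 1
        while i < len(lines) and lines[i][0] > indent:
            child, i = parse(i)
            expr.append(child)
        if len(expr) == 1 and not wrap_single:
            return expr[0], i
        return expr, i

    return parse(0)[0]


sample = "Alpha\n    Beta Charlie\n    Delta"
print(read_indented(sample, wrap_single=True))   # ['Alpha', ['Beta', 'Charlie'], ['Delta']]
print(read_indented(sample, wrap_single=False))  # ['Alpha', ['Beta', 'Charlie'], 'Delta']
```

Under the Genyris rule, a lone atom such as Delta needs extra marking (Bill Birch's ~ character) to stay an atom. Under the t-expression rule it is written plainly.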

Comparison to the “Initial Arne formulation”

On 2013-02-08, Arne Babenhauserheide made an alternative indentation proposal and posted it on the readable-discuss mailing list.

Aside from the basic indentation-means-subitem, it has the following important points:

1. The marker “:” indicates that an indentation is explicitly placed at the column where that marker is. That is, you might conceptually consider it as ending a line, then inserting an indentation to that column position, followed by the text after the :. In particular, a : on an indented line by itself is a placeholder indicating an indentation at its column position, similar to our GROUP \\ marker. For example, the following are equivalent:

Arne formulation:

let : : x : compute 'x
      : y : compute 'y
    use x y

Basic indentation format:

let
  :
    x
      compute 'x
    y
      compute 'y
  use x y

s-expression:

(let
  ( (x (compute 'x))
    (y (compute 'y)))
  (use x y))

2. A single datum on a line by itself without a child line is a single-item list; this is unlike SRFI-49 and this SRFI, where a single datum on a line by itself without a child line is just that datum.

Arne formulation:

foo
(bar)
5
#f

s-expression:

(foo)
((bar))
(5)
(#f)

3. The marker “.”, when it starts a line, splices the list after it into the parent list. This is primarily used to turn the single-item lists formed by the previous rule into actual single datums.

Arne formulation:

foo
  bar
  . 5
  . #f #t "hello"

s-expression:

(foo (bar) 5 #f #t "hello")

4. Inconsistent dedents are accepted. For example, the following text is accepted in Arne’s formulation, but would be rejected as an error by this SRFI:

Arne formulation:

foo
    bar quux
  kuu nitz

s-expression:

(foo (bar quux) (kuu nitz))

After being proposed, it was suggested that rule 2 above should be amended to be similar to the equivalent rules in SRFI-49 and this SRFI; that is, a single datum on a line by itself should be only that datum, not wrapped in a list. Further, a “.” marker followed by a single datum without a child line should be a no-op.

Rule 2 was formulated that way since the intention was to build an indentation processor, not a full parser. However, further discussion revealed that a simple rule could be formulated to differentiate between one-item and two-item lines; specifically, a space outside of parentheses or strings indicated that the line had two or more items. Thus even a simple indentation processor could support SRFI-49-like rule 2.
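The one-item versus multi-item test mentioned above can be sketched in a few lines of Python. This is our own reading of the rule, not code from Arne's processor. It treats (), [] and {} alike as grouping brackets, so that SRFI-105 forms such as {x + 5} count as one item, and it ignores comments and character literals such as #\( or #\space. The function name has_multiple_items is hypothetical.

```python
def has_multiple_items(line):
    """Return True if the line holds two or more items.

    Heuristic: a whitespace character that is outside every bracket pair
    and outside every string literal separates two items.
    """
    depth = 0
    in_string = False
    escaped = False
    for ch in line.strip():
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif ch.isspace() and depth == 0:
            return True
    return False


for text in ["(bar)", '"hello world"', "{x + 5}", "foo bar", "x (a b)"]:
    print(repr(text), has_multiple_items(text))
```

The first three lines count as single items; the last two hold two items each. A processor using this test can leave single-item lines unwrapped without having to parse each datum fully.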

This proposal was initially quite attractive (at least to Alan Manuel K. Gloria). It is simpler to describe informally, and appears, at first glance, to replace many actual uses for GROUP/SPLIT, SUBLIST, and collecting lists. Thus, it was hoped that these three extensions could be removed with the simpler : marker rule.

However, there are use cases where SUBLIST has superior semantics over Arne’s :. For instance, consider the following SUBLIST code:

call/cc $ lambda (exit)
  body
  ...

Replacing this with Arne’s : requires indenting the body further, to the column after the : marker.

call/cc : lambda (exit)
            body
            ...

With Arne’s formulation, a trade-off exists: either (1) add a separate line for the lambda (which increases vertical lines in exchange for reduced indentation), or (2) use : (which increases horizontal indentation in exchange for reduced vertical lines).

Either (1):

call/cc
  lambda (exit)
    body
    ...

or (2):

call/cc : lambda (exit)
            body
            ...

SUBLIST is powerful precisely because it collects child lines. This allows you to simultaneously reduce horizontal indentation and vertical lines.

The : and . markers are also insufficient replacements for GROUP/SPLIT. At first glance it might seem that . is superior to the SPLIT meaning of \\:

Arne’s formulation:

export
  . api-init api-use api-close

sweet-expression:

export
  api-init \\ api-use \\ api-close

But we expect that more typically, you want to express the code that looks like this:

Arne’s formulation:

begin
  . (display "Welcome, ") (display player) (display ", to Chicago!") (newline)

This can be expressed, more cleanly, in sweet-expressions:

sweet-expression:

begin
  display "Welcome, " \\ display player \\ display ", to Chicago!" \\ newline()
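The mid-line SPLIT behavior of \\ used in the sweet-expression examples above can be sketched as follows. This is a simplified model, not this SRFI's reader. It handles only a single child line, treats each whitespace-separated token or double-quoted string as one datum, leaves neoteric forms such as newline() unexpanded, and ignores the GROUP meaning of \\ at the start of a line. The names TOKEN, SPLIT and split_line are hypothetical.

```python
import re

# A double-quoted string (with backslash escapes) or any other run of non-space characters.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|\S+')
SPLIT = "\\\\"  # the two-character sweet-expression marker \\


def split_line(line):
    r"""Split one child line at each \\ marker into separate sibling items."""
    groups = [[]]
    for tok in TOKEN.findall(line):
        if tok == SPLIT:
            groups.append([])
        else:
            groups[-1].append(tok)
    # A group holding a single datum stays that datum; longer groups become lists.
    return [g[0] if len(g) == 1 else g for g in groups if g]


print(split_line(r"api-init \\ api-use \\ api-close"))
print(split_line(r'display "Welcome, " \\ display player \\ newline()'))
```

The parent line then takes these items as its children. The first call yields three bare atoms, giving (export api-init api-use api-close). The second yields two lists plus one neoteric item, matching the three separate display/newline calls under begin.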

If you truly want several single items to be spliced, the following trick takes advantage of the fact that indentation processing is disabled inside parentheses:

export . (
  api-init api-use api-close
)

Arne’s formulation also does not have a method to conveniently express a single gigantic top-level datum that contains several complex sub-datums, a.k.a. the define-library problem. In sweet-expressions, collecting lists (<* ... *>) handle this case:

<* define-library \\ (example)
import (scheme base)
export . (
  example-init
  example-open example-close
)
<* begin

define example-init()
  whatever ...
  ...

define example-open(x)
  whatever ...
  ...

define example-close(y)
  whatever ...
  ...

*>; begin
*>; define-library

We could retain collecting lists, and live without the SPLIT behavior, or even SUBLIST, though these would be important losses. Alternatively, they could all be re-added, but at that point the proposal’s simplicity would have completely disappeared. But neither option addresses the biggest problem.

The most important problem with this proposal is that it falsely assumes that it’s possible to know the visual width of different characters. In today’s world, this is impractical, especially across the many different implementations of Scheme and other Lisps.

Most obviously, this presumption is false on systems with variable-width fonts, which are widely used for email messages. You simply cannot presume you know anything about the actual widths of different character sequences in this case.

Even when only Western symbol sets are used, some letters can or must be expressed using combining characters. In these cases, what is stored as two characters is supposed to be displayed as one.

For another example, some East Asian characters, called fullwidth characters, should be displayed on two columns even on a fixed-width font display. In Arne’s formulation, the width of non-whitespace characters is significant, since the : marker can record the column position after non-whitespace characters occur. This SRFI, on the other hand, requires recording only the column position of horizontal whitespace characters; we handle the different possible widths of the TAB character by requiring consistent indentation.

Arne’s formulation requires either that implementations know all fullwidth characters (a much longer list than the list of horizontal whitespace characters), or would leave handling of fullwidth characters up to implementations, meaning that indentation expressions have potential portability problems.

Even granting that almost no code will use symbols containing fullwidth East Asian glyphs, one must still consider strings containing fullwidth East Asian glyphs, which we expect to occur regularly in East Asia.

This also raises the issue of character encoding. To properly recognize fullwidth characters, the encoding must be known. Granted, many East Asian-specific encodings use two bytes for fullwidth characters and one byte for halfwidth characters, so a simple byte-as-character interpretation would keep track of column positions correctly if you are using such an East Asian-specific encoding, at least until you re-encode the text into UTF-8.

UTF-8 use is spreading; it can encode any Unicode code point, and is backward-compatible with ASCII. But East Asian fullwidth characters generally do not encode in two bytes in UTF-8; most take three. Moreover, many other characters take three or more bytes in UTF-8 yet occupy only one column. Even if these characters do not occur in identifiers, they can occur in strings, and such strings might usefully be placed before a : marker.
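To make the problem concrete, this Python sketch reports where a : marker falls after a string containing two fullwidth characters, measured four different ways. Shift_JIS stands in for "an East Asian-specific encoding"; that choice is ours. The display_columns helper is a rough heuristic built on the standard unicodedata module. It counts wide and fullwidth characters as 2 columns and combining marks as 0, and it treats ambiguous-width characters as 1, which real terminals and fonts may not do.

```python
import unicodedata


def display_columns(text):
    """Rough column estimate: wide/fullwidth characters count 2, combining marks 0."""
    columns = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # drawn on top of the preceding character
        columns += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return columns


def colon_column(line):
    """Report where the first ':' sits in `line` under several different measures."""
    prefix = line[: line.index(":")]
    return {
        "code points": len(prefix),
        "UTF-8 bytes": len(prefix.encode("utf-8")),
        "Shift_JIS bytes": len(prefix.encode("shift_jis")),
        "display columns": display_columns(prefix),
    }


for measure, value in colon_column('foo "日本" : bar').items():
    print(f"{measure}: {value}")
# code points: 9, UTF-8 bytes: 13, Shift_JIS bytes: 11, display columns: 11
```

Here the Shift_JIS byte count happens to match the estimated display column, while the UTF-8 byte count and the code-point count do not. A combining sequence such as "e\u0301" is 2 code points but only 1 column. A notation whose meaning depends on the column of a : after such text would therefore have to fix one of these measures, and the measure that matches what readers see depends on fonts and encodings outside the reader's control.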

If we are sensitive to only initial indentation, then we need only worry about the widths of two characters, TAB and SPACE (and ! for this SRFI). This causes no problems in this SRFI, because indentation is required to be consistent across lines. In contrast, in Arne’s proposal, we need to worry about the widths of every character, and also know the encoding. Scheme code (and Lisp code in general) will increasingly need to embed strings with international (non-ASCII) characters, and R7RS at least allows optional support for symbols that contain international (non-ASCII) characters. R6RS mandates that support.

After a long discussion, this proposal was turned down by the authors of this SRFI.

Comparison to “Whitespace to Lisp” (wisp)

Based on feedback on the readable-discuss mailing list, Arne Babenhauserheide refined his initial approach and developed “Whitespace to Lisp” (wisp). You can learn more about wisp at the wisp: Whitespace to Lisp web page.

The key distinguishing factor between wisp and sweet-expressions is a difference of emphasis. The design of wisp focuses on having a simple definition as the most important factor. The design of sweet-expressions focuses on making code written in the notation simpler and clearer (that is, “readable”), even for large code sizes, at the cost of a slightly more complex definition. Below is a more detailed discussion of these differences.

As with the initial approach, wisp includes indentation-means-subitem (as is true for sweet-expressions). Like the initial approach but unlike sweet-expressions, every line is a new list unless prefixed with “.” after the indentation, and blank lines do not terminate the expression.

The main change in wisp compared to Arne’s initial approach is that the inline sublist (“:” in wisp notation) always ends at the end of the line (thanks to feedback from Alan Manuel K. Gloria). This completely eliminates the width-of-characters problem of the initial proposal (see below). At the time of this writing, inconsistent dedents produce broken code in the wisp implementation (they should throw an error instead). However, this is an implementation issue and not fundamental to the notation.
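Because each inline : now closes at the end of its line, a single wisp line can be parsed without tracking any column positions. The following Python sketch illustrates this for one line only. It is our own simplification, not the wisp implementation: it ignores indentation, child lines, the leading "." marker, and strings containing spaces. The name parse_wisp_line is hypothetical.

```python
def parse_wisp_line(line):
    """Parse a single wisp line, ignoring indentation and child lines.

    Every ':' opens a sublist that stays open until the end of the line,
    so the parser never needs to know which column the ':' occupied.
    """
    root = []
    current = root
    for tok in line.split():
        if tok == ":":
            sublist = []
            current.append(sublist)
            current = sublist
        else:
            current.append(tok)
    return root


print(parse_wisp_line("x : compute 'x"))           # ['x', ['compute', "'x"]]
print(parse_wisp_line("define : example-open x"))  # ['define', ['example-open', 'x']]
```

Compare this with the initial formulation, where a : also fixed an indentation column for later lines. There the parser had to know the display width of everything before the marker.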

This means that some examples change. As Arne Babenhauserheide explains, first, you should not use double colons, so instead of:

  let : : x : compute 'x
        : y : compute 'y
      use x y

In wisp this would be:

  let 
    : 
      x : compute 'x
      y : compute 'y
    use x y

For the begin example used in the GROUP/SPLIT comparison above, you could write it this way:

  begin
    . (display "Welcome, ") (display player) (display ", to Chicago!") (newline)

Alan Manuel K. Gloria believes you might indeed write it that way. In contrast, Arne believes it would more likely be written this way:

  begin
    display "Welcome, "
    display player
    display ", to Chicago!"
    newline

Or like this, if you want more complex expressions (note that in wisp, newline is not followed by () as it would be in sweet-expressions):

  begin
    display 
      concat "Welcome, " player 
           . ", to Chicago!"
    newline

Arne states, “This uses more vertical space - and I don’t mind (so it is a design choice not to try very hard to minimize vertical space).” For the single gigantic top-level datum, you just indent the rest (as in class definitions in Python):

library : example
  import : scheme base
  export . 
    example-init example-open example-close
  begin
    define : example-init
      whatever ...
      ...

    define : example-open x
      whatever ...
      ...

    define : example-close y
      whatever ...
      ...

Note the similarity to the corresponding Python syntax:

class example:
  def __init__(self):
    import base
    print(base.stuff)
    def localfunc():
      whatever ...
      ...

In wisp, expressions do not end on blank lines, and this could cause significant problems in interactive use. Wisp’s creator said on 2013-04-16, “Not breaking at an empty [aka blank] line is a design choice, because I tend to often use empty lines in python-code to separate logical parts of a function. But this actually creates problems when pasting a python script file into an interactive shell, so I see the point. Note, though, that there is no interactive wisp shell (yet). I think I would end a statement at 2 empty lines, which would mean that you should not use 2 consecutive empty lines in a script (but you could).” Using two blank lines is a plausible alternative, but note that this makes interactive use less appealing; users would actually have to press Enter three times (once to end a line, and twice more to create two blank lines).
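The trade-off can be sketched with a small Python function that collects one expression from an interactive session, ending it after a configurable number of consecutive blank lines. This is a hypothetical model for illustration, not code from either implementation. blank_lines_to_end=1 stands for the sweet-expression rule and 2 for the rule Arne suggested. Leading blank lines are skipped, and blank lines inside the expression are dropped from the result.

```python
def read_expression(lines, blank_lines_to_end):
    """Collect the non-blank lines of one expression from a list of typed lines.

    Returns the collected lines and how many lines were consumed, including
    the terminating blank lines.
    """
    collected, blanks, consumed = [], 0, 0
    for line in lines:
        consumed += 1
        if line.strip():
            collected.append(line)
            blanks = 0
        elif collected:
            blanks += 1
            if blanks == blank_lines_to_end:
                break
    return collected, consumed


session = ["define : f x", "  display x", "", "  + x 1", "", ""]
print(read_expression(session, 1))  # (['define : f x', '  display x'], 3)
print(read_expression(session, 2))  # (['define : f x', '  display x', '  + x 1'], 6)
```

With one blank line, the blank separator inside the function body ends the expression early. With two, the body survives intact, but the user must press Enter three times after the last line.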

Wisp does not have a mechanism to restart at the left-hand side for long stretches of code, as sweet-expressions do, and this omission raises some concerns. Alan Manuel Gloria notes, “In Scheme, usually you just put a bunch of definitions (unindented) in a file, then load them in your favorite Scheme system. After you’ve hacked on the definitions in the file a bit, then you put the module annotations. This is largely the rationale for (include ...) in R7RS (define-library ...) forms: the expected Scheme workflow is to start with a bunch of top-level, non-module definitions, hack on them until they work, then put them in a module. Hence, support for a bunch of unindented definitions inside a module would be nice. This is largely due to history: Scheme did not have cross-system standard modules. Most coders will have two or so Scheme systems they work in, and they might want to hack on their code first on one, then on the other(s). A flat file of definitions would usually work portably across Scheme systems. So Schemers generally have the habit of putting module annotations as the last step just prior to publishing their code. Those interested in cross-Scheme compatibility for their published code might very well keep the definitions unindented - some Schemes require modules to be a single large datum (MzScheme, R6RS, R7RS), while others require module annotations as one or more separate datums before definitions (Guile), and a Schemer maintaining a cross-platform library might get bug reports from different segments of their users - including patches.

The maintainer could use the “ignore whitespace change” option of patch, but this would “mess up the indentation afterwards if you applied an indented patch into an unindented source (entirely new lines in the code would be indented more than their surroundings) or vice versa. So it’s less ideal for the maintainer, since applying patches becomes more complex. [It is] simpler to just start all defines at indent 0, and for code in a module-is-one-datum system, just wrap all the defines in the module annotation without disturbing their indentations.” By keeping their published code unindented, such a maintainer could apply the same patch, say from a primarily-Guile user, to both the official Guile and MzScheme code. Again, wisp has no mechanism to do this.

Arne clarifies about the wisp notation: “Just to also state it explicitly: By making inline : close at the end of the line, the width-of-characters problem disappears: The only thing which can come before a colon that defines an indentation level which is relevant to later lines are spaces and space-equivalents (blocks of underscores starting at the beginning of the line). I did not see a drop in readability due to limiting inline : to the present line - rather the opposite, as it prompts me to do more tail calls. And clarity definitely increased. Just compare those two:”

  let : : x : compute 'x  ; First block
        : y : compute 'y
      use x y

  let                     ; Second block
    : 
      x : compute 'x
      y : compute 'y
    use x y

Arne stated, “I would not be sure at first glance myself what the first block does. For the second it’s clear to me.”

As noted, the wisp approach has no problems dealing with character widths and encodings (including double width characters), due to the change in : semantics. This removes a major objection to the initial Arne formulation.

However, the wisp approach prevents the shorter let syntax which requires the “: keeps track of column” semantics. In sweet-expressions, <*...*> makes it easy to handle short let syntax (which occurs often in Scheme and other Lisp code) and very long module definitions; no similar mechanism exists in wisp.

Another distinction between wisp and sweet-expressions is in how they handle singletons:

In wisp, every line normally creates a new list. You must use initial “.” to identify singleton values.
In sweet-expressions and SRFI-49, a line with just its value and no child lines represents itself. To create a list containing a single value, you must surround the value with parentheses (e.g., “(newline)”) or, in sweet-expressions, append () to it (e.g., “newline()”).

David A. Wheeler, the creator of sweet-expressions, believes that the sweet-expression/SRFI-49 approach (where singletons without children mean themselves) is the better approach, because he believes that:

It produces simpler code. Common cases should be simpler than less-common cases. Many procedures involve case/cond statements that break a problem into components, eventually leading to simple base cases. These base cases are often simple expressions (e.g., a number, the empty list, and so on) that are not in a list. Lists with single elements occur, e.g., “(newline)”, but they are less common. Wheeler believes we should optimize for the more common case.
It is more familiar to Lisp developers. A “.” in a list traditionally means that the following item is the cdr of a list, potentially creating an improper list. In the wisp approach, it may also indicate a normal list element, which is potentially confusing. You may even have sequences of lines leading with “.”, with a completely different meaning from traditional Lisp. For example, here is a wisp expression:
```
  if condition?   ; wisp example
    . #t
    . #f
```
The same could be expressed using sweet-expressions as:
```
  if condition?   ; sweet-expressions example
    #t
    #f
```
It is more familiar to non-Lisp developers. Developers who have never even seen Scheme can correctly guess what “newline()” means, while “. 1” will require explanation.
It is less error-prone. Because the wisp formulation looks less familiar, developers are more likely to write the wrong code and miss errors with it. This is especially so because “.” has another meaning (it sets the cdr of a list). Even Arne, developer of wisp, agreed on 2013-05-05 on the SRFI-110 mailing list that having to prefix singletons with “.” (as wisp requires) is “a trap: It’s easy to forget the . for a return value... Not adding brackets for a single item also has the advantage, that you can copy-paste lisp-code into readable. If you do the same in wisp, you have to prepend every top-level bracket with a dot.” Also, a “.” in front of an integer looks very much like a floating point number with an initial period, again inviting error.
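To make the sweet-expression/SRFI-49 singleton rule concrete, here is a minimal Python sketch of an indentation reader. It is not the reference implementation (which is written in Scheme), and `read_sweet` is a hypothetical name. The sketch makes several simplifying assumptions: atoms are whitespace-separated words, indentation uses spaces only, and blank lines are skipped. It also omits parentheses, neoteric and infix forms, SUBLIST, GROUP/SPLIT, and comments. Under these assumptions, a line holding a single datum with no child lines yields that datum, and every other line yields a list of its atoms followed by its children.

```python
def read_sweet(text):
    """Read top-level expressions from a tiny subset of sweet-expressions.

    Hypothetical sketch: atoms are whitespace-separated words, indentation
    is spaces only, and blank lines are skipped.  Parentheses, neoteric and
    infix forms, SUBLIST, GROUP/SPLIT, and comments are not supported.
    """
    lines = []  # (indentation width, list of atoms) for each non-blank line
    for raw in text.splitlines():
        if not raw.strip():
            continue
        body = raw.lstrip(" ")
        lines.append((len(raw) - len(body), body.split()))
    pos = 0

    def read_expr(indent):
        nonlocal pos
        atoms = lines[pos][1]
        pos += 1
        children, child_indent = [], None
        # Every following line indented deeper than this one is a child.
        while pos < len(lines) and lines[pos][0] > indent:
            if child_indent is None:
                child_indent = lines[pos][0]
            elif lines[pos][0] != child_indent:
                raise ValueError("inconsistent indentation")
            children.append(read_expr(child_indent))
        if len(atoms) == 1 and not children:
            return atoms[0]           # singleton rule: the value itself
        return atoms + children       # otherwise a list: atoms, then children

    exprs = []
    while pos < len(lines):
        exprs.append(read_expr(lines[pos][0]))
    return exprs
```

For instance, `read_sweet("if condition?\n  #t\n  #f")` returns `[["if", "condition?", "#t", "#f"]]`, that is, `(if condition? #t #f)`, with no “.” needed before the singleton branches.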

Wisp emphasizes a simple notation, but this can make certain constructs more complicated (and less readable) compared to sweet-expressions. For example, this sweet-expression:

  define me()
    you now
    others now
    others past
    '()

It can also be written in sweet-expressions, conserving vertical space, as:

  define me()
    you now \\ others now \\ others past \\ '()

This could be written in wisp, but it would have to be written using a longer format such as:

  define : me
    you now
    others now
    others past
    . '()
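The one-line sweet-expression form above relies on GROUP/SPLIT: each inline \\ starts a new sibling item, and each item follows the same singleton rule as a whole line. Below is a rough Python sketch of that splitting. The helper `split_line` is hypothetical, and it covers only \\ between atoms on a single line; it does not model an initial \\ or any other construct.

```python
SPLIT = "\\\\"  # the two-character marker \\ as written in source text


def split_line(line):
    r"""Split one line on inline \\ markers into sibling items (hypothetical helper).

    A segment with one atom stands for itself (the singleton rule); longer
    segments become lists; empty segments are dropped.
    """
    items, current = [], []
    for token in line.split() + [SPLIT]:  # trailing SPLIT flushes the last segment
        if token == SPLIT:
            if len(current) == 1:
                items.append(current[0])
            elif current:
                items.append(current)
            current = []
        else:
            current.append(token)
    return items
```

Applied to the line `you now \\ others now \\ others past \\ '()`, the sketch yields `[["you", "now"], ["others", "now"], ["others", "past"], "'()"]`. These are the four children of `define me()`, matching the multi-line version.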

A minor problem with wisp is that it uses “:” as an operator. While “$” is relatively uncommon as an operator, “:” is more common (e.g., for type declarations in Typed Racket). It can be escaped, but ideally the notation should only rarely require escaping, and in notations like Typed Racket, type declarations are common.

Wisp, by itself, does not support infix operations such as {a + 1}, nor does it support neoteric expressions such as f(x) to abbreviate (f x). In theory these could be added to wisp, but they are not in its definition at the time of this writing.

It should also be noted that, at the time of this writing, there is much less experience with wisp. No large program has been written with wisp, and there is only one implementation of the wisp notation (in Python). In contrast, at least two programs have been written using sweet-expressions (sweeten and letterfall), and a large number of smaller expressions have been written for a variety of Lisp variants. Sweet-expressions have also been implemented at least four times (original implementation in Scheme, ANTLR implementation, the reference SRFI implementation in Scheme, and a Common Lisp implementation loosely derived from the SRFI implementation). There is also no BNF of wisp notation at this time; in contrast, the sweet-expression notation has a BNF that has been automatically checked (by ANTLR).

Both wisp and sweet-expressions are designed to create a more readable Lisp. Both are homoiconic, generic, and backwards-compatible. Indeed, this is very friendly competition; Arne Babenhauserheide has provided helpful commentary on sweet-expressions, and he has reported that the feedback he received has also been very helpful: “Without the discussions here, there would be no wisp implementation” (2013-04-16).

Perhaps the strongest distinguishing factor is a difference of emphasis. The design of wisp focuses on having a simple definition as the most important factor. The design of sweet-expressions focuses on having simpler and clearer code when using the notation, even for large code sizes, at the cost of a slightly more complex definition (because it includes additional abbreviations). Several of the sweet-expression BNF branches were specifically added so that the notation would “do what people expect”, and capabilities like collecting lists are designed to handle large structures (e.g., long module definitions). We believe that, since developers must routinely read and write a lot of code, and some programs will be large systems with many modules, it’s worth having a slightly larger notation with additional capabilities to support them.

Closing SUBLIST by unmatched dedent (“Beni Formulation of SUBLIST”)

On 2013-02-18, Beni Cherniavsky-Paskin proposed an extension of SUBLIST semantics, to “allow closing SUBLIST by [partial] dedenting”. Informally, in Beni’s proposed extension, any occurrence of SUBLIST would mark a fresh indent level, which could be matched by an otherwise-unmatched dedent. For example:

Extended SUBLIST

Equivalent

outer1 outer2 $ inner1
! ! inner2
! outer3

outer1 outer2
! inner1
! ! inner2
! outer3

let $
! ! x $ compute 'x
! ! y $ compute 'y
! use x y

let
! \\
! ! x $ compute 'x
! ! y $ compute 'y
! use x y

The original formal description by Beni Cherniavsky-Paskin, as expanded by Alan Manuel K. Gloria, involves moving SUBLIST and SPLIT processing from the parser to the indentation preprocessor (i.e. the part that inserts INDENT and DEDENT tokens). In the current specifications, the indentation preprocessor handles a stack of indentations (in the implementation, a cons-cell stack of strings). Beni’s formulation expands this stack to include the special indentation marker ?. In the succeeding formal description, we assume two variables, the indentation-stack and current-indentation.

On encountering a SUBLIST, consume the SUBLIST and emit INDENT. Push ? on indentation-stack.
On encountering an inline GROUP/SPLIT (i.e., \\ with its SPLIT meaning), consume it, then:
1. If indentation-stack’s top is ?: Pop off every ? on top of indentation-stack and emit DEDENT for each popped item.
2. Otherwise, emit SAME.
On encountering an EOL, consume it, then consume indentation whitespace ((TAB | SPACE | !)*) and put it in current-indentation. Then:
1. If the indentation-stack’s topmost non-? item is “not consistent” with current-indentation, signal a bad indent error (BADDENT).
2. If the indentation-stack’s topmost non-? item is less than current-indentation, push current-indentation on indentation-stack and emit INDENT.
3. If the indentation-stack’s topmost non-? item is equal to current-indentation (these sub-steps repeat those for an inline GROUP/SPLIT above):
  1. If indentation-stack’s top is ?: Pop off every ? on top of indentation-stack and emit DEDENT for each popped item.
  2. Otherwise, emit SAME.
4. Otherwise, the indentation-stack’s topmost non-? item is greater than current-indentation:
  1. Pop off stack items until indentation-stack’s topmost non-? item is less than or equal to current-indentation; emit a DEDENT for each popped item.
  2. If the indentation-stack’s topmost non-? item is equal to current-indentation, pop off all ? and emit a DEDENT for each.
  3. Otherwise, if the indentation-stack’s top is ?, pop it off and push current-indentation on the stack.
  4. Otherwise, this is a DEDENT that is not matched by an earlier INDENT and is not matched by an earlier SUBLIST, so signal an error (BADDENT).
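The rules above can be sketched in Python as follows. This illustrates the rules under stated assumptions and is not official code for the formulation. Indentations are represented as strings. As an assumption of this sketch, two indentations are “consistent” when one is a prefix of the other, and “less than” means a proper prefix. The class and method names (`BeniPreprocessor`, `sublist`, `split`, `eol`) are hypothetical. Only the INDENT, DEDENT, and SAME tokens, the BADDENT error, and the ? marker come from the description.

```python
Q = "?"  # marker pushed by SUBLIST: a fresh level awaiting an unmatched dedent


class BadDent(Exception):
    """Signals BADDENT: an inconsistent or unmatched dedent."""


def consistent(a: str, b: str) -> bool:
    # Assumption: indentations are consistent if one is a prefix of the other.
    return a.startswith(b) or b.startswith(a)


def less(a: str, b: str) -> bool:
    # a < b when a is a proper prefix of b.
    return len(a) < len(b) and b.startswith(a)


class BeniPreprocessor:
    def __init__(self):
        self.stack = [""]  # "" is the top-level indentation; never popped
        self.tokens = []

    def top_indent(self) -> str:
        # Topmost item that is a real indentation rather than the ? marker.
        return next(item for item in reversed(self.stack) if item != Q)

    def _pop_markers_or_same(self):
        # Shared by SPLIT and by an EOL at an equal indentation: pop every
        # ? on top (one DEDENT each); if there are none, emit SAME.
        if self.stack[-1] == Q:
            while self.stack[-1] == Q:
                self.stack.pop()
                self.tokens.append("DEDENT")
        else:
            self.tokens.append("SAME")

    def sublist(self):
        self.tokens.append("INDENT")
        self.stack.append(Q)

    def split(self):
        self._pop_markers_or_same()

    def eol(self, current: str):
        # `current` is the indentation ((TAB | SPACE | !)*) of the next line.
        top = self.top_indent()
        if not consistent(top, current):
            raise BadDent(f"{current!r} is inconsistent with {top!r}")
        if less(top, current):
            self.stack.append(current)
            self.tokens.append("INDENT")
        elif top == current:
            self._pop_markers_or_same()
        else:
            # 4.1: pop until the topmost real indentation is <= current.
            while not (self.top_indent() == current or less(self.top_indent(), current)):
                self.stack.pop()
                self.tokens.append("DEDENT")
            if self.top_indent() == current:
                # 4.2: matched an existing level; close pending SUBLISTs.
                while self.stack[-1] == Q:
                    self.stack.pop()
                    self.tokens.append("DEDENT")
            elif self.stack[-1] == Q:
                # 4.3: the partial dedent claims a SUBLIST's fresh level.
                self.stack.pop()
                self.stack.append(current)
            else:
                # 4.4: matched neither an INDENT nor a SUBLIST.
                raise BadDent(f"unmatched dedent to {current!r}")


if __name__ == "__main__":
    # "outer1 outer2 $ inner1" / "! ! inner2" / "! outer3", then end of input
    p = BeniPreprocessor()
    p.sublist()
    p.eol("! ! ")
    p.eol("! ")
    p.eol("")
    print(p.tokens)  # ['INDENT', 'INDENT', 'DEDENT', 'DEDENT']
```

Feeding this sketch the equivalent fully-indented text (outer1 outer2 / ! inner1 / ! ! inner2 / ! outer3) yields the same INDENT INDENT DEDENT DEDENT sequence. The partial dedent to “! ” simply takes over the level that the SUBLIST opened.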

This extension of SUBLIST turns out to be backward-compatible with the current SUBLIST semantics, in the sense that any SUBLIST-using text constructed using the current SUBLIST semantics would have exactly the same meaning in Beni’s extended SUBLIST semantics. This is a significant advantage as it means we can apply this extended rule at any future time without fear of breaking existing code.

Alan Manuel K. Gloria was excited by this proposal, and considered it superior to his original SUBLIST formulation, but David A. Wheeler was much more reserved. The following concerns were noted about this formulation:

It complicates explanation of “$” and is more difficult to describe informally. If we used this semantic, some people would require a second explanation of SUBLIST (“$”) that is essentially identical to the current description here. Every time we add a complication, we risk losing some potential users and implementers.
We leave better-understood parsing theory if this is added. Existing approaches tend to follow Python or Haskell approaches and specifically consider the actual source stream to have matching indentations and dedentations. We want this to be easy to implement, with many reasons to be confident that it is well-designed; the more we leave established theory, the harder it is to do that. David A. Wheeler in particular wanted to make sure that the constructs are clearly and unambiguously defined as part of some well-checked BNF grammar.
It complicates the definition of the notation and weakens error-checking for correctness of the notation. Moving handling from the parser to the indentation preprocessor meant that many tools for proving parser correctness (e.g., ANTLR) could not be used on the extended handling. We want this notation to work “because it’s clearly correct”; using ANTLR to check it rigorously is a valuable way to get there. In addition, the formal rules for this extended SUBLIST are difficult to reason about (“(pft)... That’s the sound of my head exploding”).
It complicates the implementation.
This partly disables error-checking for code that uses sweet-expressions. With this, incorrect indentation after uses of SUBLIST becomes a potential source of mistakes that pass silently.
It can be viewed as complicating the reading of code that uses it. Up to this point, a dedent always ended the whole line above; now it can end only part of it. It is unclear that the reduction in line count is fair compensation.
It’s not clear (at least to David A. Wheeler) that there’s enough value to adding it. “There *ARE* use cases, and these use cases are definitely common enough to discuss doing something special with them. But I worry that the contravening downsides will overwhelm it. Currently, in certain cases we have to add ‘\\’ -only lines; that’s not really a hardship, especially since the resulting constructs are pretty easy to understand.” In particular, the given let example above remains (as of the time of this writing) the only significant use case for Beni’s extended SUBLIST formulation, and there are already other relatively-painless ways to handle this construct.
This “partial dedenting” approach is backwards-compatible with the current specification, and thus could be added later if desired.

David A. Wheeler mentioned the possibility of using a PARTIAL_DEDENT token so that full Beni formulation of SUBLIST could be handled completely in the parser. This possibility has not been explored fully as yet. It may be explored if further use cases for the full Beni formulation are found in the future.

Alan Manuel K. Gloria continues to hold out hope that this extended formulation will get more use-cases, but decided not to press for immediate inclusion in this SRFI.

Beni Cherniavsky-Paskin himself noted that this proposal is “a backward-compatible extension to SUBLIST (similarly applicable to any competing FOOLIST semantics), so we could leave it undecided for now, and legalize it later...”. For the moment, that is what we have done; we have ensured that it could be added later if it turns out to be important to do so.

Variation: Closing end-of-line SUBLIST by unmatched dedent (“Beni-Lite”)

On 2013-02-23, David A. Wheeler counterproposed (for purposes of experimentation) a subset of Beni Cherniavsky-Paskin’s proposal. He christened the approach “Beni-Lite”, and included a sample implementation using ANTLR and its BNF. This was eventually rejected, but we believe it’s important to document this approach - in part because it could be added later if desired.

In this alternative, a “$” can be closed by an unmatched partial dedent, but only if the “$” is at the end of a line and there is other text besides any indentation characters. The primary argument given for this variant is that it covers the primary use cases David A. Wheeler had seen, and it is possible to formulate this limited variant while continuing to use ANTLR’s grammar checking. It also retains stronger run-time input checking; partial dedents are only legal when including “$” at the end of the line, making them unlikely to be used accidentally. It is still complicated, but it is not much more complicated than notations without unmatched dedents.

Here are some sample test cases to demonstrate its impact:

Original Input

s-expression

let $
! ! var1 value1
! body...

(let
  ((var1 value1))
  body...)

let $
! ! var1 value1
! ! var2 value2
! body...

(let
  ((var1 value1)
   (var2 value2))
  body...)

let $
! ! var1 value1
! ! var2 value2
! ! var3 value3
! body1 param1
! body2 param2

(let
  ((var1 value1)
   (var2 value2)
   (var3 value3))
  (body1 param1)
  (body2 param2))

The sample implementation tweaked the indent processor so that if a dedent doesn’t match the parent indent, it generates DEDENT followed by a RE_INDENT. Here is an example of how the modified indent processor could tokenize its input:

Original Input

Tokenized version

let $
! ! var1 value1
! body...

let SUBLIST EOL INDENT var1 value1 EOL DEDENT RE_INDENT body...
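Here is a minimal Python sketch of that tweak. It rests on several assumptions:

- indentation is a run of spaces, tabs, and “!”;
- indentations are compared as prefixes of one another;
- a line at the same indentation adds no token beyond EOL.

The helpers `beni_lite_tokens` and `indent_tokens` are hypothetical, and words are split on whitespace only. The processor itself never rejects a partial dedent. As in Beni-Lite, the grammar (re_indent in the BNF) decides where RE_INDENT is legal.

```python
INDENT_CHARS = " \t!"  # assumption: spaces, tabs, and "!" form indentation


def split_indent(line):
    """Return (indentation, rest) for one physical line."""
    i = 0
    while i < len(line) and line[i] in INDENT_CHARS:
        i += 1
    return line[:i], line[i:]


def indent_tokens(stack, current):
    """Compare `current` with the open levels; return INDENT/DEDENT/RE_INDENT."""
    top = stack[-1]
    if current == top:
        return []
    if current.startswith(top):           # deeper: an ordinary INDENT
        stack.append(current)
        return ["INDENT"]
    tokens, last_popped = [], None
    while len(stack[-1]) > len(current):  # close every deeper level
        last_popped = stack.pop()
        tokens.append("DEDENT")
    if not current.startswith(stack[-1]) or (
        last_popped is not None and not last_popped.startswith(current)
    ):
        raise ValueError(f"inconsistent indentation {current!r}")
    if stack[-1] != current:
        # The dedent lands between two open levels: emit RE_INDENT and record
        # the new level; the grammar decides whether this is legal.
        stack.append(current)
        tokens.append("RE_INDENT")
    return tokens


def beni_lite_tokens(lines):
    """Hypothetical tokenizer: words ("$" becomes SUBLIST) plus EOL and indent tokens."""
    stack, out = [""], []
    for n, line in enumerate(lines):
        indentation, rest = split_indent(line)
        if n > 0:
            out.append("EOL")
            out.extend(indent_tokens(stack, indentation))
        out.extend("SUBLIST" if word == "$" else word for word in rest.split())
    out.append("EOL")
    out.extend("DEDENT" for _ in stack[1:])  # close whatever is still open
    return out
```

Running `beni_lite_tokens(["let $", "! ! var1 value1", "! body..."])` gives `let SUBLIST EOL INDENT var1 value1 EOL DEDENT RE_INDENT body...`, followed by a closing `EOL DEDENT` at end of input.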

The BNF was then changed so that SUBLIST allowed more constructs:

it_expr
  : head
    ...
     | SUBLIST hspace* /* head SUBLIST ... case */
       (sub_i=it_expr {(append $head (list $sub_i))}
        | comment_eol indent sub_b=body
          ( re_indent partial_out=body
             {(append (append $head (list $sub_b)) $partial_out)}
           | /*empty*/ {(append $head (list $sub_b))} ) )
  ...
  | SUBLIST hspace* /* "$" first on line */
    (is_i=it_expr {(list $is_i)}
     | comment_eol indent sub_body=body {(list $sub_body)} )

However, Alan Manuel Gloria reviewed it and stated that, “I think that, conceptually, having a limitation is an additional complication when teaching the notation... Granted we could just mandate these patterns, but I worry that we are now slipping into the ‘notation is tied to underlying semantic’ bug. Or in this case, ‘notation is tied to underlying legacy syntax’. I’d rather have the full Beni formulation of SUBLIST or the classic 0.4 formulation, in that preference order. I’ll admit that I don’t have a use for the full Beni formulation other than for let, though. I suspect there may be further use cases; but I haven’t found any others yet.”

The current notation does not support either approach at this time; a future version could add these capabilities.

Experience using and implementing sweet-expressions

At least two programs have been written using sweet-expressions:

sweeten by David A. Wheeler is a program that reads traditionally-formatted s-expressions and writes sweet-expressions. This program performs a great deal of traditional list processing, and is part of the “readable” project’s git repository.
letterfall by Alan Manuel K. Gloria is a graphical real-time touch typing game to improve typing skills, which uses GNOME libraries.

The SRFI authors believe that the existence of these programs - written by two different people for different application areas - shows that sweet-expressions are mature enough to be standardized.

In addition, the older paper Sweet-expressions: Version 0.2 (draft) created sweet-expressions versions of a variety of expressions in a variety of Lisp-based languages, to (1) ensure that the sweet-expression notation is general (not tied to some specific semantic), and (2) show that it is relatively easy to notate common constructs in sweet-expressions. Sweet-expressions were developed for expressions in Scheme, Common Lisp, Arc, ACL2, PVS, s-expression BitC, AutoCAD Lisp (AutoLisp), Emacs Lisp, SUO-KIF, Scheme Shell (Scsh), GCC Register Transfer Language (RTL), MiddleEndLispTranslator (MELT), Satisfiability Modulo Theories Library (SMT-LIB), NewLisp, Clojure, and ISLisp. (Clojure currently uses {...} for a different construct, but sweet-expressions could still be used for Clojure.) This demonstration provides evidence that the sweet-expression notation is sufficiently general and expressive.

The sweet-expression notation itself has been implemented at least twice: once in ANTLR (an LL(*) parser generator) and once in Scheme (as a recursive descent parser). Since it has been implemented in two different ways, it is less likely to be extremely difficult to implement. The ANTLR grammar itself has been checked by ANTLR’s grammar checker for ambiguities and other problems. Also, ANTLR confirms that the given BNF grammar is LL(1). These implementations, and the ANTLR checking, suggest that this notation is not too difficult to implement and that certain kinds of grammar flaws have been eliminated. These implementations have been peer reviewed. In addition, they have passed various test suites; the Scheme implementation in particular has passed a test suite with hundreds of test cases.

The Readable Lisp S-expressions Project developed these notations and implementations of them. In particular, the project distributes the programs unsweeten (which takes sweet-expressions and transforms them into s-expressions) and sweeten (which takes s-expressions and transforms them into sweet-expressions), as well as other related tools.

Style guide

Here are some style guidelines that may help you create easy-to-read sweet-expressions, based on the Readable project style guide.

Use indentation for major program/data structure

In general, use indentation to make it easy to see the larger-scale structure of a program or data. Typically major structural atoms should start a new line, including defining a new term (e.g., “define” and “let”), conditionals (e.g., “if” and “cond”), and loops (e.g., “loop”).

Use infix notation

If the function is typically written as infix (including “+”, “*”, “or”, and “<”), use {...} to write it as an infix value. Generally these operators will be “and”, “or”, or an operator that only uses punctuation. If you’re calling a function with only one parameter, and that parameter is calculated with an infix operation, use the f{...} shorthand.

However, you may want to keep using prefix form if indentation still matters and one or more of the parameters is exceedingly complex (e.g., it’s nested very deeply or includes program structuring forms like “cond” and “define”). This situation can often occur with “and” and “or” if you’re using a functional programming style.

Use function call notation for parameters if they fit in a line

If parameters will easily fit on a line if you use function notation such as y(z), then use it. When you’re calling a procedure with no parameters, use function call format with “()” at the end, e.g., “f()”. Use -( ... ) to negate something.

If you are providing a list of data (and not performing a function/method call), then use the traditional list notation such as “(a b c)”. This is exactly equivalent to “a(b c)”, but expressing it as a list will give the human reader a hint that this data is not considered a potential program. If it’s used as both data and as program, then consider it a program, and use function call notation.

In general, indentation is used for the major “structural” elements of a program, and function calls get used once you’re “near the leaf” of structure (where you won’t go beyond the end of the line).
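The infix and function-call notations recommended above map onto plain s-expressions mechanically. The Python sketch below shows that mapping for a subset. A datum immediately followed by ( or {, with no intervening whitespace, is treated as a call. The contents of {...} follow the simple curly-infix rules as we read SRFI-105: a single operator repeated between operands becomes the head, and mixed operators are handed to $nfx$. All function names are hypothetical. Strings, quoting, [...], and comments are not handled.

```python
def tokenize(text):
    """Split text into (token, start, end) triples; brackets are single tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
        elif ch in "(){}":
            tokens.append((ch, pos, pos + 1))
            pos += 1
        else:
            end = pos
            while end < len(text) and not text[end].isspace() and text[end] not in "(){}":
                end += 1
            tokens.append((text[pos:end], pos, end))
            pos = end
    return tokens


def curly_infix(items):
    """Map the contents of {...} to a list, per our reading of SRFI-105."""
    if not items:
        return []                       # {} => ()
    if len(items) == 1:
        return items[0]                 # {e} => e
    if len(items) == 2:
        return list(items)              # {- x} => (- x)
    ops = items[1::2]
    if len(items) % 2 == 1 and all(op == ops[0] for op in ops):
        return [ops[0]] + items[0::2]   # {a + b + c} => (+ a b c)
    return ["$nfx$"] + list(items)      # mixed operators are left to $nfx$


def read_seq(tokens, i, close):
    """Read data up to the `close` token; return (items, next_index, end_pos)."""
    items = []
    while i < len(tokens):
        if tokens[i][0] == close:
            return items, i + 1, tokens[i][2]
        datum, i, _ = read_datum(tokens, i)
        items.append(datum)
    raise ValueError("missing " + close)


def read_datum(tokens, i):
    """Read one datum at tokens[i]; return (datum, next_index, end_pos)."""
    tok, _, end = tokens[i]
    if tok == "(":
        datum, i, end = read_seq(tokens, i + 1, ")")
    elif tok == "{":
        items, i, end = read_seq(tokens, i + 1, "}")
        datum = curly_infix(items)
    elif tok in (")", "}"):
        raise ValueError("unexpected " + tok)
    else:
        datum, i = tok, i + 1
    # Neoteric suffix: "(" or "{" starting exactly where the datum ended.
    while i < len(tokens) and tokens[i][0] in ("(", "{") and tokens[i][1] == end:
        if tokens[i][0] == "(":
            args, i, end = read_seq(tokens, i + 1, ")")
            datum = [datum] + args                # f(a b) => (f a b)
        else:
            items, i, end = read_seq(tokens, i + 1, "}")
            datum = [datum, curly_infix(items)]   # f{...} => (f {...})
    return datum, i, end


def read_neoteric(text):
    """Read the first datum of `text`."""
    return read_datum(tokenize(text), 0)[0]
```

Under this sketch, `read_neoteric("newline()")` is `["newline"]`, `read_neoteric("f{n - 1}")` is `["f", ["-", "n", "1"]]`, and `a(b c)` and `(a b c)` read identically, as the guideline on data lists notes.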

Avoid unnecessary parentheses

Where it’s understandable, don’t include unnecessary parentheses. In particular, when indentation processing is active, the name of the function is right after the indent, and there are no child lines, simply state the function followed by space-separated parameters.

Both SUBLIST ($) and GROUP/SPLIT (inline \\) allow some limited freedom in laying out the program text without disabling indentation processing; feel free to use them. For example, in a cond construct, you can combine on one line a clause’s test and expression by separating them with $. Similarly, the common sequence “(define (f x) (cond ...))” can be represented by putting define and cond on one line and putting $ before cond. Below are some examples that we consider to be quite clear:

define polymorphic-function(a) $ cond
  type1?(a) $ handle-type1 a
  type2?(a) $ handle-type2 a
  type3?(a)
    display "type3 handling not yet fully operational\n"
    log-possible-error a
    handle-type3 a
  type4?(a) $ cond ; cond-in-cond - very clear
    type4-subtype1?(a) $ handle-type4-subtype1 a
    type4-subtype2?(a) $ handle-type4-subtype2 a
    else               $ error 'polymorphic-function "impossible!" a
  else      $ error 'polymorphic-function "unrecognized type" a

define probe(x)
  display "probe: " \\ write x \\ newline()

define buggy-function(a) $ probe $ let ()
  define buggy-sub-function(b) $ short-call b
  body
  ...

define func-w/return(a) $ call/cc $ lambda (return)
  body ... return(whatever) ...
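Each $ in these examples turns the rest of its line into one final element of the list before it, and a lone element after $ stands for itself (so a $ b is (a b)). Below is a hypothetical Python sketch of that rule for a single line of whitespace-separated atoms. It treats neoteric forms such as type1?(a) as opaque atoms. It also does not handle child lines, which in the examples above become part of the innermost $ expression (for instance, the clauses under $ cond).

```python
def read_sublist_line(words):
    """Apply SUBLIST ($) to one line of atoms (hypothetical helper).

    Everything after the first "$" is read recursively as one item and
    appended to the items before it; a lone item stands for itself.
    """
    if "$" in words:
        k = words.index("$")
        if k == 0 or k == len(words) - 1:
            raise ValueError("$ at the start or end of a line is outside this sketch")
        return list(words[:k]) + [read_sublist_line(words[k + 1:])]
    if len(words) == 1:
        return words[0]          # singleton rule: "a $ b" gives (a b)
    return list(words)
```

For example, `read_sublist_line("type1?(a) $ handle-type1 a".split())` gives `["type1?(a)", ["handle-type1", "a"]]`, one cond clause.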

Indentation

Use a consistent amount of indenting for each level. We tend to use 2 spaces for indentation; indentation nesting is more common in sweet-expressions, so 8-character indentations are often too much. Some people prefer to have a larger indent to line up with a parameter in certain cases (e.g., with “if”, put the condition after the if, and line up the branches with the condition).

Consider using “!” followed by space if you’re using a medium that hides indentation, or want to highlight a particular vertical group. However, beware if you start a paired expression and let it continue to the next line; the “!” is not an indent character inside parentheses, braces, or brackets.

Width

You should probably stick to an 80-character width for program text.

Reference implementation

The reference implementation is portable, with the exception that Scheme provides no standard mechanism to override the built-in reader. An implementation that complies with this SRFI must at least activate this behavior when it reads the #!sweet directive followed by whitespace.

The reference implementation is SRFI type 2: “A mostly-portable solution that uses some kind of hooks provided in some Scheme interpreter/compiler. In this case, a detailed specification of the hooks must be included so that the SRFI is self-contained.”

The reference implementation includes an entire reader re-implementation that supports SRFI-105, as well as headings to improve its portability among Scheme implementations. Thus, it can be directly used to implement sweet-expressions.

This reference implementation never needs to unread characters. It was written this way for portability, because there is no standard Scheme mechanism for multi-character lookahead (e.g., a multi-character unread). A sweet-expression implementation could be simpler if its underlying platform supported multi-character lookahead. For example, multi-character lookahead simplifies comparing indent levels, as well as examining what follows a # (e.g., whether it is #; and, if so, whether that is followed by whitespace). Many Scheme implementations and text editor infrastructures support multi-character lookahead, and in those cases implementations could be even simpler. That said, sweet-expressions do not require multi-character lookahead for a fully correct implementation.
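To illustrate working without unread, here is a Python sketch of a character source that offers only one-character lookahead, the analogue of Scheme's peek-char and read-char. Two hypothetical helpers are built on it: one classifies what follows a #, and the other collects indentation. Each decision peeks at one character and then either commits by reading it or stops, so nothing is ever pushed back. Treating spaces, tabs, and “!” as indentation is an assumption of the sketch.

```python
class PeekPort:
    """Character source with one-character lookahead and no unread.

    peek() and read() play the roles of Scheme's peek-char and read-char;
    the empty string stands for end of file.
    """

    def __init__(self, text):
        self._text = text
        self._pos = 0

    def peek(self):
        return self._text[self._pos] if self._pos < len(self._text) else ""

    def read(self):
        ch = self.peek()
        if ch:
            self._pos += 1
        return ch


def classify_after_hash(port):
    """Classify what follows a "#" the caller has already read (hypothetical)."""
    if port.peek() != ";":
        return ("#", port.peek())      # not #;, so nothing is consumed
    port.read()                        # commit to "#;"
    nxt = port.peek()
    return ("#;", nxt == "" or nxt.isspace())  # True if whitespace or EOF follows


def read_indentation(port):
    """Collect leading indentation, assumed here to be spaces, tabs, and "!"."""
    chars = []
    while port.peek() in (" ", "\t", "!"):  # a tuple, so "" (EOF) never matches
        chars.append(port.read())
    return "".join(chars)
```

Comparing indent levels then reduces to comparing the strings that `read_indentation` returns. No second character of lookahead is ever required.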

See the Scheme source code for the reference implementation.

References

The readable project website has more information: http://readable.sourceforge.net

Acknowledgments

We thank all the participants on the “readable-discuss” and “SRFI-105” mailing lists, including Kartik Agaram, Arne Babenhauserheide, Eduardo Bellani, Ben Booth, Per Bothner, Beni Cherniavsky-Paskin, John Cowan, Shiro Kawai, Jos Koot, Alpheus Madsen, Egil Möller, John David Stone, Neil Toronto, Mark H. Weaver, Niklas Ulvinge, David Vanderson, and many others whose names should be here but aren’t.

Copyright

Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use, copy,
modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY
OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Editor: Mike Sperber
