Title

Curly-infix-expressions

Authors

David A. Wheeler

Alan Manuel K. Gloria

Status

This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-105@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Related SRFIs

None

Abstract

Lisp-based languages, like Scheme, are almost the only programming languages in modern use that do not support infix notation. In addition, most languages allow infix expressions to be combined with function call notation of the form f(x). This SRFI provides these capabilities, both for developers who already use Scheme and want these conveniences, and also for other developers who may choose to use other languages in part because they miss these conveniences. Scheme currently reserves {...} “for possible future extensions to the language”. We propose that {...} be used to support “curly-infix-expression” notation as a homoiconic infix abbreviation, as a modification of the Scheme reader. It is an abbreviation in much the same way that 'x is an abbreviation for (quote x).

A curly-infix list introduces a list whose visual presentation can be in infix order instead of prefix order. For example, {n > 5} ⇒ (> n 5), and {a + b + c} ⇒ (+ a b c). By intent, there is no precedence, but e.g., {x + {y * z}} maps cleanly to (+ x (* y z)). Forms with mixed infix operators and other complications have “$nfx$” prepended to enable later processing, e.g., {4 + 5 * 6} ⇒ ($nfx$ 4 + 5 * 6). Also, inside a curly-infix list (recursively), expressions of the form f(...) are simply an abbreviation for (f ...).

Note that this is derived from the “readable” project. We intend to later submit at least one additional SRFI that will build on top of this SRFI, but curly-infix-expressions are useful on their own.

Rationale

Lisp-based languages, like Scheme, are almost the only programming languages in modern use that do not support infix notation. Even some Lisp advocates, like Paul Graham, admit that they “don’t find prefix math expressions natural” (http://www.paulgraham.com/popular.html) even after decades of experience with Lisp-based languages. Paul Prescod has said, “I have more faith that you could convince the world to use Esperanto than prefix notation” (http://people.csail.mit.edu/gregs/ll1-discuss-archive-html/msg01571.html). Infix is not going away; standard mathematical notation uses infix, infix notation is taught to most people (programmers or not) in school, and nearly all new programming languages include infix.

Adding infix support to Scheme would be a useful convenience for some existing developers who use Scheme, and it would also eliminate a common complaint by developers who currently choose to use other languages instead.

Scheme currently reserves {...} “for possible future extensions to the language”. We propose that {...} be used to support “curly-infix-expression” notation as a reader abbreviation, just as 'x is an abbreviation for (quote x) and (x y z) is an abbreviation for (x . (y . (z . ()))).

This proposal is an extremely simple and straightforward technique for supporting infix notation. There is no complex precedence system, all other Scheme capabilities (including macros and quasiquoting) work unchanged, any symbol can be used as an infix operation where desired, and Scheme remains general and homoiconic. Curly-infix-expressions (also known as c-expressions) are just a convenient reader abbreviation for infix notation.

At its core, this SRFI provides the simple curly-infix list, a list whose visual presentation is in infix order instead of prefix order. The simple curly-infix list {operand-1 operator operand-2 operator operand-3 operator ...} is mapped to (operator operand-1 operand-2 operand-3 ...) so that two or more operands are handled cleanly. E.g., {a + b + c} ⇒ (+ a b c).

More examples of c-expressions and their mappings are given below in the specification. See the design rationale for details on why the notation is designed the way it is.

Specification

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

“Curly-infix-expressions” (aka “c-expressions”) are s-expressions with an additional supported notation: the curly-infix list. A curly-infix list is syntactically almost identical to a regular list, but it is surrounded by a matched pair of braces instead of by a pair of parentheses, and instead of a sequence of s-expressions it contains a sequence of neoteric-expressions (which add support for formats like f(x)). Once a curly-infix list is read, it is mapped differently than a regular list by a curly-infix reader:

  1. A simple curly-infix list has an odd number of parameters, at least three parameters, and all even parameters are “equal?”. If there is more than one even parameter, and an even parameter contains a cycle, then the equal? comparison MUST terminate if equal? terminates (otherwise the comparison MAY terminate). A simple curly-infix list is mapped by the reader into a list with the first even parameter followed by the odd parameters. E.g., {n <= 5} ⇒ (<= n 5), and {4 * 5 * 6} ⇒ (* 4 5 6).
  2. The empty curly-infix list {} is mapped to the empty list (). An implementation MUST permit, and not require, whitespace between the braces in an empty curly-infix list.
  3. An escaping curly-infix list {e} is mapped to e. E.g., {5} ⇒ 5.
  4. A unary-operation curly-infix list {e1 e2} is mapped to (e1 e2). E.g., {- x} ⇒ (- x).
  5. The mapping of a curly-infix list beginning with the symbol “.” is unspecified. (Note: the reference implementation maps {. e} to e.)
  6. Any other curly-infix list (including all other improper lists) is mixed. A mixed curly-infix list MUST be mapped to that list with “$nfx$” added to its front. E.g., {q + r * s} ⇒ ($nfx$ q + r * s), and {q + r . s} ⇒ ($nfx$ q + r . s).
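
The mapping rules above can be sketched as an ordinary procedure applied to the list that has already been read between the braces. This is only an illustrative sketch, not the reference implementation: the names map-curly, operators-match?, and odd-parameters are hypothetical, and rule 5 (a leading “.”) is not modeled because its mapping is unspecified.

```scheme
;; Sketch only: the procedure names here are hypothetical, not part of this SRFI.
;; Input: the list read between the braces (inner curly-infix lists already mapped).

;; True if `rest` has the shape (op e op e ...), where every operator is
;; equal? to `op` and is followed by exactly one operand.
(define (operators-match? op rest)
  (cond ((null? rest) #t)
        ((not (pair? rest)) #f)                 ; improper tail
        ((not (equal? op (car rest))) #f)       ; a different operator
        ((not (pair? (cdr rest))) #f)           ; operator without an operand
        (else (operators-match? op (cddr rest)))))

;; Every other element, starting with the first (the odd parameters).
(define (odd-parameters lst)
  (if (or (null? lst) (null? (cdr lst)))
      lst
      (cons (car lst) (odd-parameters (cddr lst)))))

(define (map-curly lst)
  (cond ((null? lst) '())                                  ; rule 2: {} => ()
        ((null? (cdr lst)) (car lst))                      ; rule 3: {e} => e
        ((and (pair? (cdr lst)) (null? (cddr lst))) lst)   ; rule 4: {e1 e2} => (e1 e2)
        ((and (pair? (cdr lst))
              (pair? (cddr lst))
              (operators-match? (cadr lst) (cdr lst)))
         (cons (cadr lst) (odd-parameters lst)))           ; rule 1: simple
        (else (cons '$nfx$ lst))))                         ; rule 6: mixed
```

For instance, (map-curly '(a + b + c)) yields (+ a b c), while (map-curly '(q + r * s)) yields ($nfx$ q + r * s).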

Here is the syntax of a curly-infix list (which is nearly identical to a traditional list):

curly‑infix‑list → “{” <whitespace>* [ <n-expression> [ <whitespace>+ <n-expression> ]* [ <whitespace>+ . <whitespace>+ <n-expression> ] <whitespace>* ] “}”

A “neoteric-expression” (aka “n-expression”) is a curly-infix-expression, with the following additional syntaxes and mappings for a datum (where e is any datum expression):

  1. e(...) ⇒ (e ...). E.g., cos(x) ⇒ (cos x), f(a b) ⇒ (f a b), exit() ⇒ (exit), and read(. options) ⇒ (read . options).
  2. e{} ⇒ (e) when there are zero or more whitespace characters within the braces; otherwise, e{...} ⇒ (e {...}). E.g., f{n - 1} ⇒ (f {n - 1}) ⇒ (f (- n 1)), and g{- x} ⇒ (g (- x)).
  3. e[...] ⇒ ($bracket-apply$ e ...)
  4. The above mappings MUST NOT be applied if one or more whitespace characters are present between e and the open paired character.
  5. An unprefixed ( . e) MUST map to e.
  6. These MUST recurse within lists and vectors, so any list or vector in a position that accepts a neoteric expression MUST accept a sequence of zero or more neoteric expressions, not just s-expressions. (Note that this occurs if they are directly or indirectly within a curly-infix list or a neoteric-expression.) Any other implementation-specific constructs in a position that accepts a neoteric expression SHOULD accept neoteric-expressions within it.
  7. These MUST recurse left-to-right. E.g., f{n - 1}(x) ⇒ (f {n - 1})(x) ⇒ (f (- n 1))(x) ⇒ ((f (- n 1)) x)
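
The left-to-right application of neoteric tails can be sketched as a fold over the tails that follow a prefix datum. The tail representation used here, a pair (kind . items) with kind being paren, bracket, or brace, is invented purely for this sketch, as are the names apply-tail and apply-tails; a real reader works on characters from a port and must also enforce the no-whitespace rule, which is not modeled.

```scheme
;; Sketch only: the tail representation and procedure names are hypothetical.
;; A tail is (kind . items):
;;   (paren . items)   items = the data read between ( and ), possibly improper
;;   (bracket . items) items = the data read between [ and ]
;;   (brace . items)   items = () for e{}, otherwise a one-element list
;;                     holding the already-mapped curly-infix datum

(define (apply-tail e tail)
  (let ((kind (car tail))
        (items (cdr tail)))
    (case kind
      ((paren) (cons e items))                             ; e(...) => (e ...)
      ((brace) (cons e items))                             ; e{} => (e), e{...} => (e {...})
      ((bracket) (cons '$bracket-apply$ (cons e items)))   ; e[...] => ($bracket-apply$ e ...)
      (else (error "unknown tail kind" kind)))))

;; Apply each tail in turn, left to right, to the result so far.
(define (apply-tails e tails)
  (if (null? tails)
      e
      (apply-tails (apply-tail e (car tails)) (cdr tails))))
```

Under this representation, f{n - 1}(x) corresponds to (apply-tails 'f '((brace (- n 1)) (paren x))), which yields ((f (- n 1)) x).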

Where datum comments are supported using #;, datum comments SHOULD comment the datum as defined above. (Datum comments are defined in SRFI-62; they are also included in R6RS and R7RS draft 6.) Note that any s-expression is also an n-expression, because n-expressions include c-expressions and c-expressions include s-expressions.

Here are some examples of c-expressions (note that all operators in curly-infix-expressions are delimited):

  1. {n <= 5} ⇒ (<= n 5)
  2. {x + 1} ⇒ (+ x 1)
  3. {a + b + c} ⇒ (+ a b c)
  4. {x ,op y ,op z} ⇒ (,op x y z)
  5. {x eqv? `a} ⇒ (eqv? x `a)
  6. {'a eq? b} ⇒ (eq? 'a b)
  7. {n-1 + n-2} ⇒ (+ n-1 n-2)
  8. {a * {b + c}} ⇒ (* a (+ b c))
  9. {a + {b - c}} ⇒ (+ a (- b c))
  10. {{a + b} - c} ⇒ (- (+ a b) c)
  11. {{a > 0} and {b >= 1}} ⇒ (and (> a 0) (>= b 1))
  12. {} ⇒ ()
  13. {5} ⇒ 5
  14. {- x} ⇒ (- x)
  15. {length(x) >= 6} ⇒ (>= (length x) 6)
  16. {f(x) + g(y) + h(z)} ⇒ (+ (f x) (g y) (h z))
  17. {(f a b) + (g h)} ⇒ (+ (f a b) (g h))
  18. {f(a b) + g(h)} ⇒ (+ (f a b) (g h)) as well
  19. '{a + f(b) + x} ⇒ '(+ a (f b) x)
  20. {(- a) / b} ⇒ (/ (- a) b)
  21. {-(a) / b} ⇒ (/ (- a) b) as well
  22. {cos(q)} ⇒ (cos q)
  23. {e{}} ⇒ (e)
  24. {pi()} ⇒ (pi)
  25. {'f(x)} ⇒ '(f x)
  26. {#1=f(#1#)} ⇒ #1=(f #1#) if there is support for the SRFI-38 external representation for data with shared structure
  27. { (f (g h(x))) } ⇒ (f (g (h x))) ... note that this is not (f (g h (x)))
  28. {#(1 2 f(a) 4)} ⇒ #(1 2 (f a) 4)
  29. {(f #;g(x) h(x))} ⇒ (f (h x)) if datum comments are supported... note that this is not (f (x) (h x))
  30. {(map - ns)} ⇒ (map - ns)
  31. {map(- ns)} ⇒ (map - ns) as well
  32. {n * factorial{n - 1}} ⇒ (* n (factorial (- n 1)))
  33. {2 * sin{- x}} ⇒ (* 2 (sin (- x)))
  34. {3 + 4 +} ⇒ ($nfx$ 3 + 4 +)
  35. {3 + 4 + 5 +} ⇒ ($nfx$ 3 + 4 + 5 +)
  36. {a . z} ⇒ ($nfx$ a . z)
  37. {a + b - c} ⇒ ($nfx$ a + b - c)
  38. {read(. options)} ⇒ (read . options)
  39. {a(x)(y)} ⇒ ((a x) y)
  40. {x[a]} ⇒ ($bracket-apply$ x a)
  41. {y[a b]} ⇒ ($bracket-apply$ y a b)
  42. {f{n - 1}(x)} ⇒ ((f (- n 1)) x)
  43. {f{n - 1}{y - 1}} ⇒ ((f (- n 1)) (- y 1))
  44. {f{- x}[y]} ⇒ ($bracket-apply$ (f (- x)) y)

A curly-infix reader is a datum reader that can correctly read and map curly-infix-expressions. A curly-infix reader MUST include the braces “{” and “}” as delimiters.

An implementation of this SRFI MUST accept the marker #!curly-infix followed by a whitespace character in its standard datum readers (e.g., read and, if applicable, the default implementation REPL). This marker (including the trailing whitespace character) MUST be consumed and considered whitespace. After reading this marker, the reader MUST accept curly-infix-expressions in subsequent datums read from the same port until some other conflicting marker is given (no conflicting marker is specified here).

Implementations of this SRFI SHOULD implement curly-infix-expressions in their datum readers by default, even when the marker is not received. Portable applications SHOULD include this marker before using curly-infix-expressions, typically near the top of a file. Portable applications SHOULD NOT use this marker as the very first characters of a file (e.g., it could be preceded by a newline), because they might be misinterpreted on some platforms as an executable script header.

An implementation MUST NOT bind the symbols “$nfx$” or “$bracket-apply$” by default to a procedure, macro, or syntax that cannot be overridden. An implementation SHOULD NOT bind the symbols “$nfx$” or “$bracket-apply$” to a procedure, macro, or syntax in the default environment, with the exception that it MAY bind them by default to something that produces an error. These two symbols are reserved for use by library writers (in the case of a library-based implementation of this SRFI, these symbols are reserved for use by other libraries) and application writers.

However, an implementation MAY provide one or more libraries that when imported bind the “$nfx$” and/or “$bracket-apply$” symbols (as it is then a library, this case actually falls under the “reserved for use by library writers” clause above). Application writers and other library writers using that implementation are then free to use or not use the implementation’s provided “$nfx$” and/or “$bracket-apply$” as provided by those libraries.

Implementations MAY provide the procedure curly-infix-read as a curly-infix reader. If provided, this procedure SHOULD support an optional port parameter.

Security implication: If the implementation does not check for circularity when doing equality comparisons, and a supplier of malicious data can specify a circularity, the reader could fail to terminate when comparing infix operators. In the worst case this could cause a denial of service. A solution is to check for circularity when comparing operators.

Note that, by definition, this SRFI modifies lexical syntax.

Design Rationale

This SRFI design rationale is unusually long, especially when you compare it to the simplicity of its specification. However, the notation described in this SRFI builds on the lessons learned from the many previous infix mechanisms that have been developed for Scheme and related Lisp-based languages. The authors believe that it is important to document why various decisions were made, in particular, to show why this approach is an improvement over past approaches and more likely to gain wide acceptance. We have separated the design rationale from the overall rationale, as was previously done by SRFI-26, because it is easier to understand the design rationale after reading the specification.

Why not macros? Why modify the reader?

Many previous systems have implemented “infix” systems as a named macro or procedure (e.g., INFIX). This looks ugly, and it does the wrong thing — the resulting list always has INFIX at the beginning, not the actual infix operator, so this approach can interfere with quoting, macros, and other capabilities. In particular, consider the following syntax-rules macro for function composition:

(define-syntax o
  (syntax-rules ()
    ({f o g}
     (lambda args
       (f (apply g args))))
    ({f o g o h o ...}
     {(lambda (x)
        (f (g x))) o h o ...})))

This example takes advantage of the fact that {f o g o h o ...} ⇒ (o f g h ...). Infix cannot be implemented as a macro alone, as the syntax-rules form has a particular treatment for the pattern. A macro for infix would very likely confuse the syntax-rules form.

A reader notation that maps to a simple and obvious s-expression structure also allows notations such as (map . {as + bs}) ⇒ (map + as bs). For example, in combination with SRFI-26, you can express templated procedures: (cut . {<> < 42})
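
The dotted form works because of ordinary list notation, independent of this SRFI: once the curly-infix list has been mapped to (+ as bs), the datum (map . (+ as bs)) is the same list as (map + as bs). The sketch below checks this using only traditional s-expressions, so it runs without a curly-infix reader; the names dotted, plain, and sums are chosen just for illustration.

```scheme
;; Both spellings read as the same four-element list.
(define dotted '(map . (+ as bs)))
(define plain  '(map + as bs))
(define same-datum? (equal? dotted plain))

;; The dotted spelling is also an ordinary procedure call when evaluated:
;; this is what (map . {as + bs}) becomes after the reader has mapped it.
(define sums
  (let ((as '(1 2 3))
        (bs '(10 20 30)))
    (map . (+ as bs))))
```

Here same-datum? is #t and sums is the list (11 22 33).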

Why use brace characters for infix?

There is no perfect character, but braces (aka curly braces) are pretty close. A key issue is that you want a balanced pair of characters to identify infix, since you can have infix-in-infix. The curly braces are visually pleasant pairs, so it makes sense to use these precious characters on something extremely common: infix notation.

All character pairs other than braces have serious problems. Parentheses are already spoken for, of course. R6RS Scheme already uses up square brackets as a synonym for parentheses. Angle brackets are already used for comparison. Paired characters outside the ASCII set have other problems: some Schemes do not support characters outside the ASCII character set, such characters are not as well supported by other tools (and are sometimes corrupted by such tools), they are more complicated to deal with due to character encoding problems, and they are harder to enter on many keyboards.

In contrast, curly braces are in the ASCII character set and are already available for this purpose. They do not have a standardized meaning in any Scheme specification. They are also widely available in many other Lisp-derived languages, such as Common Lisp (as we’d like this notation to be widely useful across Lisps, even beyond Scheme).

Although curly braces can be used as local Scheme extensions, there are few Scheme implementations which do so. On September 5, 2012, John Cowan posted the results analyzing the meanings of square brackets and curly braces in his Scheme test suite (of 45 Scheme implementations). Only 2 (Chibi and RScheme) of 45 currently do something special with braces; “the other implementations treat them as either synonyms for parentheses, lexical syntax errors, or identifier characters”. That is a remarkably small number of Scheme implementations where this use of curly braces would conflict with some special semantic. Donovan Kolbly reported that “RScheme uses braces to delimit C code embedded in Scheme code... that said, a scanner hack could easily mode-switch to SRFI-105 interpretation where needed.” The Chibi extension using braces can be added or removed through a compile-time option, so not even all Chibi executables have a conflicting use of brace characters.

It’s true that {...} are often used in math for set notation. But infix notation is far more basic, and common, than sets. Also, traditional function call notation and infix are helpful when working with sets, so infix notation is the more important need. Once you allow neoteric-expressions, the notation set(...) is a reasonable alternative.

Why not use a completely different notation inside the expression?

Some past systems have built infix notations into the reader in which the infix notation was radically different from normal Lisp notation. For example, the symbol for procedure calls might change, the names of variables or procedures might be spelled differently (at least in some cases), and so on. The result, in some cases, would be that these notations would simultaneously lose Lisp’s abilities for quoting, quasiquoting, and so on, and these notations were not homoiconic. It may become impossible to refer to certain symbols, since their names might include a character that is interpreted as an infix operator. They can also be confusing; the same symbols (e.g., parentheses) would have a completely different meaning inside and outside the parentheses.

In contrast, this curly-infix-expression proposal avoids these problems. The syntax for list creation, quasiquotation, and so on is almost identical in a curly-infix-expression when compared to traditional notation. For example, in curly-infix-expressions, `{,a + ,b} maps cleanly to `(+ ,a ,b), which works as expected with all macros. The main difference is that, in a curly-infix-expression, the position of the operator in its surface syntax may be in a different location (infix) than its actual final location.

Why not autodetect infix?

Some past efforts tried to automatically detect infix operators, but this turns out to not work well. It’s hard to express good rules for detecting infix operators, and the rules become too complex for users (e.g., “punctuation-only symbols” doesn’t detect “and” or “or”). And in any case, if they were automatically detected, an escape mechanism would be needed anyway - consider (map - ns) for getting a new list with the numbers in ns negated. Allowing the user to expressly notate when infix was intended, using {...}, turns out to be clearer and more intuitive. In particular, curly-infix-expressions allow the use of infix with any symbol, whenever you want... and where it’s not convenient, you don’t need to use it. It is also very backwards-compatible: Normal lists work normally, and if you want infix, use {...}.

Why use equal? to compare operators in a “simple” curly-infix list for equality?

Operators are compared using equal? so that constructs like ,op are legal operators, e.g., {x ,op y ,op z}. Note that unfortunately if the operator construct contains a cycle, it might not terminate if equal? does not terminate in the presence of cycles. This was specified this way so that implementers could use the normal Scheme equal? comparison instead of having to implement a special comparison operator just for this particular case.
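
A small illustration of why equal? is the natural choice: ,op is read as the two-element list (unquote op), and each occurrence is read separately, so the operators in {x ,op y ,op z} have the same structure without necessarily being the same object. The names items, op-1, and op-2 are used only for this sketch, which uses a quoted traditional list so that it runs without a curly-infix reader.

```scheme
;; What the reader sees between the braces of {x ,op y ,op z}.
(define items '(x ,op y ,op z))

(define op-1 (list-ref items 1))   ; first operator:  (unquote op)
(define op-2 (list-ref items 3))   ; second operator: (unquote op)

;; Structural comparison succeeds, so the list counts as simple.
(define operators-equal? (equal? op-1 op-2))

;; An identity comparison such as (eq? op-1 op-2) is not guaranteed to
;; succeed, because the two lists may be distinct objects.
```

Here operators-equal? is #t, and (car op-1) is the symbol unquote.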

Why must infix operators be delimited?

Curly-infix lists require that the infix operators be delimited (e.g., by spaces). This is consistent with Lisp history and current practice. Currently, in Lisp, operators are always delimited in traditional s-expressions (typically by left parentheses on the left, and by whitespace on the right). It’s impractical to do otherwise today; most Lisps, including Scheme, allow and predefine symbols that include characters (like “-”) that are typically used for infix operators. If infix operators were not delimited, it would be impractical or complicated to refer to standard Scheme identifiers. By requiring delimiters (as is already true for the rest of Scheme), any procedure may be used as an infix operator, not just a fixed list. Many developers put space around infix operators even in languages that don’t require them, so syntactically requiring them is no burden. There are even other existing languages that also require infix operators be delimited, such as SNOBOL4, PLOT, and REBOL. In short, it is difficult to allow infix operators without delimiters, and the visual results are the same as many real-world uses in other languages, so the result appears quite customary to typical software developers.

Why isn’t precedence part of this SRFI?

Many past “infix” systems for Lisp build in precedence. However, Lisp systems often process other languages, and they may freely mix these different languages. Thus, the same symbol may have different meanings and precedence levels in different contexts. The symbol might not even be defined where it is being used, and allowing precedence definitions would create subtle errors if files are read in a different order. If users hook in their own precedence system into a reader, it could even become difficult to combine code written for different precedence systems. In short, building precedence into a Lisp reader creates many complexities.

Yet the complexity of precedence systems is often unnecessary. In practice, we’ve found that simple infix is all that’s needed most of the time in Lisp-based languages. Even in other languages, many developers unnecessarily use grouping symbols with infix operators to make their order clear. An examination of two Scheme programs written using curly-infix-expressions (posted 2012-09-14) found that 55/78 (71%) of the top-level c-expressions do not embed an opening brace; this means that precedence is irrelevant for more than two-thirds of these top-level c-expressions. Thus, requiring grouping symbols is less of a hardship than it might appear.

In addition, there is some experimental evidence that developers (1) rarely use precedence rules and (2) often apply them incorrectly. The paper “Developer beliefs about binary operator precedence (part 1)” by Derek M. Jones documents analysis of developer use and understanding of precedence rules:

By intentionally not building a precedence system into the reader, a very simple yet useful infix system results. We don’t need to register procedures, ensure that declarations of precedence precede their use, gain widespread agreement on some precedence order, or anything like it. We also ensure that the notation is clearly homoiconic.

Instead, where precedence is desired, application and library writers can implement precedence by defining and controlling the scope of an “$nfx$” macro or procedure, or by later postprocessing of that symbol. Scheme macros are already quite powerful and capable of handling this; in these cases, {...} provides a more convenient notation. The curly-infix-expression approach, instead of trying to manage both infix and precedence, handles simple cases and then takes advantage of the existing Scheme scoping rules and macro system for more complex cases (in the rare cases where they are needed).
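
As a rough illustration of such postprocessing, the procedure below rewrites a ($nfx$ ...) datum into prefix form using precedence climbing. The two-level precedence table and the names precedence, parse-level, and nfx->prefix are assumptions made for this sketch; they are not defined by this SRFI, and the sketch does no error checking on malformed input such as a trailing operator.

```scheme
;; Sketch only: this precedence table is an assumption, not part of this SRFI.
(define (precedence op)
  (case op
    ((* /) 2)
    ((+ -) 1)
    (else (error "unknown operator" op))))

;; lhs is the expression parsed so far; rest is (op operand op operand ...).
;; Consumes operators whose precedence is at least min-prec and returns
;; a pair (expression . remaining-items).
(define (parse-level lhs rest min-prec)
  (if (or (null? rest) (< (precedence (car rest)) min-prec))
      (cons lhs rest)
      (let* ((op (car rest))
             (prec (precedence op))
             ;; The right operand absorbs any tighter-binding operators.
             (rhs (parse-level (cadr rest) (cddr rest) (+ prec 1))))
        ;; Continuing at the same min-prec makes equal precedence left-associative.
        (parse-level (list op lhs (car rhs)) (cdr rhs) min-prec))))

;; form is ($nfx$ operand op operand ...), as produced for a mixed list.
(define (nfx->prefix form)
  (let ((items (cdr form)))
    (car (parse-level (car items) (cdr items) 1))))
```

With this table, (nfx->prefix '($nfx$ 4 + 5 * 6)) yields (+ 4 (* 5 6)), and (nfx->prefix '($nfx$ a + b - c)) yields (- (+ a b) c). An “$nfx$” macro could apply the same rewriting at expansion time, with its scope controlled by the usual Scheme binding rules.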

Note that curly-infix-expressions include support for unary operators, but again, they are without precedence. As a result, they must be grouped separately. This does not lead to hard-to-read expressions, however. Examples of simple curly-infix lists combining infix and unary operations include {-(x) * -(y)} and {-{x} * -{y}} (the notation is designed so that both work).

At first David A. Wheeler, who started this project, considered reporting an error if a simple infix expression isn’t provided. However, prepending “$nfx$” is much more flexible.

Could precedence be added?

It would be possible to extend curly-infix-expressions to provide a fixed precedence system (e.g., if an expression is mixed, attempt to use various precedence rules). Here is a discussion of how this could be accomplished in the future (should that be necessary), which may also show why such systems were not proposed in this SRFI. It is important to understand that such capabilities would be extensions beyond this SRFI.

It would be best if the precedence rules (if any) were absolutely fixed; otherwise, subtle bugs would happen if only some files were read after the precedence was declared, and code would be hard to correctly combine and move if different code sections used different precedence rules. However, the precedence rules can be fixed while still allowing arbitrary new symbols, as shown below.

Unfortunately, there would be substantial arguments about the semantics (including the operators and precedence levels) of any precedence system, making it difficult to gain widespread implementation of a single precedence system. For example, should there be support for combining different ranged comparisons, to support notations such as {a < b <= c}? Should unranged comparisons (e.g., =) have a different precedence than ranged comparisons (e.g., >=)? Should some operators be right-associative, and if so, which ones? (Exponentiation and assignment are often right-associative.) Some symbols have different semantics in different contexts (e.g., = may mean equal-to or assignment in different contexts), and this complicates setting precedence levels of some operators. Although many languages (including C and Java) give multiplication and division the same left-associative precedence, even that is not universal; multiplication is considered higher precedence than division in the manuscript submission instructions for the Physical Review journals, the Course of Theoretical Physics by Landau and Lifshitz, and the Feynman Lectures on Physics. In addition, Wolfram Alpha considers implied (but not explicit) multiplication higher precedence than division. In The Development of the C Language, Dennis M. Ritchie notes that some of the precedence rules of C were infelicitous; should those problematic rules be used or not? There would also be substantial disagreement on exactly what operators should be in the precedence table (including which combinations and if Unicode characters should be included), their order, and whether the table should be “big” or “small” (since it is fixed, there are arguments for a larger one, but a larger one is harder to remember).

Here is an example of a precedence system that could extend curly-infix-expressions, called here the “math” extension. In this extension, if a mixed curly-infix list is seen, it first attempts to apply the “math” ruleset, and only prepends “$nfx$” if it does not meet the requirements. In this extension:

  1. All even-numbered parameters must be symbols, there must be an odd number of parameters, and there must be at least five parameters (the minimum to have more than one operator).
  2. A fixed set of operators is supported, in a fixed order of precedence; the operators are compared to the table below to determine precedence.
  3. If an operator is not in the fixed table, but it is a symbol, a new symbol is created by removing from the symbol all “-” surrounded by alphanumerics, removing all alphanumerics, and then removing all characters not in an operator of the precedence table; that new symbol is then compared to the table. This allows the use of many more operators, e.g., “char-ci<=?” would be considered an operator with the same precedence as “<=”. If there’s still no match for any operator, the expression does not meet the “math” ruleset, and the normal mixed rules are used ($nfx$ is inserted at the front).
  4. When the same (unmodified) operator is repeated, the operator simply gains another operand, so {a + b + c * d} is the “+” operator applied to three operands: a, b, and (* c d).
  5. Other different operators of the same precedence are interpreted in left-to-right order, so {a + b - c} maps to {{a + b} - c}.

Below is one possible precedence table for the “math” extension, in high-to-low precedence order (where ... matches 0 or more characters), presuming all is left-to-right (the point in part is to show that it would be challenging to get agreement on such a list):

  1. Subscript/down-arrow: sub {Unicode: ↓, ⇓}
  2. Exponentiation/superscript/up-arrow: exp..., **, ^, sup {Unicode: ↑, ⇑}
  3. Multiplication/division: *, /, div..., mod..., quo... {Unicode: ÷, ×}
  4. Addition/subtraction: +, -
  5. Bitwise and: bit...and, log...and, &
  6. Bitwise or/xor: bit...or, log...or, |
  7. Comparison:
  8. Logical conjunction: and {Unicode: ∩, ∧}
  9. Logical/exclusive disjunction: or, xor, eor {Unicode: ∪, ∨, ⊕}
  10. Implication/right arrows/doubled arrows: ->, =>, <->, <=>, -->, ==>, <-->, <==> {Unicode: ↔, ⇔, →, ⇒}
  11. Definition/assignment/left arrow: <-, <--, <==, :=, ::= {Unicode: ≡, ←, ⇐}

Think that is too complicated? Many other variations are possible. A “simple math” ruleset could be devised instead, e.g., perhaps it has a shorter list of built-in operators. The extant code using curly-infix-expressions tends to combine these: *, /; +, -; < , <= , >=, >, =, <>, eq...; and; and or; adding exponentiation (e.g., ** and exp...) at a higher precedence level, and implication (e.g., -> and =>) would make sense for a short list. (The operator <> isn’t in many Scheme specifications, but it would be odd to omit it from a precedence list.)

The key point is that although this SRFI does not include a precedence system, one (such as these) could be added later (using a different marker). If a precedence system were added, all existing code using curly-infix lists other than mixed curly-infix lists would work unchanged. Even when it’s not, many “$nfx$” processors would likely generate the same order in most actual cases, making transition easy should this ever occur. Since it would be difficult to gain such agreement, and the value of such a system is doubtful, it is better to provide a much simpler system that does not include precedence. Again, any support for precedence is an extension beyond this SRFI.

Why is $nfx$ not predefined?

Implementations should not predefine a meaning for $nfx$, other than possibly to something that always produces an error (e.g., raises an exception).

If anyone wrote code that depended on some local implementation of $nfx$, then by definition it would become implementation-dependent. Yet the point of the “$nfx$” macro is to allow application authors the ability to control what to do in that case, not to make them unwittingly dependent on an implementation.

Of course, an implementation could provide a pre-canned macro that could be used as a definition of $nfx$. But in that case, importing the library would be an explicit act, easily seen in the code, instead of being hidden. That way, it is easy to determine when an implementation-dependent capability is being used.

Why are 0, 1, and 2 parameters special?

The empty curly-infix list {} is intentionally mapped to (), as it is an empty list, and this is the likely user meaning (reducing unnecessary errors).

The one and two parameter cases are defined in part to reduce user error, and in part to provide better support:

  1. An “escaping” {e} is mapped to e so that {...} can be used for grouping: {{{a} + {b}}} is equivalent to (+ a b). It ensures that the neoteric-expression f{x} becomes the likely-intended (f x). It makes it easy to use prefix notation; e.g., { f(x) } is another way to write (f x). Finally, it also provides an easy escape mechanism in sweet-expressions for symbols that would otherwise have other meanings.
  2. A “unary-operation” curly-infix list {e f} is mapped to (e f), so that {- x} ⇒ (- x), the likely interpretation, and so that neoteric-expressions like f{- x} ⇒ (f (- x)) are interpreted properly.

Note that prefix notation can be enabled throughout a define statement by surrounding it with curly braces, since enclosing a single datum (such as a single list) simply passes it through:

  {(define my_cadr(x)
    car(cdr(x)))}
  {(define my_abs(x)
    (if {x >= 0}
      x
      -(x)))}
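
The zero-, one-, and two-parameter rules, together with the simple and mixed infix cases, can be summarized in a short sketch. The procedure names below (every-other, same-op-pairs?, curly->prefix) are illustrative assumptions, and this is a condensed restatement rather than the reference implementation. The argument is the list of data read between “{” and “}”.

```scheme
; Return the 1st, 3rd, 5th, ... elements of lst.
(define (every-other lst)
  (cond ((null? lst) '())
        ((null? (cdr lst)) (list (car lst)))
        (else (cons (car lst) (every-other (cddr lst))))))

; True if lst looks like (op x op y ...), always with the same op.
(define (same-op-pairs? op lst)
  (cond ((null? lst) #t)
        ((or (not (pair? lst)) (not (pair? (cdr lst)))) #f)
        ((not (equal? op (car lst))) #f)
        (else (same-op-pairs? op (cddr lst)))))

(define (curly->prefix lst)
  (cond ((not (pair? lst)) lst)                           ; {}    maps to ()
        ((null? (cdr lst)) (car lst))                     ; {e}   maps to e
        ((and (pair? (cdr lst)) (null? (cddr lst))) lst)  ; {e f} maps to (e f)
        ((and (pair? (cdr lst)) (pair? (cddr lst))
              (same-op-pairs? (cadr lst) (cdr lst)))
         (cons (cadr lst) (every-other lst)))             ; simple infix
        (else (cons '$nfx$ lst))))                        ; mixed infix
```

For example, (curly->prefix '(- x)) returns (- x) unchanged, while (curly->prefix '(a + b + c)) returns (+ a b c).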

Why a marker starting with #! and a letter?

We would like implementations to always have curly-infix-expression parsing enabled. However, some implementations may have other extensions that use {...}. We want a simple, standard way to identify code that uses curly-infix-expressions so that readers will switch to supporting curly-infix-expressions if they need to switch. This is especially important for those few Scheme implementations that already use braces for some other purpose.

The #! marker prefix was suggested due to its similarity to other markers. After all, R6RS and R7RS (draft 6) already use #!fold-case and #!no-fold-case as special markers to control the reader. Using another marker beginning with #! and a letter allows for a simple, similar-looking marker for a similar situation. What’s more, it implies a reasonable convention for reader extensions: markers that begin with #!, followed by an ASCII letter, should have the rest read as an identifier (up to a whitespace) and use that to control the reader.

The marker is read up to a whitespace; this is consistent with Scheme R7RS ticket #447. This makes it easy to distinguish between markers if one marker begins with the same characters as another entire marker. The marker semantics are assigned to the port (not to the end-of-file), again, to make it more consistent with other markers.

This marker need not interfere with other uses of #!. SRFI-22 supports #! followed by space as a comment to the end of the line; this is supported by several implementations, but this is easily distinguished from this marker by the space. Guile, clisp, and several other Lisps support #!...!# as a multi-line comment, enabling scripts with mixed languages and multi-line arguments. But in practice the #! is almost always followed immediately by / or ., and other scripts could be trivially fixed to make that so. R6RS had a non-normative recommendation to ignore a line that began with “#!/usr/bin/env” (without a space), as well as “#! /usr/bin/env” (with a space before the slash), but this is non-normative; an implementation could easily implement #! followed by space as an ignored line, and treat #! followed by / or . differently. Thus, implementations could trivially support (simultaneously) markers beginning with #! followed by a letter (such as the one to identify support for curly-infix-expressions), the SRFI-22 #!+space marker as an ignored line, and the format #!/ ... !# and #!. ... !# as a multi-line comment. Note that this SRFI does not mandate support or any particular semantics for #!fold-case, #!no-fold-case, the SRFI-22 #!+space convention, or #! followed by a slash or period; it is merely designed so that implementations could implement them all simultaneously.
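
To make the distinction concrete, here is a minimal sketch of how a reader might tell these uses of #! apart by looking at the character that follows. The procedure names and the returned symbols are assumptions made for this illustration; they are not defined by this SRFI or by any implementation we know of.

```scheme
; Hypothetical helpers (names are illustrative, not from this SRFI).
; "rest" is the text that immediately follows the two characters "#!".

; Classify the #! construct by its first character.
; Simplification: char-alphabetic? may also accept non-ASCII letters
; in some implementations; the convention above speaks of ASCII letters.
(define (classify-after-sharp-bang rest)
  (if (= (string-length rest) 0)
      'unknown
      (let ((c (string-ref rest 0)))
        (cond
          ((char-alphabetic? c) 'reader-marker)  ; #!curly-infix, #!fold-case
          ((char=? c #\space) 'ignored-line)     ; SRFI-22 "#! " convention
          ((or (char=? c #\/) (char=? c #\.))
           'block-comment)                       ; #!/ ... !# or #!. ... !#
          (else 'unknown)))))

; Read a marker name: everything up to the first whitespace character.
(define (marker-name rest)
  (let loop ((i 0))
    (if (or (= i (string-length rest))
            (char-whitespace? (string-ref rest i)))
        (substring rest 0 i)
        (loop (+ i 1)))))
```

Reading the name up to whitespace is what keeps a marker such as #!curly-infix distinct from any longer marker that merely begins with the same characters.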

We do not require that applications include this marker. Our hope is that over time everyone will just support this natively, making the marker unnecessary, and we do not want to require that users include an unneeded marker.

We recommend that #!curly-infix not be the very first characters in a file (e.g., put a newline in front of it). If a file that begins with #!curly-infix is made executable and then run, some systems might try to run a program named curly-infix.

By intent, this SRFI (including the enabling mechanism) doesn’t use or interact with any module system at all (including the R6RS and R7RS module systems). This is because some implementations won’t have a module system (or at least not a standard one). Curly-infix-expressions are an intentionally simple mechanism that can be built into even trivial Scheme implementations. Mandating module support is unnecessary and might inhibit its adoption.

Why the marker #!curly-infix?

There were two competing alternatives: #!srfi-105 and #!curly-infix.

The #!srfi-105 marker was recommended during discussion of SRFI-105, in part because it makes it clear where more information can be gathered. Also, it suggests that srfi- should be the namespace for SRFIs, a plausible convention.

However, the #!curly-infix marker has the advantage of being more obvious about what is being enabled. Modern search engines make it easy to find where information is, using either naming convention. Finally, if this capability were accepted into a future Scheme specification, its name would not need to change.

What about the Racket “infix convention”?

Racket allows a notation called the “infix convention” with the form “(a . operation . b)”. An advantage of this alternative is that it does not use the braces, so it might be easier to implement in Schemes which already define {...} in a local extension. However, the Racket “infix convention” has many problems:

In short, cases where infix notation would be useful are extremely common, so its notation should be convenient. The Racket “infix convention” may be the next-best notation for infix notation after curly-infix-expressions, but it’s next-best, and we should strive for the best available notation for such a common need. Curly-infix-expressions do not conflict with the Racket infix convention; implementations could implement both. We recommend that an implementation that implements the Racket infix convention should also allow multiple operands and use curly-infix-expression semantics for them, pretending that . op . is a single parameter. In that case, (a . + . b . + . c) would map to (+ a b c), and (a . + . b . * . c) would map to ($nfx$ a + b * c). Note that the existence of the Racket “infix convention” is additional evidence of the need for a standard infix convention; many have separately created mechanisms to try to provide infix support.

What about the Racket infix.plt package?

The “Infix Expressions for PLT Scheme” package by Jens Axel Søgaard, called here the “Racket infix.plt” package, provides an infix notation for PLT Scheme (now called Racket). The manual for the Racket infix.plt package lists a number of examples and constructs.

Jens Axel Søgaard stated on the SRFI-105 mailing list on 2012-10-30 that infix.plt was not meant to be “the final say in infix notation” and that its implementation on Planet was “just meant to be something to try out. In fact I have changed some of the decisions in Bracket”. Still, infix.plt is widely available, and it is a reasonable representative example of an infix system that has been developed for Scheme. Jens Axel Søgaard explained that its original design rationale was as follows:

  1. The operations + - * / ^ [are] written using their standard syntax
  2. No other operations get special syntax
  3. function application is name[] (same choice as Mathematica - but it could easily be name() instead)
  4. Since - is used in Scheme names, one can use _ to write names using -
  5. Other names can be written using | | syntax.

He added that, “The implementation has a few more bells and whistles mostly stolen from: http://reference.wolfram.com/mathematica/guide/Syntax.html. The rationale is to keep the infix operations to the essentials in order not to interfere too much with builtin names of Scheme. Since - is so common in the variable names, a special syntax _ is needed. Other names can be used by the standard | | syntax for peculiar variable names.” He noted that he hadn’t “mentioned the syntax chosen to delimit the infix expressions. That’s on purpose, since I [haven’t] decided what I like best yet.” In the sample implementation, the 3-character sequence “@${” begins an infix expression, which switches to a completely different language that ends with a matching “}”.

Given this rationale, we can examine its documentation and source code to see its specifics. No single location gives a concise and complete definition of the infix.plt grammar, but the parser.ss source code includes a partial grammar. This partial grammar is:

<e> :== <num>
     |  <id>                   variable reference
     |  <e> [ <args> ]         application (like Mathematica)
     |  { <args> }             list construction
     |  <e> + <e>              addition
     |  <e> - <e>              subtraction
     |  <e> * <e>              multiplication
     |  <e> / <e>              division
     |  <e> ^ <e>              exponentiation
     |  - <e>                  negation
     | ( <e> )                 grouping

<id>   An identifier begins with a letter,
       and is optionally followed by series of letters, digits or underscores.
       An underscore is converted to a -. Thus list_ref will refer to list-ref.

<num>  A number is an non-empty series of digits,
       optionally followed by a period followed by a series of digits.

This grammar is incomplete; infix.plt also supports (per its documentation or source code):

  <e> OP <e>  comparison. OP is <, <=, =, <>, >=, > (and Unicode)
  <e> := <e>  assignment
  <e> ; <e>   sequence
  (λ ids . expr)     anonymous function
  √<e>         square root (Unicode character)
  ¬<e>         logical not

Differences in this approach, which some may see as advantages, are that infix operators need not be separated by whitespace, it provides precedence of * and / over + and -, and it builds in a simple assignment statement if you want it. Like many infix notations (including curly-infix-expressions), for many expressions the infix.plt package is a far clearer notation than the traditional s-expression.

However, there are some negatives to the infix.plt approach:

  1. Because it has a fixed built-in grammar, only a few fixed symbols can be used as infix operators. For example, “and” and “or” are not supported as infix operators, yet they are commonly used in infix position in other languages. Similarly, it does not support the many other procedures that might be useful in infix position, such as char-ci>? and eq?. This demonstrates a basic problem with only allowing a fixed set of infix operators in a notation: Scheme (like all Lisps) allows for easy creation of new procedures and macros. The SRFI-105 authors believe an infix notation should make it easy to use any procedures and macros in whatever position is most natural... including procedures and macros that have not yet been defined.
  2. Because it has a precedence system, it is necessarily less homoiconic when precedence is used. Note that curly-infix-expressions support precedence via $nfx$, and a built-in precedence system could be added later if this was desired by the community.
  3. There seems to be no provision for Scheme capabilities such as quoting and quasiquoting; a quasiquoted variable does not seem to be allowed.
  4. Many variables and operators are more difficult to refer to and/or must be spelled differently (and thus inconsistently). Its documentation states that identifiers (which are also used for function names) must “begin with a letter, and is optionally followed by series of letters, digits or underscores. An underscore is converted to a -”. Under these rules, it is not possible to call procedures with names like “char=?” or use variables with “*” embedded in them. It does allow references to variable names with “-” embedded in them, but in this approach names must be spelled differently (and thus inconsistently) by replacing every “-” with “_”. Thus, variables like “list-ref” must be spelled as “list_ref” inside the infix.plt notation as documented. The infix.plt documentation did not, at the time of this writing, document any way around this limitation. However, on 2012-10-21, Jens Axel Søgaard reported that other identifiers can be referred to using the |...| syntax. This works around the problem, but is slightly more cumbersome when it is necessary, and is inconsistent with other code where |...| is not required. These are fundamental side-effects of not requiring infix operators to be delimited (e.g., by whitespace). Since some symbols must be escaped with |...| inside infix.plt but not outside, and some symbols require replacing every “-” with “_” where this is not done, some symbols are represented inconsistently... and this inconsistency can lead to potentially hard-to-find errors.
  5. The notation is completely different and inconsistent with the surrounding Lisp notation. The curly braces {...} are suddenly used for the list creation operation instead of parentheses, square brackets x[...] are used for function application instead of parentheses, and the parentheses (...) are instead used for grouping (and not for list creation or function application). The same punctuation mark can have a completely different meaning in different contexts, leading to a potential for confusion, and this confusion could easily lead to hard-to-find errors.

Note that an implementation could support both curly-infix-expressions and the Racket infix.plt package simultaneously.

Like the Racket “infix convention”, the infix.plt package does demonstrate that there is a need for an infix notation that can be used in Scheme.

What about Gambit’s “Scheme Infix eXtension (SIX)”?

The Gambit reader includes a notation called the “Scheme Infix eXtension (SIX)” that supports infix notation. SIX expressions begin with a backslash.

Like curly-infix-expressions, SIX is a reader extension. But SIX has a number of problems compared to curly-infix-expressions:

Like the Racket “infix convention” and Racket infix.plt package, SIX does demonstrate that there is a need for an infix notation that can be used in Scheme.

A system could simultaneously implement curly-infix-expressions and SIX. However, curly-infix-expressions are far simpler, are more flexible (e.g., by allowing arbitrary symbols), and work much more easily with macros and quoting. Thus, we believe that curly-infix-expressions are the better system and more appropriate for standardization across Scheme implementations.

What about Guile 1.4’s “reading infix” module?

Guile 1.4.x at gnuvola.org is self-described as a “(somewhat amicable) fork from the official Guile”. It includes support for reading infix expressions. Once activated, infix expressions are surrounded by #[ and ]. Infix operators are surrounded by whitespace. It supports precedence, which sounds like an advantage, but operators must be registered before use (and few are predefined), creating an opportunity for terrible errors if the expression is read first. There is also the opportunity for serious problems if different programs are written assuming different precedence levels. Inside the infix notation a very different language is used (e.g., parentheses are used for grouping instead of necessarily creating lists, and parameters are separated by commas), so it is unclear how well it would work with other Scheme features such as quasiquotation.

The Guile 1.4 reading infix module has a more complex grammar, requiring a more complex implementation and more effort to understand. Its registration system creates serious problems when trying to use it for larger systems. This infix notation has not been accepted into the version of Guile used by most people, so it is not even portable among most Guile users. But perhaps the biggest problem is that this notation is fundamentally not homoiconic; it is harder to determine where lists begin and end with it.

Like Racket and SIX, this module does demonstrate that there is a need for an infix notation that can be used in Scheme.

In contrast, curly-infix-expressions are simpler, require no registration system or other complexities, work more clearly with macros and quasiquotation, and have the general advantage of being homoiconic.

Why neoteric-expressions?

Lisp’s standard notation is different from “normal” notation in that the parentheses precede the function name, rather than follow it. Others have commented that it’d be valuable to be able to say name(x) instead of (name x):

Neoteric-expressions allow users to use a more traditional-looking notation for function calls. Quoting rules and macros continue to work as usual. In addition:

The (. e) rule handles expressions like read(. port), ensuring that they map to (read . port). If (. x) didn’t mean x, then it would be easy to get this case wrong. Also, if someone wanted to build on top of an existing reader, they would have to reimplement parts of the list-processing system if this wasn’t handled. It is already true that (. x) is x in Guile, so there was already a working example that this is a reasonable extension. In fact, in a typical implementation of a list reader, it takes extra effort to prevent this extension, so this is a relatively easy extension to include.

Neoteric-expressions could be useful outside of curly-infix-expressions, and the sweet-expression notation (not defined here) builds on neoteric-expressions. However, accepting neoteric-expressions outside of any braces could change the interpretation of some existing Scheme code. Such code would arguably be badly formatted, and the code could be quickly fixed by a pretty-printer, but nevertheless changing already-standard syntax is clearly a larger change. It is anticipated that some users will want an infix notation and more traditional function call notation without changing the meaning of portable Scheme code. This SRFI is designed to meet the desires of those users. Other specifications could build on this SRFI without this limitation, e.g., other specifications could require support for neoteric-expressions outside all braces.
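
The way a prefix combines with the bracketed text that immediately follows it can be sketched as a small procedure over data that has already been read. The name combine-neoteric and its calling convention are assumptions made for this illustration; the procedure only mirrors, in simplified form, what a neoteric reader does after reading a prefix.

```scheme
; Hypothetical helper (illustrative only).
;   prefix: the datum just read
;   open:   the opening character that immediately follows it
;   body:   for #\( and #\[, the list of data read up to the closer;
;           for #\{, the result of the curly-infix mapping of that list
(define (combine-neoteric prefix open body)
  (cond
    ((char=? open #\()                 ; f(x y)   maps to (f x y)
     (cons prefix body))
    ((char=? open #\[)                 ; f[x]     maps to ($bracket-apply$ f x)
     (cons '$bracket-apply$ (cons prefix body)))
    ((char=? open #\{)
     (if (null? body)
         (list prefix)                 ; f{}      maps to (f), not (f ())
         (list prefix body)))          ; f{x + y} maps to (f (+ x y))
    (else prefix)))                    ; no tail: the prefix stands alone
```

Applying the procedure repeatedly handles chained forms: combining f with (x) and then combining that result with (y) yields ((f x) y), which is how f(x)(y) is interpreted.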

Comma-separated parameters

It would be possible to define neoteric-expressions to have comma-separated values in a function call; this would make it even more similar to traditional function call notation. A simple way would be to simply remove all commas, but this would interfere with ,-lifting, and thus was immediately rejected.

A better rule, that would indeed work, would be to require each parameter to end with a comma, and then remove that ending comma. However, this rule:

Many other languages do use commas, but they are required because an expression that uses infix order need not be surrounded by any marker. Since an infix expression must be surrounded by {...} in our notation, there is no need for additional commas for parameter separation.

Experimentation found that separating parameters solely by whitespace worked well, so that approach was selected.

Other comments on neoteric-expressions

Originally the prefix had to be a symbol or list. The theory was that by ignoring others, the reader would be backwards-compatible with some badly-formatted code, and some errors might not result in incorrectly-interpreted expressions. But this was an odd limitation, and in some cases other prefixes made sense (e.g., for strings). This was changed to eliminate the inconsistency.

The symbol $bracket-apply$ was once bracketaccess, but it turns out that the Kawa Scheme implementation already used $bracket-apply$. Originally $nfx$ was nfx, as this was used by some predefined macros for infix notation; it was changed slightly so that it would be unlikely to interfere with any pre-existing nfx procedure or macro, but would still be similar to its previous name. The symbols $bracket-apply$ and $nfx$ are somewhat more awkward to type directly, but this is actually a good thing; this means it is even more unlikely to be used unintentionally by user code.

This SRFI is intentionally silent on the interpretation of unprefixed square brackets, because different Schemes (as well as other Lisps) interpret square brackets differently. One survey of Scheme implementations with brackets and braces shows these differences; several Scheme implementations follow the R6RS specification that accepts [...] as a synonym for (...), GNU Kawa interprets [...] as the redefinable constructor ($bracket-list$ ...), and two implementations (Rep and FemtoLisp) use them as vector constructors. By intentionally not defining the interpretation of unprefixed square brackets, implementations are free to continue to use whatever interpretation their users are used to, and users can easily access that interpretation.

Neoteric-expressions used to be called “modern-expressions”. But some people didn’t like that name, and the obvious abbreviation (“m-expression”) was easily confused with the original Lisp M-expression language. So the name was changed to neoteric, which has a similar meaning and abbreviates nicely. It wasn’t called “function-expressions” because “f-expressions” were already in use (and can sound bad if said quickly), and they weren’t called “prefix-expressions” because “p-expressions” sound like “pee-expressions”. It’s not called “name-prefix” because the prefix need not be a name. There is absolutely no truth to the rumor that the notation was developed by a secret technologically advanced species, so pay no attention to “Microcosmic God” by Theodore Sturgeon :-).

The neoteric rules do introduce the risk of someone inserting a space between the function name and the opening character (e.g., an open parenthesis). But whitespace is already significant as a parameter separator; since this is how Scheme already works, this represents no change at all.

Obviously, this is trivial to parse. No power is lost, because this is completely optional; developers can use it when they want to, and they can use traditional s-expression notation if they want to. It’s trivially quoted... if you quote a symbol followed by “(”, just keep going until its matching “)”, which is essentially the same rule as before.

Other details

There is no requirement that writers (e.g., “write” or a pretty-printer) write out curly-infix-expressions. They may choose to do so, e.g., for lists of length 3-6 whose car is the symbol “and”, the symbol “or”, or a punctuation-only symbol. However, it would probably be wise to wait until many implementations can handle c-expressions.
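
As a rough sketch of the heuristic just described, a writer might test a list as follows before choosing infix output. The procedure names are hypothetical, and “punctuation-only” is approximated here as “contains no letters or digits”; a real writer could reasonably use a different test.

```scheme
; Hypothetical helpers for a writer (illustrative only).

; True if the symbol's name is non-empty and has no letters or digits.
(define (punctuation-only? sym)
  (let ((s (symbol->string sym)))
    (let loop ((i 0))
      (cond ((= i (string-length s)) (> i 0))
            ((or (char-alphabetic? (string-ref s i))
                 (char-numeric? (string-ref s i)))
             #f)
            (else (loop (+ i 1)))))))

; True for proper lists of length 3 to 6 whose car is the symbol and,
; the symbol or, or a punctuation-only symbol.
(define (infix-candidate? x)
  (and (list? x)
       (<= 3 (length x) 6)
       (symbol? (car x))
       (or (eq? (car x) 'and)
           (eq? (car x) 'or)
           (punctuation-only? (car x)))))

; Place op between successive arguments: (a b c) becomes (a op b op c).
(define (interleave op args)
  (if (null? (cdr args))
      (list (car args))
      (cons (car args) (cons op (interleave op (cdr args))))))

; Contents to write between { and } for a candidate list.
(define (prefix->infix x)
  (interleave (car x) (cdr x)))
```

A writer using this sketch would print (+ a b c) as {a + b + c}, which a curly-infix reader maps back to (+ a b c).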

The $nfx$ and $bracket-apply$ symbols are unhygienic, in the sense that programs that use them may begin by defining them even though these identifiers do not appear literally in the code. However, we see no way around this without losing the benefits that these (optional) features are meant to provide. As noted in a guile-devel post by Mark H. Weaver on 2012-10-26, “apart from the fact that $nfx$ etc. are meant to be defined by the user, it is exactly the same situation as for ‘quote’, ‘quasiquote’, ‘unquote’, ‘unquote-splicing’, ‘quasisyntax’, etc. The whole point of these shorthand notations is to avoid having to type the associated identifier, and yet this means that an identifier is being referenced without appearing literally in the code. These shorthand notations always involve a tradeoff. It means that the syntax is not quite as simple as the original s-expressions (as printed by ‘write’), and the user has to know a few more rules for how to interpret the notation. Experience shows that humans tend to prefer a bit more complexity in their syntax if there is something to be gained from it. I think it’s worthwhile to add a few more rules in exchange for the option to use infix notation in selected areas, as long as the resulting notation is homoiconic and the total number of rules is kept small.”

Curly-infix-expressions are designed so that they can work on other Lisps as well, which should simplify adoption elsewhere.

Curly-infix-expression notation is an unusually simple mechanism, but like much of any Lisp-based language, its power comes from its simplicity.

Reference implementation

The implementation below is portable, with the exception that Scheme provides no standard mechanism to override {...} in its built-in reader. Thus, implementations will typically have a modified reader that detects “{”, starts reading a list until its matching “}”, and then calls process-curly defined below. Implementations should always do this, but an implementation that complies with this SRFI must at least activate this behavior when it reads the #!curly-infix marker followed by whitespace.

This reference implementation is SRFI type 2: “A mostly-portable solution that uses some kind of hooks provided in some Scheme interpreter/compiler. In this case, a detailed specification of the hooks must be included so that the SRFI is self-contained.”

For clarity, this is split into two parts: (1) code that implements the SRFI, and (2) a demo (with support procedures) to show its use. This SRFI is trivial to implement, so most of the code is actually in part 2.

Key code to implement this SRFI

  ; ------------------------------
  ; Curly-infix support procedures
  ; ------------------------------

  ; Return true if lyst has an even # of parameters, and the (alternating)
  ; first parameters are "op".  Used to determine if a longer lyst is infix.
  ; If passed empty list, returns true (so recursion works correctly).
  (define (even-and-op-prefix? op lyst)
    (cond
      ((null? lyst) #t)
      ((not (pair? lyst)) #f)
      ((not (equal? op (car lyst))) #f) ; fail - operators not the same
      ((not (pair? (cdr lyst)))  #f) ; Wrong # of parameters or improper
      (#t   (even-and-op-prefix? op (cddr lyst))))) ; recurse.

  ; Return true if the lyst is in simple infix format
  ; (and thus should be reordered at read time).
  (define (simple-infix-list? lyst)
    (and
      (pair? lyst)           ; Must have list;  '() doesn't count.
      (pair? (cdr lyst))     ; Must have a second argument.
      (pair? (cddr lyst))    ; Must have a third argument (we check it
                             ; this way for performance)
      (even-and-op-prefix? (cadr lyst) (cdr lyst)))) ; true if rest is simple

  ; Return alternating parameters in a lyst (1st, 3rd, 5th, etc.)
  (define (alternating-parameters lyst)
    (if (or (null? lyst) (null? (cdr lyst)))
      lyst
      (cons (car lyst) (alternating-parameters (cddr lyst)))))

  ; Not a simple infix list - transform it.  Written as a separate procedure
  ; so that future experiments or SRFIs can easily replace just this piece.
  (define (transform-mixed-infix lyst)
     (cons '$nfx$ lyst))

  ; Given curly-infix lyst, map it to its final internal format.
  (define (process-curly lyst)
    (cond
     ((not (pair? lyst)) lyst) ; E.G., map {} to ().
     ((null? (cdr lyst)) ; Map {a} to a.
       (car lyst))
     ((and (pair? (cdr lyst)) (null? (cddr lyst))) ; Map {a b} to (a b).
       lyst)
     ((simple-infix-list? lyst) ; Map {a OP b [OP c...]} to (OP a b [c...])
       (cons (cadr lyst) (alternating-parameters lyst)))
     (#t  (transform-mixed-infix lyst))))


  ; ------------------------------------------------
  ; Key procedures to implement neoteric-expressions
  ; ------------------------------------------------

  ; Read the "inside" of a list until its matching stop-char, returning list.
  ; stop-char needs to be closing paren, closing bracket, or closing brace.
  ; This is like read-delimited-list of Common Lisp.
  ; This implements a useful extension: (. b) returns b.
  (define (my-read-delimited-list my-read stop-char port)
    (let*
      ((c   (peek-char port)))
      (cond
        ((eof-object? c) (read-error "EOF in middle of list") '())
        ((eqv? c #\;)
          (consume-to-eol port)
          (my-read-delimited-list my-read stop-char port))
        ((my-char-whitespace? c)
          (read-char port)
          (my-read-delimited-list my-read stop-char port))
        ((char=? c stop-char)
          (read-char port)
          '())
        ((or (char=? c #\)) (char=? c #\]) (char=? c #\}))
          (read-char port)
          (read-error "Bad closing character"))
        (#t
          (let ((datum (my-read port)))
            (cond
               ((eq? datum '.)
                 (let ((datum2 (my-read port)))
                   (consume-whitespace port)
                   (cond
                     ((eof-object? datum2)
                      (read-error "Early eof in (... .)\n")
                      '())
                     ((not (eqv? (peek-char port) stop-char))
                      (read-error "Bad closing character after . datum"))
                     (#t
                       (read-char port)
                       datum2))))
               (#t
                   (cons datum
                     (my-read-delimited-list my-read stop-char port)))))))))


  ; Implement neoteric-expression's prefixed (), [], and {}.
  ; At this point, we have just finished reading some expression, which
  ; MIGHT be a prefix of some longer expression.  Examine the next
  ; character to be consumed; if it's an opening paren, bracket, or brace,
  ; then the expression "prefix" is actually a prefix.
  ; Otherwise, just return the prefix and do not consume that next char.
  ; This recurses, to handle formats like f(x)(y).
  (define (neoteric-process-tail port prefix)
      (let* ((c (peek-char port)))
        (cond
          ((eof-object? c) prefix)
          ((char=? c #\( ) ; Implement f(x)
            (read-char port)
            (neoteric-process-tail port
                (cons prefix (my-read-delimited-list neoteric-read-real #\) port))))
          ((char=? c #\[ )  ; Implement f[x]
            (read-char port)
            (neoteric-process-tail port
                  (cons '$bracket-apply$
                    (cons prefix
                      (my-read-delimited-list neoteric-read-real #\] port)))))
          ((char=? c #\{ )  ; Implement f{x}
            (read-char port)
            (neoteric-process-tail port
              (let ((tail (process-curly
                      (my-read-delimited-list neoteric-read-real #\} port))))
                (if (eqv? tail '())
                  (list prefix) ; Map f{} to (f), not (f ()).
                  (list prefix tail)))))
          (#t prefix))))


  ; To implement neoteric-expressions, modify the reader so
  ; that [] and {} are also delimiters, and make the reader do this:
  ; (let* ((prefix
  ;           read-expression-as-usual
  ;       ))
  ;   (if (eof-object? prefix)
  ;     prefix
  ;     (neoteric-process-tail port prefix)))

  ; Modify the main reader so that [] and {} are also delimiters, and so
  ; that when #\{ is detected, read using my-read-delimited-list
  ; any list from that port until its matching #\}, then process
  ; that list with "process-curly", like this:
  ;   (process-curly (my-read-delimited-list neoteric-read-real #\} port))

Demo code

  ; ------------------------------------------------
  ; Demo procedures to implement curly-infix and neoteric readers
  ; ------------------------------------------------

  ; This implements an entire reader, as a demonstration, but if you can
  ; update your existing reader you should just update that instead.
  ; This is a simple R5RS reader, with a few minor (common) extensions.
  ; The "my-read" is called if it has to recurse.
  (define (underlying-read my-read port)
    (let* ((c (peek-char port)))
      (cond
        ((eof-object? c) c)
        ((char=? c #\;)
          (consume-to-eol port)
          (my-read port))
        ((my-char-whitespace? c)
          (read-char port)
          (my-read port))
        ((char=? c #\( )
          (read-char port)
          (my-read-delimited-list my-read #\) port))
        ((char=? c #\[ )
          (read-char port)
          (my-read-delimited-list my-read #\] port))
        ((char=? c #\{ )
          (read-char port)
          (process-curly
            (my-read-delimited-list neoteric-read-real #\} port)))
        ; Handle missing (, [, { :
        ((char=? c #\) )
          (read-char port)
          (read-error "Closing parenthesis without opening")
          (my-read port))
        ((char=? c #\] )
          (read-char port)
          (read-error "Closing bracket without opening")
          (my-read port))
        ((char=? c #\} )
          (read-char port)
          (read-error "Closing brace without opening")
          (my-read port))
        ((char=? c #\") ; Strings are delimited by ", so can call directly
          (default-scheme-read port))
        ((char=? c #\')
          (read-char port)
          (list 'quote (my-read port)))
        ((char=? c #\`)
          (read-char port)
          (list 'quasiquote (my-read port)))
        ((char=? c #\,)
          (read-char port)
          (cond
            ; eqv? rather than char=?, so an eof-object here is not an error.
            ((eqv? #\@ (peek-char port))
              (read-char port)
              (list 'unquote-splicing (my-read port)))
            (#t
              (list 'unquote (my-read port)))))
        ((ismember? c digits) ; Initial digit.
          (read-number port '()))
        ((char=? c #\#) (process-sharp my-read port))
        ((char=? c #\.) (process-period port))
        ((or (char=? c #\+) (char=? c #\-))  ; Initial + or -
          (read-char port)
          (if (ismember? (peek-char port) digits)
            (read-number port (list c))
            (string->symbol (fold-case-maybe port
              (list->string (cons c
                (read-until-delim port neoteric-delimiters)))))))
        (#t ; Nothing else.  Must be a symbol start.
          (string->symbol (fold-case-maybe port
            (list->string
              (read-until-delim port neoteric-delimiters))))))))

  (define (curly-infix-read-real port)
    (underlying-read curly-infix-read-real port))

  (define (curly-infix-read . port)
    (if (null? port)
      (curly-infix-read-real (current-input-port))
      (curly-infix-read-real (car port))))

  ; Here's a real neoteric reader.
  ; The key part is that it implements [] and {} as delimiters, and
  ; after it reads in some datum (the "prefix"), it calls
  ; neoteric-process-tail to see if there's a "tail".
  (define (neoteric-read-real port)
    (let* ((prefix (underlying-read neoteric-read-real port)))
      (if (eof-object? prefix)
        prefix
        (neoteric-process-tail port prefix))))

  (define (neoteric-read . port)
    (if (null? port)
      (neoteric-read-real (current-input-port))
      (neoteric-read-real (car port))))


  ; ------------------
  ; Support procedures
  ; ------------------

  (define digits '(#\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9))
  (define linefeed (integer->char #x000A))        ; #\newline aka \n.
  (define carriage-return (integer->char #x000D)) ; \r.
  (define tab (integer->char #x0009))
  (define line-tab (integer->char #x000b))
  (define form-feed (integer->char #x000c))
  (define line-ending-chars (list linefeed carriage-return))
  (define whitespace-chars
    (list tab linefeed line-tab form-feed carriage-return #\space))

  ; Should we fold case of symbols by default?
  ; #f means case-sensitive (R6RS); #t means case-insensitive (R5RS).
  ; Here we'll set it to be case-sensitive, which is consistent with R6RS
  ; and guile, but NOT with R5RS.  Most people won't notice, I
  ; _like_ case-sensitivity, and the latest spec is case-sensitive,
  ; so let's start with #f (case-sensitive).
  ; This doesn't affect character names; as an extension,
  ; we always accept arbitrary case for them, e.g., #\newline or #\NEWLINE.
  (define foldcase-default #f)

  ; Returns a true value (not necessarily #t) if char ends a line.
  ; Uses memv, since eq? (and so memq) is not reliable on characters.
  (define (char-line-ending? char) (memv char line-ending-chars))

  ; Returns true if item is member of lyst, else false.
  (define (ismember? item lyst)
     (pair? (member item lyst)))

  ; Create own version, in case underlying implementation omits some.
  (define (my-char-whitespace? c)
    (or (char-whitespace? c) (ismember? c whitespace-chars)))

  ; If fold-case is active on this port, return string "s" in folded case.
  ; Otherwise, just return "s".  This is needed to support our
  ; foldcase-default configuration value when processing symbols.
  ; The "string-foldcase" procedure isn't everywhere,
  ; so we use "string-downcase".
  (define (fold-case-maybe port s)
    (if foldcase-default
      (string-downcase s)
      s))

  (define (consume-to-eol port)
    ; Consume every non-eol character in the current line.
    ; End on EOF or end-of-line char.
    ; Do NOT consume the end-of-line character(s).
    (let ((c (peek-char port)))
      (cond
        ((not (or (eof-object? c)
                  (char-line-ending? c)))
          (read-char port)
          (consume-to-eol port)))))

  (define (consume-whitespace port)
    (let ((char (peek-char port)))
      (cond
        ((eof-object? char) char)
        ((eqv? char #\;)
          (consume-to-eol port)
          (consume-whitespace port))
        ((my-char-whitespace? char)
          (read-char port)
          (consume-whitespace port)))))

  ; Identifying the list of delimiter characters is harder than you'd think.
  ; This list is based on R6RS section 4.2.1, while adding [] and {},
  ; but removing "#" from the delimiter set.
  ; NOTE: R6RS has "#" as a delimiter.  However, R5RS does not, and
  ; R7RS probably will not - http://trac.sacrideo.us/wg/wiki/WG1Ballot3Results
  ; shows a strong vote AGAINST "#" being a delimiter.
  ; Having the "#" as a delimiter means that you cannot have "#" embedded
  ; in a symbol name, which hurts backwards compatibility, and it also
  ; breaks implementations like Chicken (has many such identifiers) and
  ; Gambit (which uses this as a namespace separator).
  ; Thus, this list does NOT have "#" as a delimiter, contravening R6RS
  ; (but consistent with R5RS, probably R7RS, and several implementations).
  ; Also - R7RS draft 6 has "|" as delimiter, but we currently don't.
  (define neoteric-delimiters
     (append (list #\( #\) #\[ #\] #\{ #\}  ; Add [] {}
                   #\" #\;)                 ; Could add #\# or #\|
             whitespace-chars))

  (define (read-until-delim port delims)
    ; Read characters until eof or a character in "delims" is seen.
    ; Do not consume the eof or delimiter.
    ; Returns the list of chars that were read.
    (let ((c (peek-char port)))
      (cond
         ((eof-object? c) '())
         ((ismember? c delims) '())
         ; Read the character before recursing: the argument evaluation
         ; order of "cons" is unspecified, so do not rely on it.
         (#t (let ((ch (read-char port)))
               (cons ch (read-until-delim port delims)))))))
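
  ; Illustration only (not part of the original demo): a hypothetical
  ; helper showing what read-until-delim returns and leaves on the port.
  ; It assumes "open-input-string" (SRFI 6, R7RS), which R5RS lacks.
  (define (demo-read-until-delim)
    (let* ((port (open-input-string "abc{x}"))
           (chars (read-until-delim port neoteric-delimiters)))
      ; chars holds a, b, c; the delimiter { is still unread on the port.
      (list (list->string chars) (peek-char port))))
  ; (demo-read-until-delim) should return ("abc" #\{)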

  (define (read-error message)
    (display "Error: ")
    (display message)
    (display "\n")
    '())

  (define (read-number port starting-lyst)
    (string->number (list->string
      (append starting-lyst
        (read-until-delim port neoteric-delimiters)))))

  ; Skip the rest of a #|...|# block comment, including any nested
  ; #|...|# comments.  The opening #| has already been consumed.
  (define (nest-comment port)
    (let ((c (read-char port)))
      (cond
        ((eof-object? c))
        ((char=? c #\|)
          (let ((c2 (peek-char port)))
            ; eqv? rather than char=?, so an eof-object is not an error.
            (if (eqv? c2 #\#)
                (read-char port)
                (nest-comment port))))
        ((char=? c #\#)
          (let ((c2 (peek-char port)))
            (if (eqv? c2 #\|)
                (begin
                  (read-char port)
                  (nest-comment port)))
            (nest-comment port)))
        (#t
          (nest-comment port)))))
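
  ; Illustration only (not part of the original demo): a hypothetical
  ; helper showing that nest-comment skips nested block comments.
  ; It assumes "open-input-string" (SRFI 6, R7RS), and that the opening
  ; #| was already consumed, as it is when process-sharp calls nest-comment.
  (define (demo-nest-comment)
    (let ((port (open-input-string " outer #| inner |# still outer |#rest")))
      (nest-comment port)
      ; The whole comment is gone; the next character is the r of "rest".
      (read-char port)))
  ; (demo-nest-comment) should return #\r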

  (define (process-sharp my-read port)
    ; We've peeked a # character.  Returns what it represents.
    (read-char port) ; Remove #
    (cond
      ((eof-object? (peek-char port)) (peek-char port)) ; If eof, return eof.
      (#t
        ; Not EOF. Read in the next character, and start acting on it.
        (let ((c (read-char port)))
          (cond
            ((char-ci=? c #\t)  #t)
            ((char-ci=? c #\f)  #f)
            ((ismember? c '(#\i #\e #\b #\o #\d #\x
                            #\I #\E #\B #\O #\D #\X))
              (read-number port (list #\# (char-downcase c))))
            ((char=? c #\( )  ; Vector.
              (list->vector (my-read-delimited-list my-read #\) port)))
            ((char=? c #\\) (process-char port))
            ; This supports SRFI-30 #|...|#
            ((char=? c #\|) (nest-comment port) (my-read port))
            ; If #!xyz, consume xyz and recurse.
            ; In a real reader, consider handling "#! whitespace" per SRFI-22,
            ; and consider "#!" followed by / or . as a comment until "!#".
            ((char=? c #\!) (my-read port) (my-read port))
            (#t (read-error "Unsupported # extension")))))))
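
  ; Illustration only (not part of the original demo): a hypothetical
  ; helper exercising two branches of process-sharp.  It assumes
  ; "open-input-string" (SRFI 6, R7RS).  Each port starts at the #
  ; character, as it does when underlying-read has only peeked at it.
  (define (demo-process-sharp)
    (list
      (process-sharp curly-infix-read-real (open-input-string "#t"))
      ; "#x" is passed on to read-number, which re-attaches the prefix.
      (process-sharp curly-infix-read-real (open-input-string "#x1F "))))
  ; (demo-process-sharp) should return (#t 31)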

  (define (process-period port)
    ; We've peeked a period character.  Returns what it represents.
    (read-char port) ; Remove .
    (let ((c (peek-char port)))
      (cond
        ; period eof; return the symbol named ".".  (Writing '. directly
        ; is not portable across readers.)
        ((eof-object? c) (string->symbol "."))
        ((ismember? c digits)
          (read-number port (list #\.)))  ; period digit - it's a number.
        (#t
          ; At this point, Scheme only requires support for "." or "...".
          ; As an extension we can support them all.
          (string->symbol
            (fold-case-maybe port
              (list->string (cons #\.
                (read-until-delim port neoteric-delimiters)))))))))

  (define (process-char port)
    ; We've read #\ - returns what it represents.
    (cond
      ((eof-object? (peek-char port)) (peek-char port))
      (#t
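        ; Not EOF. Read in the next character, and start acting on it.
        ; Use let* so that c is read before rest; plain let leaves the
        ; order of the two reads unspecified.
        (let* ((c (read-char port))
               (rest (read-until-delim port neoteric-delimiters)))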
          (cond
            ((null? rest) c) ; only one char after #\ - so that's it!
            (#t
              (let ((rest-string (list->string (cons c rest))))
                (cond
                  ; Implement R6RS character names, see R6RS section 4.2.6.
                  ; As an extension, we will ALWAYS accept character names
                  ; of any case, no matter what the case-folding value is.
                  ((string-ci=? rest-string "space") #\space)
                  ((string-ci=? rest-string "newline") #\newline)
                  ((string-ci=? rest-string "tab") tab)
                  ((string-ci=? rest-string "nul") (integer->char #x0000))
                  ((string-ci=? rest-string "alarm") (integer->char #x0007))
                  ((string-ci=? rest-string "backspace") (integer->char #x0008))
                  ((string-ci=? rest-string "linefeed") (integer->char #x000A))
                  ((string-ci=? rest-string "vtab") (integer->char #x000B))
                  ((string-ci=? rest-string "page") (integer->char #x000C))
                  ((string-ci=? rest-string "return") (integer->char #x000D))
                  ((string-ci=? rest-string "esc") (integer->char #x001B))
                  ((string-ci=? rest-string "delete") (integer->char #x007F))
                  ; Additional character names as extensions:
                  ((string-ci=? rest-string "ht") tab)
                  ((string-ci=? rest-string "cr") (integer->char #x000d))
                  ((string-ci=? rest-string "bs") (integer->char #x0008))
                  (#t (read-error "Invalid character name"))))))))))


  ; Record the original "read" procedure, in case "read" is redefined later:
  (define default-scheme-read read)

  ; --------------
  ; Demo of reader
  ; --------------

  ; Repeatedly read a curly-infix expression and write it back out
  ; as a traditional s-expression, until end of input.
  (define (process-input)
    (let ((result (curly-infix-read)))
      (cond
        ((not (eof-object? result))
          (write result)
          (display "\n")
          ; (force-output) ; flush, so can interactively control something else
          (process-input)))))
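
  ; Illustration only (not part of the original demo): a hypothetical
  ; helper that runs the same read loop over a string and collects the
  ; results in a list instead of writing them.  It assumes
  ; "open-input-string" (SRFI 6, R7RS).
  (define (curly-infix-read-all-from-string str)
    (let ((port (open-input-string str)))
      (let loop ((result (curly-infix-read port)) (acc '()))
        (if (eof-object? result)
          (reverse acc)
          (loop (curly-infix-read port) (cons result acc))))))
  ; Per the mappings given in the abstract,
  ;   (curly-infix-read-all-from-string "{n > 5} {a + b + c}")
  ; should return ((> n 5) (+ a b c)).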

  (process-input)

References

The readable project website has more information: http://readable.sourceforge.net

Acknowledgments

We thank all the participants on the “readable-discuss” and “SRFI-105” mailing lists, including John Cowan, Shiro Kawai, Per Bothner, Mark H. Weaver, and many others whose names should be here but aren’t.

Copyright

Copyright (C) 2012 David A. Wheeler and Alan Manuel K. Gloria. All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Editor: Mike Sperber