Title

Curly-infix-expressions

Authors

Alan Manuel K. Gloria

Status

This SRFI is currently in ``draft'' status. To see an explanation of each status that a SRFI can hold, see here. To provide input on this SRFI, please mail to <srfi minus 105 at srfi dot schemers dot org>. See instructions here to subscribe to the list. You can access previous messages via the archive of the mailing list.

Received: 2012/08/22
Draft: 2012/08/22-2012/10/22

None

Abstract

Lisp-based languages, like Scheme, are almost the only programming languages in modern use that do not support infix notation. Adding standardized infix support to Scheme would eliminate a common complaint by developers who currently choose to use other languages. Scheme currently reserves {...} “for possible future extensions to the language”. We propose that {...} be used to support “curly-infix” notation as a homoiconic infix abbreviation, as a modification of the Scheme reader. It is an abbreviation in much the same way that 'x is an abbreviation for (quote x).

A curly-infix list is a list whose visual presentation is in infix order instead of prefix order. For example, {n > 2} maps to (> n 2), and {a + b + c} maps to (+ a b c). By intent, there is no precedence, but e.g., {x + {y * z}} maps cleanly to (+ x (* y z)). Forms with mixed infix operators and other complications have “nfx” prepended to enable later macro processing, e.g., {2 + 3 * 5} maps to (nfx 2 + 3 * 5).

Note that this is the first of three tiers developed by the “readable” project. We intend to later submit two more SRFIs, to define two other tiers that build on top of curly-infix-expressions (namely, neoteric-expressions and sweet-expressions). However, curly-infix-expressions are useful on their own, even without these other notations.

Rationale

Lisp-based languages, like Scheme, are almost the only programming languages in modern use that do not support infix notation. Even some Lisp advocates, like Paul Graham, admit that they “don’t find prefix math expressions natural” (http://www.paulgraham.com/pypar.html). Paul Prescod has said, “I have more faith that you could convince the world to use Esperanto than prefix notation” (http://people.csail.mit.edu/gregs/ll1-discuss-archive-html/msg01571.html). Infix is not going away; standard mathematical notation uses infix, infix notation is taught to most people (programmers or not) in school, and nearly all new programming languages include infix. Adding infix support to Scheme would eliminate a common complaint by developers who currently choose to use other languages instead.

Scheme currently reserves {...} “for possible future extensions to the language”. We propose that {...} be used to support “curly-infix” notation as a reader abbreviation, just as 'x is an abbreviation for (quote x) and (x y z) is an abbreviation for (x . (y . (z . ()))).

This proposal is an extremely simple and straightforward technique for supporting infix notation. There is no complex precedence system, all other Scheme capabilities (including macros and quasiquoting) work unchanged, any symbol can be used as an infix operation where desired, and Scheme remains general and homoiconic. Curly-infix-expressions (also known as c-expressions) are just a convenient reader abbreviation for infix notation.

At its core, this SRFI provides the simple curly-infix list that accepts ordinary lists, merely written in a different order. The simple curly-infix list {operand-1 operator operand-2 operator operand-3 operator ...} is mapped to (operator operand-1 operand-2 operand-3 ...) so that more than two operands are handled cleanly. E.g., {a + b + c} maps to (+ a b c).

Why not macros or radically different reader notations?

Many previous systems have implemented “infix” systems as a named macro or procedure (e.g., INFIX). This looks ugly, and it does the wrong thing ‐ the resulting list always has INFIX at the beginning, not the actual infix operator, so this approach can interfere with quoting, macros, and other capabilities.

Other systems build notations into the reader, but the infix notation would be a notation radically different from normal Lisp notation. The result, in some cases, would be that these notations would simultaneously lose Lisp’s abilities for quoting, quasiquoting, and so on, and these notations were not homoiconic.

In contrast, this curly-infix-expression proposal avoids these problems. For example, in curly-infix, `{,a + ,b} maps cleanly to `(+ ,a ,b), which works as expected with all macros.

Why isn’t precedence part of this SRFI?

Many past “infix” systems for Lisp build in precedence. However, Lisp systems often process other languages, and they may freely mix these different languages. Thus, the same symbol may have different meanings and precedence levels in different contexts. The symbol might not even be defined where it is being used, and allowing precedence definitions would create subtle errors if files are read in a different order. If users hook in their own precedence system into a reader, it could even become difficult to combine code written for different precedence systems. In short, building precedence into a Lisp reader creates many complexities.

Yet the complexity of precedence systems is often unnecessary. In practice, we’ve found that simple infix is all that’s needed most of the time in Lisp-based languages. Even in other languages, many developers unnecessarily use grouping symbols with infix operators to make their order clear. Thus, requiring grouping symbols is less of a hardship than it might appear.

By intentionally not building a precedence system into the reader, a very simple yet useful infix system results. We don’t need to register procedures, ensure that declarations of precedence precede their use, or anything like it. We also ensure that the notation is clearly homoiconic.

Instead, where precedence is desired, application and library writers can implement precedence by defining and controlling the scope of an “nfx” macro or procedure, or by later postprocessing of that symbol. Scheme macros are already quite powerful and capable of handling this; in these cases, {...} provides a more convenient notation. The curly-infix approach, instead of trying to manage both infix and precedence, handles simple cases and then takes advantage of the existing Scheme scoping rules and macro system for more complex cases (in the rare cases where they are needed).

It would be possible to extend curly-infix to provide a full fixed precedence system (e.g., if an expression is mixed, attempt to use various precedence rules). However, such capabilities would be extensions not required by this SRFI. See the discussion on transform-mixed-infix, below.

Note that curly-infix includes support for unary operators, but again, they are without precedence. As a result, they must be grouped separately. This does not lead to hard-to-read expressions, however. Examples of simple curly-infix lists combining infix and unary operations include {(- x) * (- y)} and {{- x} * {- y}}. Another notation that builds on curly-infix (namely, neoteric-expressions) can make unary operators even easier to use.

At first David A. Wheeler, who started this project, considered reporting an error if a simple infix expression isn’t provided. However, prepending “nfx” is much more flexible.

Why not autodetect infix?

Some past efforts tried to automatically detect infix operators, but this turns out to not work well. It’s hard to express good rules for detecting infix operators, and the rules become too complex for users (e.g., “punctuation-only symbols” doesn’t detect “and” or “or”). And in any case, if they were automatically detected, an escape mechanism would be needed anyway. Allowing the user to expressly notate when infix was intended, using {...}, turns out to be far more clearer and more intuitive. In particular, curly-infix allows the use of infix with any symbol, whenever you want... and where it’s not convenient, you don’t need to use it. It is also very backwards-compatible: Normal lists work normally, and if you want infix, use {...}.

Why are 0, 1, and 2 parameters special?

The empty curly-infix list {} maps to (), as it is an empty list, and this is the likely user meaning (reducing unnecessary errors).

The one and two parameter cases are defined in part to reduce user error, and in part to better support other notations that build on top of curly-infix-expressions (namely, neoteric-expressions and sweet-expressions):

An “escaping” {e} is mapped to e so that the neoteric-expression f{x} becomes the likely-intended (f x), and to provide an easy escape mechanism in sweet-expressions for symbols that would otherwise have other meanings.
The “unary-operation” curly-infix list {e f} maps to (e f), so that {- x} maps to (- x), the likely interpretation, and also so that the neoteric-expression f{- x} will map correctly to (f (- x)).

What is `transform-mixed-infix` good for?

The “transform-mixed-infix” procedure is not intended for use in normal software development. Instead, it is suggested so that a future SRFI author could more easily implement a fixed precedence system that is built into the reader. We think that precedence is overrated. However, if a future community decides it wants precedence built into the reader, transform-mixed-infix could be redefined to provide one. Similarly, the mapping requirement for mixed curly-infix lists is intentionally worded to allow additional mappings of mixed curly-infix lists to handle precedence (so an implementation can legitimately say they implement curly-infix-expresssions even if they have such additional mappings). The mapping of mixed curly-infix-expressions and the “transform-mixed-infix” procedure are only defined to help simplify implementing a plausible future SRFI. The “transform-mixed-infix” procedure is not required or expected to be used in normal cases. Application software developers should avoid redefining transform-mixed-infix.

Why must infix operators be delimited?

Curly-infix requires that the infix operators be delimited (e.g., by spaces). This is consistent with Lisp history and current practice. Currently, in Lisp, operators are always delimited in traditional s-expressions (typically by left parentheses on the left, and by whitespace on the right). It’s impractical to do otherwise today; most Lisps, including Scheme, allow and predefine symbols that include characters (like “-”) that are typically used for infix operators. Many developers put space around infix operators even in languages that don’t require them, so syntactically requiring them is no burden. In short, it is difficult to allow infix operators without delimiters, and the visual results are the same as many real-world uses in other languages, so the result appears quite customary to typical software developers.

Why the `curly-foo` convention?

There needs to be a way for users to easily enable curly-infix for the default reader (e.g., read), read-eval-print loop (REPL), and loader (e.g., load). We would like implementations to enable curly-infix in their normal invocation, but some implementors may not want to do that. For example, if an implementation already uses braces for a different local extension, they may not want to immediately switch to curly-infix in their default invocation. Thus, if implementors choose to not enable curly-infix in their default reader, a conventional command line name “curly-foo” is defined for each implementation foo that enables it.

There is simply no single option flag (for example) that everyone could agree on to enable this. In practice, we expect that implementations will build this into their default readers and then control it via some special flag, but we do not want to mandate exactly how it is turned on or passed.

Why the procedures `enable-curly-infix` and `curly-infix-read`?

Early drafts of this SRFI required support for an “enable-curly-infix” procedure to enable curly-infix. But some implementations may have difficulty changing basic infrastructure (like the reader and loader) once they have started up. Thus, this procedure is not required, but simply may be available. This provides a simple convention for users who want to enable curly-infix, without leaving the system, on those systems where this is sensible.

There’s no explicit “disable-curly-infix”. There’s no reason to disable it within this SRFI, and if some other action changes the meaning of braces, it would be that other action that disables it.

The “curly-infix-read” procedure allows reading external curly-infix data, even if the implementation doesn’t enable curly-infix-expressions by default but does support curly-infix-expressions.

By intent, this SRFI (including the enabling mechanism) doesn’t use or interact with any module system at all (including the R6RS and R7RS module systems). This is because some implementations won’t have a module system (or at least not a standard one). Curly-infix is an intentionally simple mechanism that can be built into even trivial Scheme implementations. Mandating module support is unnecessary and might inhibit its adoption.

What about the Racket “infix convention”?

Racket allows a notation called the “infix convention” with the form “(a . operation . b)”. An advantage of this alternative is that it does not use the braces, so it might be easier to implement in Schemes which already define {...} in a local extension. However, the Racket “infix convention” has many problems:

The Racket infix convention is much longer and more awkward than curly-infix notation. In the Racket infix convention, every infix operator adds 6 characters. Infix operations are a very common operation, so convenience matters. An expression like (1 . + . 2) is far longer, and less convenient, than {1 + 2}.
It doesn’t look like other languages or math. A human notation should be maximally understandable to people given what they already know. The notation {a + b} is much more similar to the standard notation &ldquot;a + b&rdquot; than (a . + . b).
It is easy to make mistakes. If you forget a “.” somewhere, you end up with the wrong result, and possibly without an error being flagged. This notation also makes it harder to see improper lists; improper lists are important but rare, so it’s good to make them obvious - the Racket infix convention makes improper lists hard to distinguish. The Racket documentation even goes out of its way to emphasize that infix convention use is unrelated to improper lists... which suggests that they are easily confused.
Without {...}, {x} is no longer useful as an escape mechanism for sweet-expressions (a notation that builds on curly-infix). An alternative would be to use (. x) as an escape mechanism, but at that point dots-in-lists become busy and confusing.
Racket’s implementation, according to its documentation, does not allow multiple operations, e.g., (a . + . b . + . c . + . d). That could be added, but using them makes the Racket infix convention even more unwieldy. For example, compare (a . + . b . + . c . + . d) to {a + b + c + d}.
Even Racket users don’t use this convention often. Its documentation says that “Racket programmers use the infix convention sparingly—mostly for asymmetric binary operators such as < and is-a?”. The documentation does not say why, but its extra length and awkwardness may be part of the reason. In any case, the fact that the Racket documentation recommends that it be used “sparingly” does not encourage wider adoption.

In short, cases where infix notation would be useful are extremely common, so its notation should be convenient. The Racket “infix convention” may be the next-best notation for infix notation after curly-infix, but it’s next-best, and we should strive for the best available notation for such a common need. Curly-infix does not conflict with the Racket infix convention; implementations could implement both. We recommend that an implementation that implements the Racket infix convention should also allow multiple operands and use curly-infix semantics for them, pretending that . op . is a single parameter. In that case, (a . + . b . + . c) would map to (+ a b c), and (a . + . b . * . c) would map to (nfx a + b * c). Note that the existence of the Racket “infix convention” is additional evidence of the need for a standard infix convention; many have separately created mechanisms to try to provide infix support.

What about Gambit’s “Scheme Infix eXtension (SIX)”?

The Gambit reader includes a notation called the “Scheme Infix eXtension (SIX)” that supports infix notation. SIX expressions begin with a backslash.

Like curly-infix, SIX is a reader extension. But SIX has a number of problems compared to curly-infix:

SIX has a long, complex grammar, requiring users to remember a complex grammar, and requiring a far more complex implementation. Curly-infix, by contrast, is extremely simple.
SIX is inflexible. It only supports a fixed set of operators, and users cannot easily use other operators inside its notation. SIX even defines a small set of types and control structures which cannot be extended. People can easily create new operations; it’s important that an infix system be able to immediately use them.
SIX does not generate standard Scheme expressions, and thus it does not interact well with macros and quoting. SIX generates special markers when given infix expressions, and they must then be processed later for use. For example, in SIX, using an infix + in the input generates six.x+y, not +, in the read result. This means that macros may have to be specially designed to work with SIX. In contrast, in curly-infix, {x + y) actually generates {+ x y), so curly-infix works in a better way with other macros and with quoting rules.

Like Racket, SIX does demonstrate that there is a need for an infix notation that can be used in Scheme.

A system could simultanously implement curly-infix and SIX. However, curly-infix is far simpler, is more flexible (e.g., by allowing arbitary symbols), and works much more easily with macros and quoting. Thus, we believe that curly-infix is the better system and more appropriate for standardization across Scheme implementations.

What about Guile 1.4’s “reading infix” module?

Guile 1.4.x at gnuvola.org is self-described as a “(somewhat amicable) fork from the official Guile”. It includes support for reading infix expressions. Once activated, infix expressions are surrounded by #[ and ]. Infix are surrounded by whitespace. It supports precedence, which sounds like an advantage, but operators must be registered before use (and few are predefined), creating an opportunity for terrible errors if the expression is read first. There is also the opportunity for serious problems if different programs are written assuming different precedence levels. Inside the infix notation a very different language is used (e.g., parentheses are used for grouping instead of necessarily creating lists, and parameters are separated by commas), so it is unclear how well it would work with other Scheme features such as quasiquotation.

The guile 1.4 reading infix module has a more complex grammar requiring a more complex implementation and understanding. Its registration system creates serious problems when trying to use it for larger systems. This infix notation has not been accepted into the version of guile used by most people, so it is very much not portable. But perhaps the biggest problem is that this notation is fundamentally not homoiconic; it is harder to determine where lists begin and end with it.

In contrast, curly-infix is simpler, requires no registration system or other complexities, works more clearly with macros and quasiquotation, and has the general advantage of being homoiconic.

Other details

There is no requirement that writers (e.g., “write” or a pretty-printer) write out curly-infix-expressions. They may choose to do so, e.g., for lists of length 3-6 whose car is the symbol “and”, the symbol “or”, or a punctuation-only symbol. However, it would probably be wise to wait until many implementations can handle c-expressions.

Curly-infix is designed so that it can work on other Lisps as well. We even have a working implementation in Common Lisp.

Curly-infix is an unusually simple mechanism, but like much of any Lisp-based language, its power comes from its simplicity.

Specification

“Curly-infix-expressions” (c-expressions) are s-expressions with an additional notation: The curly-infix list. A curly-infix list is syntactically identical to a normal list, except it is surrounded by curly braces instead of by parentheses. Once a curly-infix list is read, it is mapped differently than a regular list by a curly-infix reader:

A simple curly-infix list has an odd number of parameters, at least three parameters, and all even parameters are “eq?” symbols. A simple curly-infix list is mapped by the reader into a list with the even parameter followed by the odd parameters. E.g., {n <= 2} maps to (<= n 2), and {2 * 3 * 4} maps to (* 2 3 4).
The empty curly-infix list {} is mapped to the empty list ().
An escaping curly-infix list {e} is mapped to e. E.g., {5} is mapped to 5.
A unary-operation curly-infix list {e1 e2} is mapped to (e1 e2). E.g., {- x} maps to (- x).
The mapping of a curly-infix list beginning with the symbol “.” is unspecified. (Note: the reference implementation maps {. e} to e if (. e) maps to e.)
Any other curly-infix list (including all other improper lists) is mixed. Unless overridden by some other specification, a mixed curly-infix list is mapped to the list with the symbol “nfx” added to its front. E.g., {q + r * s} is mapped to (nfx q + r * s), and {q + r . s} is mapped to (nfx q + r . s).

Here are some examples of c-expressions (note all operators in curly-infix are delimited):

{a * {b + c}} maps to (* a (+ b c))
{x eqv? `a} maps to (eqv? x `a)
{(- a) / b} maps to (/ (- a) b)
{(f a b) + (g h)} maps to (+ (f a b) (g h))
'{a + (f b) + x} maps to '(+ a (f b) x)
{{a > 0} and {b >= 1}} maps to (and (> a 0) (>= b 1))

A curly-infix reader is a datum reader that can correctly read and map curly-infix-expressions. A curly-infix reader must include the braces “{” and “}” as delimiters.

The “standard readers” are the datum reader used by the REPL, the datum readers defined by the relevant Scheme standards (such as “read” and where applicable “get-datum”), and the readers used to load user-supplied code as defined by the relevant Scheme standards (e.g., the reader used by “load” and module-loading mechanisms for user code).

The standard readers are curly-infix enabled if the standard readers are curly-infix readers.

An implementation of this SRFI must have its standard readers be curly-infix enabled. We encourage implementations’ default invocation to have their standard readers be curly-infix enabled, but this is not required. If the standard readers are not curly-infix enabled in an implementation’s default invocation, then if it can be invoked from a command line via the command “foo”, the implementation must provide an alternative command “curly-foo” (the command prefixed with “curly-”) in which its standard readers are curly-infix enabled. In addition, if the implementation is invokable as a graphical user interface (GUI), it must provide a documented means to ensure that its standard readers are curly-infix enabled.

When determining if a curly-infix list is simple or not, an implementation may allow the even parameters to be non-symbols, and it may use a more generous equality test (such as equal?); if it does, it should terminate even if the even parameters contain cycles. An implementation may perform a different mapping for a mixed curly-infix list (e.g., by implementing precedence using transform-mixed-infix as described below).

Procedures:

Implementations may provide the procedure (curly-infix-read . port) as a curly-infix reader.
Implementations may provide the procedure (enable-curly-infix), which on completion would ensure that the standard readers are curly-infix enabled. If implemented, this procedure must allow users to safely invoke (enable-curly-infix) repeatedly.
Implementations may provide an overrideable procedure (transform-mixed-infix lyst) that can be “set!”. This procedure would receive a mixed curly-infix list and return a transformation of it. By default, this function returns the original list but with “nfx” in front. Application developers should not use, define, or redefine this function; it is merely suggested to simplify implementation of plausible future SRFIs.

If an implementation supports SRFI 55 require-extension, then (require-extension (srfi 105)) should implement the functionality of enable-curly-infix. Similarly, if an implementation imports the module (srfi 105), then it should have the same effect as enable-curly-infix.

Note that, by definition, this SRFI modifies lexical syntax.

Reference implementation

The implementation below is portable, with the exception that Scheme provides no standard mechanism to override {...} in its built-in reader. Thus, implementations will typically have a modified reader that detects “{“, starts reading a list until its matching “}”, and then calls process-curly defined below. We recommend that implementations always do this, but an implementation must at least activate this behavior when “curly-foo” is invoked (if invoking foo on the command like would invoke the implementation without curly-infix). This is also enabled by (enable-curly-infix) if supported.

This reference implementation is SRFI type 2: “A mostly-portable solution that uses some kind of hooks provided in some Scheme interpreter/compiler. In this case, a detailed specification of the hooks must be included so that the SRFI is self-contained.”

  ; Return true if lyst has an even # of parameters, and the (alternating)
  ; first parameters are "op".  Used to determine if a longer lyst is infix.
  ; If passed empty list, returns true (so recursion works correctly).
  (define (even-and-op-prefix? op lyst)
    (cond
      ((null? lyst) #t)
      ((not (pair? lyst)) #f)
      ((not (eq? op (car lyst))) #f) ; fail - operators not the same
      ((not (pair? (cdr lyst)))  #f) ; Wrong # of parameters or improper
      (#t   (even-and-op-prefix? op (cddr lyst))))) ; recurse.

  ; Return true if the lyst is in simple infix format
  ; (and thus should be reordered at read time).
  (define (simple-infix-list? lyst)
    (and
      (pair? lyst)           ; Must have list;  '() doesn't count.
      (pair? (cdr lyst))     ; Must have a second argument.
      (pair? (cddr lyst))    ; Must have a third argument (we check it
                             ; this way for performance)
      (symbol? (cadr lyst))  ; 2nd parameter must be a symbol.
      (even-and-op-prefix? (cadr lyst) (cdr lyst)))) ; true if rest is simple

  ; Return alternating parameters in a list (1st, 3rd, 5th, etc.)
  (define (alternating-parameters lyst)
    (if (or (null? lyst) (null? (cdr lyst)))
      lyst
      (cons (car lyst) (alternating-parameters (cddr lyst)))))

  ; Not a simple infix list - transform it.  Written as a separate procedure
  ; so that future experiments or SRFIs can easily replace just this piece.
  (define (transform-mixed-infix lyst)
     (cons 'nfx lyst))

  ; Given curly-infix lyst, map it to its final internal format.
  (define (process-curly lyst)
    (cond
     ((not (pair? lyst)) lyst) ; E.G., map {} to ().
     ((null? (cdr lyst)) ; Map {a} to a.
       (car lyst))
     ((and (pair? (cdr lyst)) (null? (cddr lyst))) ; Map {a b} to (a b).
       lyst)
     ((simple-infix-list? lyst) ; Map {a OP b [OP c...]} to (OP a b [c...])
       (cons (cadr lyst) (alternating-parameters lyst)))
     (#t  (transform-mixed-infix lyst))))

  ; In the reader, when #\{ is detected, read (as a list) from that port
  ; until its matching #\}, then process that list with "process-curly".

References

The readable project website has more information: http://readable.sourceforge.net

Acknowledgments

We thank all the participants on the "readable-discuss" mailing list.

Copyright

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Editor: Mike Sperber