Title

#, external form

Author

Oleg Kiselyov

Status

This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-10@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Related SRFIs

As an extensible external notation the proposed #,() form is intended to be specialized in other SRFIs, especially those that introduce new types of Scheme values. One example is SRFI-4, "Homogeneous numeric vector datatypes". The external representation for float vectors was in "minor nonconformance to Standard Scheme." The present SRFI avoids this conflict. This SRFI can also solve a problem of representing a 2D matrix with a zero dimension, mentioned in a SRFI-4 discussion archive. Incidentally, SRFI-4 discussion was the inspiration for the present proposal.

This SRFI discusses and emulates reader-macros and specific sharpsign forms of Common Lisp.

Abstract

The present SRFI proposes an extensible external representation of Scheme values, a notational convention for future SRFIs. This SRFI adds #,( as a new token and extends production rules of the grammar for a Scheme reader. The #,() form can be used for example to denote values that do not have a convenient printed representation, as well for conditional code compilation. It is proposed that future SRFIs that contain new read syntax for values use the #,() notation with an appropriate tag symbol.

As a particular example and the reference implementation for the #,() convention, this SRFI describes an interpretation of the #,() external form as a read-time application.

Issues

Rationale

External representations of booleans, numbers, lists, vectors, strings are codified in RnRS. Every Scheme reader has an in-born knowledge how to parse the corresponding strings and build Scheme values they represent. The set of these built-in constructors is however fixed, and not amenable. This SRFI proposes to lift this limitation, to allow printed representation and serialization for ports, structures, wills, semaphores and other datatypes present in many Scheme systems as well as datatypes that may be introduced in future SRFIs. It has to be stressed however that the proposed external form can denote not only structural, record and esoteric data but standard Scheme values as well, whose canonical representation is cumbersome, dependent on external circumstances or otherwise unsuitable.

The #,() notation may appear in any character stream intended to be processed by a read procedure. For example, a #,() form can be used in a stream being read solely for data, as a source of initial values and other parameters for a specific algorithm. On the other hand, a character stream may contain Scheme code to be read and evaluated. In such streams #,() forms can denote either literal values or code to be conditionally compiled.

A future SRFI-X may introduce a new distinct type of Scheme values and intend the values of this type to be written and read. Any such SRFI has to extend the formal syntax of Scheme: R5RS Sections 7.1.1 and 7.1.2. The most common way of such an extension is

        "#" <discriminator> <other-char>*
where <discriminator> is one character other than '(', '\', 'i', 'e', 'b', 'o', 'x', 'd' and possibly 'f', '!' and 't'. There does not appear to be too much choice for the <discriminator>, especially if one wants to make it mnemonic.

SRFI-10 proposes another, systematic and expressive way of adding new external representations of scheme values, namely, via a #,(<tag> <datum>*) form. A particular SRFI-X that introduces a new data type should simply pick an appropriate <tag> (a symbol) and decide upon <datum>-arguments. The SRFI-X no longer has to add new productions to the external syntax, nor does it need to fight for the remaining characters that may be used as the <discriminator>. Implementations of the SRFI-X should support writing of values of the new datatype in the #,() form, and reading them back. The exact way of doing that is however up to the SRFI-X or its implementations.

The #,() notation is useful not only for new types of values. Existing Scheme datatypes can benefit from it as well: for example, cyclical lists and other structures with circular dependencies, data structures with large caches or memoization tables, etc. In another possible scenario one may introduce the following external forms:

        #,(pi) #,(epsilon) #,(Infinity) #,(NaN)
which represent rather useful (inexact) numbers.

Yet another application area for the proposed notation is "variable constants", e.g., #,(os-type). The Scheme reader would replace every instance of #,(os-type) with an appropriate literal, e.g., "Solaris", "HP-UX". Since this "binding" occurs very early, the corresponding string can be analyzed in various macros, syntax and other special forms. Another useful notation of the same kind is #,(srfi-features), which can be replaced by the list of feature identifiers the current implementation supports.

It is important to note the differences between #,(foo) and seemingly similar constructions (macro-expand-foo) and (force (delay (compute-foo)). The former is processed at read-time. Therefore it may meaningfully appear in input streams being read and processed solely as data. The other forms may only be used in input streams meant to be passed to a compiler or otherwise evaluated. Furthermore, an evaluator/macro-expander never sees #,(foo) forms. When a compiler/interpreter reads an expression any #,(foo) forms that may have occurred in the input stream will already be replaced by Scheme values they represent. This is similar to handling of abbreviations (R5RS 7.1.2), e.g., 'bar. An evaluator never comes across an apostrophe as a token: all the evaluator gets is a list of two elements, the first of them being quote. If #,(foo) happens to occur within a special form, that syntax will never see #,(foo) as it is -- it will see the corresponding literal. In contrast, (force (delay (compute-foo)) or (macro-expand-foo) used as arguments to a special form will be passed to that form as expressions rather than the values they evaluate/expand to.

Specification

The present SRFI extends the grammar for external representations -- R5RS Section 7.1.2 "External representations" -- as follows:

<datum> ---> <simple datum> | <compound datum> | <hash-comma-datum>

<hash-comma-datum> ---> "#,(" <tag> <datum>* ")"
<tag> ---> <symbol> | <hash-comma-datum>
Furthermore, the production for <token> in R5RS Sec. 7.1.1 is amended to read
<token> ---> <identifier> | <boolean> | <number> | <character> | <string>
           | ( | ) | #( | ' | ` | , | ,@ | . | #,(

This SRFI does not specify how the Scheme reader interprets a specific <hash-comma-datum> and creates the corresponding Scheme value. This is left to particular SRFIs. The behavior of the reader when it fails to interpret a #,() form is also unspecified.

Implementation

The Scheme reports do not define behavior of a reader in case the input stream does not follow the grammar for external representations (R5RS 7.1.2). In particular the reader is not required to report an error. The set of all input streams that can be read without a mandatory error reporting is the same no matter if SRFI-10 is in effect or not. In this sense, SRFI-10 does not change the language, therefore, SRFI-10 is already "implemented" in any existing Scheme system.

Read-time application

Read-time application is a particular implementation of SRFI-10, where the word "implementation" is taken in a more constructive sense. Read-time application is similar to Smalltalk's object serialization and #. and #S() reader-macros of Common Lisp. It interprets the <hash-comma-datum> of the amended grammar for external representations
"#,(" <tag> <datum>* ")"
where a tag must be an (external representation of an) identifier, and datum etc. are external representations of some values, which may be read-time applications as well. The read procedure should look up a reader-constructor associated with the tag, read the arguments datum... and apply the constructor to the arguments. The result of the application is taken to be the value that corresponds to the #,() external form. It is an error if a reader-constructor associated with the tag cannot be located.

There are several possible ways to specify an association between symbolic tags and constructor procedures. For example, an implementation may:

The present SRFI-10 reference implementation provides the define-reader-ctor function. The examples below assume this particular choice.

Comparisons

The read-time application mechanism is rather close to Lisp reader's macro functions. Unlike Common Lisp, however, a reader-constructor is not allowed to read from the input stream on its own. It may only build values from other values, which must have already been read and internalized. A reader-constructor must always return one value; it may not return "nothing". It may however throw an exception or simply return an "inappropriate" value such as #f, which will be caught later. Unlike compile-time function applications (that is, macro-expansions), a read-time application has no "second pass". Since a #,() form is completely read when a reader-constructor is called, we can relax a Common Lisp's prohibition on side-effects in the constructor. Unlike CL reader macro function, a reader-constructor cannot be called repeatedly during reading of a single, non-nested #,() form. Besides, prohibition on side effects is very difficult to enforce. Lifting it does not constitute encouragement of side-effecting reader-constructors however.

Common Lisp defines an external form #.obj, which instructs the Lisp reader to evaluate obj right after the reader parsed it. While #. is a general-purpose read-evaluator:
      #.obj === (eval obj)
#, is merely an application:
      #,obj === (apply (lookup (car obj)) (cdr obj))
and obj must be an external representation of a list. Read-time application is weaker than a generic read-time evaluation, and therefore less expensive. Example 5 below illustrates the difference.

Read-time applications are further restricted to only those procedures that have been specifically declared for that purpose (via define-reader-ctor). The user thus has a fine-grained control over which functions are being applied at read time. Performing arbitrary computations at read time is a security concern: for example, when an application server reads data from a request pipe, it does not want unexpected functions with potentially dangerous side-effects to be invoked. In Common Lisp one can avoid this undesirable behavior -- in an all-or-nothing-fashion -- by setting a distinguished global variable *read-eval* to nil. This however will globally turn off evaluation of all #. forms, even those which are deemed safe and appropriate. Registration of reader-constructors required in this reference implementation enables therefore a fine-grained control over read-time computations. Furthermore, the reference implementation registers reader-constructors in a readtable. Different invocations of the reader may execute or flag read-time applications differently depending on a readtable in effect.

Common Lisp defines a #S(typename . data) notation that constructs values at read-time yet requires prior registration of constructors. It happens only to be used for structures. This SRFI generalizes this notation, enabling read-time applications to compute arbitrary values, including S-expressions intended to be compiled/interpreted.

Examples

Example 1

Alternative printed representations for standard Scheme datatypes and other values

   (define-reader-ctor 'list list) 
   (with-input-from-string "#,(list 1 2 #f \"4 5\")" read) ==> (1 2 #f "4 5")

   (define-reader-ctor '+ +)
   (with-input-from-string "#,(+ 1 2)" read) ==> 3

Example 2

Readable representations for structures (records)
   (define-reader-ctor 'my-vector
     (lambda x (apply vector (cons 'my-vector-tag x))))
   (with-input-from-string "#,(my-vector (my-vector 1 2))" read) ==>
       a vector whose second element is a list of a symbol my-vector,
       number 1, and number 2.
   (with-input-from-string "#,(my-vector #,(my-vector 1 2))" read) ==>
       a vector whose second element is a my-vector constructed from
       numbers 1 and 2.
   (with-input-from-string "#,(my-vector #,(my-vector #,(+ 9 -4)))" read) ==>
       '#(my-vector-tag #(my-vector-tag 5))

Example 3

Representing uniform vectors (per SRFI-4) in the #, notation
   (define-reader-ctor 'f32 f32vector)
   (with-input-from-string "#,(f32 1.0 2.0 3.0)" read) ==>
       a uniform f32 vector with three elements

Example 4

External representations for 'complex' datatypes, eg, ports
   (define-reader-ctor 'file open-input-file)
   (with-input-from-string "#,(file \"/tmp/a\")"
     (lambda () (read-char (read)))
will return the first character of the file "/tmp/a"

Example 5

Differences between Common Lisp's #. and the proposed #,
   (with-input-from-string "#,(+ 1 (+ 2 3))" read) ==>
       error: can't add number 1 and a list '(+ 2 3)
   (with-input-from-string "#,(+ 1 #,(+ 2 3))" read) ==> 6
In contrast, in Common Lisp
   (with-input-from-string (is "#.(+ 1 (+ 2 3))") (read is))  ==> 6

Example 6

Loading of a file containing a read-time application: assuming a file "foo.scm" contains
   (define (temp-proc)
     (let ((v '#,(f32 1.0 2.0 3.0))) (f32vector-ref v 1)))
then
   (define-reader-ctor 'f32 f32vector)
   (load "foo.scm")
   (pp temp-proc) ==> 
       (lambda () (let ((v '#f32(1. 2. 3.))) (f32vector-ref v 1))) 
   (temp-proc) ==> 2.0

The full implementation and the validation code are available at

read-apply.scm
vread-apply.scm
The title comments in the validation code explain how to run it.

Implementation of a cond-expand form of SRFI-0 as a read-time application and the validation code are available at

cond-expand.scm
vcond-expand.scm

The implementation is directly based on a grammar given in SRFI-0. We assume that "feature identifiers" in effect are contained in a list ALL-FEATURES. This list should either be pre-defined in a Scheme system, or otherwise defined prior to reading the code in question. This implementation demonstrates how to customize a Gambit Scheme interpreter before it started evaluating user's code. The modification includes enabling SRFI-10, SRFI-0, and building a list of features. The reader-ctor code hopefully demonstrates that the changes to the reader required to implement cond-expand are minor and straightforward -- contrary to what SRFI-0 assumed.

Copyright

Copyright (C) Oleg Kiselyov (1999). All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Author: Oleg Kiselyov oleg@okmij.org, oleg@acm.org, oleg@computer.org

Editor: Shriram Krishnamurthi