by Lassi Kortela
This SRFI is currently in withdrawn status. To provide input on this SRFI, please send email to srfi-243@srfi.schemers.org. Previous messages are available in the mailing list archive.
This SRFI suggests how the Scheme reader and writer should handle unreadable data in general, and unreadable objects in particular.
Lisp code is represented as data. A Lisp system can be asked to write any live object as an S-expression. However, it’s inevitable that some of those objects have complex environmental dependencies which are difficult or impossible to write down.
The prototypical example of such an object is a port. Other common examples are procedures, continuations, promises, parameters, environments, and libraries. Objects managed by a foreign function interface tend to be unreadable. Additionally, objects that stand in for end-of-file and unspecified values are commonly written as unreadable objects since it makes little sense to read them.
Common Lisp reserves the lexical syntax #<...> for unreadable data.
Apart from being de jure standard in Common Lisp, this syntax is de facto standard in Scheme.
For example, Chicken writes a port as #<output port "(stdout)">.
MIT Scheme and STklos use square brackets #[...] instead of angle brackets. For example, MIT Scheme writes a port as #[textual-i/o-port 12 for console].
The syntaxes #<...> and #[...] look as if the brackets are intended to delimit an S-expression like parentheses delimit lists. But the delimiters tend to be illusory.
Angle brackets are not delimiters in any Scheme implementation nor in Common Lisp. They are identifier characters used in the names of well-known procedures such as < and string>=.
Scheme implementations accepting square brackets as list delimiters are common but far from universal, and these implementations tend to use #<...> syntax for unreadable data. STklos is the only known implementation which both accepts [...] lists and uses #[...] for unreadable data.
The use of identifier characters like < and > as delimiters implies that the writer will not output a well-formed S-expression. The standard Scheme write procedure is meant to output an internally consistent structure, whereas display may take more liberties. The use of ill-formed syntax to write unreadable objects suits the spirit of display but not the spirit of write.
When one or more unreadable objects are nested in an otherwise readable structure, and are written using ill-formed syntax, the reader cannot recover any part of the structure. Nor can it recover subsequent structures from the same port. This will cause difficulties when S-expressions become more pervasive. As people work with larger and more heterogeneous expressions in a wider variety of contexts, it will be inconvenient to ensure that all written expressions consist only of readable objects.
In this SRFI we talk about unreadable data in general and unreadable objects in particular.
An unreadable object is any object written as a well-formed S-expression such that the reader cannot recover the original object, but can recover a stand-in object representing the original.
For example, assume an implementation which can parse the lexical syntax #[primitive append] and extract a list of two symbols, primitive and append. The syntax represents an unreadable object (presumably the standard procedure append) and the list is the stand-in for the original object.
Since the syntax is well-formed, the reader can keep reading past the unreadable object to recover more objects (if there are any). The reader can also handle structures containing any mix of readable and unreadable objects nested to an arbitrary depth. For example, the following structure is lightly adapted from Chez Scheme.
#[transcoded-port utf8-codec #[buffered-port #[binary-output-port stdout]]]
To differentiate between readable and unreadable objects in nested structures, this SRFI introduces a special data type for unreadable objects. (The data type simply encapsulates the stand-in object.)
The syntax #<...> is deeply entrenched in Lisp and Scheme culture and cannot be rooted out in any reasonable amount of time. This SRFI tries to broker peace by not dictating any particular syntax. Implementations are free to keep using traditional syntax.
Unfortunately the #<...> syntax does not meet the requirements given above for unreadable objects. For example, the Common Lisp specification says:
#< is not valid reader syntax. The Lisp reader will signal an error of type reader-error on encountering #<. This syntax is typically used in the printed representation of objects that cannot be read back in.
Scheme implementations behave similarly: The reader simply discards all text following the marker #<.
We accommodate syntaxes of this kind by providing the minimal guarantee that the implementation stops reading after encountering the unreadable data marker and raises an exception. The exception handler may read the rest of the data from the port as unstructured text if the programmer so chooses.
Current editions of RnRS say that read and write use textual ports. There is no fundamental reason for this restriction. Binary S-expressions have been demonstrated to work well, and could be accessed via the same programming interface as textual S-expressions. In fact, even non-S-expression formats such as JSON and ASN.1 could share the same interface.
Consequently the specification in this SRFI avoids talking about text. Instead it talks about data which can be either text or bytes.
It makes sense for a large Scheme implementation to support more than one variant of S-expression syntax or even to have a programmable reader and writer. There is no existing standard, but there is a consensus that the best approach is to attach a lexical syntax to each port object. This way different ports can use different syntax without getting mixed up.
This SRFI addresses the situation by requiring read and write to mind which port they are dealing with, and to use the appropriate syntax (if any) for unreadable data on that port.
(unreadable-object? object) => boolean
Return #t if object stands in for an unreadable object. Else return #f.
(unreadable-object stand-in) => unreadable-object
Make an unreadable object using the given stand-in, which can be any object.
(unreadable-object-stand-in unreadable-object) => stand-in
Return the stand-in of unreadable-object.
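As a minimal sketch, the three procedures above could be provided by a one-field record in R7RS. The record type name <unreadable-object>, the field name, and the example variable port-stand-in are assumptions made for illustration; this SRFI does not mandate any particular representation.

```scheme
(import (scheme base) (scheme write))

;; Hypothetical representation: a record that simply wraps the stand-in.
(define-record-type <unreadable-object>
  (unreadable-object stand-in)
  unreadable-object?
  (stand-in unreadable-object-stand-in))

;; Illustrative value only: a stand-in for some port.
(define port-stand-in (unreadable-object '(port stdout)))

(write (unreadable-object? port-stand-in))          ; writes #t
(newline)
(write (unreadable-object? '(port stdout)))         ; writes #f
(newline)
(write (unreadable-object-stand-in port-stand-in))  ; writes (port stdout)
(newline)
```

Because the wrapper is a distinct type, a plain list such as (port stdout) is never mistaken for an unreadable object, even though it can serve as the stand-in of one.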
(read [port]) => object
This RnRS procedure is expanded to account for unreadable data. (Similar modifications should be made to other procedures that read objects from ports.)
When encountering a top-level object that contains one or more unreadable objects, or is itself an unreadable object, an unreadable-error is raised and the unreadable-error-object is the offending top-level object. The top-level object has been encoded such that each unreadable object in it (including the top-level object itself, if it is unreadable) is wrapped in the unreadable-object data type. The port position lies directly after the top-level object. It is unspecified whether atmosphere (whitespace and comments) following the top-level object has been consumed. The programmer may attempt to read more objects from the port.
When encountering unreadable data that is not an unreadable object, an unreadable-error is raised and the unreadable-error-object is #f. The port position lies immediately after the marker which indicates the start of unreadable data. For example, in the case of a textual port for which the unreadable data marker is #< the next read-char will read the character immediately following the <. In general, it does not make sense to attempt to read objects from the port at this point.
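The two cases described for read can be sketched with a toy reader. This is an illustration under stated assumptions, not one of the sample implementations: the names toy-read and try-read are hypothetical, #[...] is assumed to be the well-formed unreadable object syntax, #< is assumed to be the unreadable data marker, and only lists, symbols, and numbers are supported. The toy error type is a plain record, so unlike the error specified here it does not satisfy read-error?.

```scheme
(import (scheme base) (scheme char) (scheme write))

(define-record-type <unreadable-object>
  (unreadable-object stand-in)
  unreadable-object?
  (stand-in unreadable-object-stand-in))

;; Toy condition type; a portable record cannot also satisfy read-error?.
(define-record-type <unreadable-error>
  (make-unreadable-error object)
  unreadable-error?
  (object unreadable-error-object))

(define (toy-read port)
  (define seen-unreadable? #f)   ; fresh for every top-level object
  (define (skip-whitespace)
    (let ((c (peek-char port)))
      (when (and (char? c) (char-whitespace? c))
        (read-char port)
        (skip-whitespace))))
  (define (delimiter? c)
    (or (eof-object? c)
        (char-whitespace? c)
        (memv c '(#\( #\) #\[ #\]))))
  (define (read-atom)            ; a number if it parses as one, else a symbol
    (let loop ((chars '()))
      (if (delimiter? (peek-char port))
          (let ((text (list->string (reverse chars))))
            (or (string->number text) (string->symbol text)))
          (loop (cons (read-char port) chars)))))
  (define (read-items close)     ; elements up to the closing delimiter
    (skip-whitespace)
    (let ((c (peek-char port)))
      (cond ((eof-object? c) (error "unexpected end of input"))
            ((char=? c close) (read-char port) '())
            (else (let ((item (read-object)))
                    (cons item (read-items close)))))))
  (define (read-object)
    (skip-whitespace)
    (let ((c (peek-char port)))
      (cond ((eof-object? c) c)
            ((char=? c #\() (read-char port) (read-items #\)))
            ((char=? c #\#)
             (read-char port)
             (let ((next (read-char port)))
               (cond ((eqv? next #\[)   ; well-formed unreadable object
                      (set! seen-unreadable? #t)
                      (unreadable-object (read-items #\])))
                     ((eqv? next #\<)   ; ill-formed unreadable data
                      (raise (make-unreadable-error #f)))
                     (else (error "unsupported # syntax")))))
            ((memv c '(#\) #\[ #\])) (error "unexpected delimiter" c))
            (else (read-atom)))))
  (let ((object (read-object)))
    (if seen-unreadable?
        (raise (make-unreadable-error object))
        object)))

;; Return the top-level object if there is one; otherwise read the rest
;; of the line as unstructured text.
(define (try-read text)
  (let ((port (open-input-string text)))
    (guard (err ((unreadable-error? err)
                 (or (unreadable-error-object err)
                     (list 'unstructured (read-line port)))))
      (toy-read port))))

(let ((top (try-read "(a #[procedure append] 1)")))
  (write (unreadable-object-stand-in (list-ref top 1)))  ; writes (procedure append)
  (newline))
(write (try-read "#<output port>"))  ; writes (unstructured "output port>")
(newline)
```

In the first call the whole list is recovered and only the nested object is wrapped. In the second call the port is left directly after the < character, so the handler can only treat the remainder as text.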
(unreadable-error? object) => boolean
Return #t if object is an unreadable data error. Else return #f.
All unreadable data errors also satisfy the RnRS read-error? predicate. In other words, unreadable data errors are a subtype of read errors.
(unreadable-error-object error) => object
If all unreadable data were encoded as stand-in objects which the implementation was able to read, return the top-level object containing those unreadable objects. The top-level object may or may not itself be an unreadable object.
If the unreadable data was not encoded as a stand-in, the return value is #f.
(write object [port])
This RnRS procedure is expanded to account for unreadable data. (Similar modifications should be made to display, write-shared, write-simple, and other procedures that write objects to ports.)
If object satisfies unreadable-object? or is some other type of unreadable object for which a stand-in can be generated, then the stand-in is written to port using an implementation-defined lexical syntax. The syntax may vary based on implementation-defined settings attached to port.
For example, the stand-in (procedure append), a list of the two symbols procedure and append, could represent a procedure and could be written using the syntax #[procedure append].
If object cannot be written to port for syntax reasons then an exception satisfying unwritable-error? is raised.
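The writer side can be sketched in the same spirit. The names toy-write and toy-write-string are hypothetical, #[...] is an assumed syntax, and two choices below are assumptions rather than requirements of this SRFI: procedures get the generated stand-in (procedure), and a stand-in that is not a proper list is treated as unwritable because the assumed syntax can only hold list elements.

```scheme
(import (scheme base) (scheme write))

(define-record-type <unreadable-object>
  (unreadable-object stand-in)
  unreadable-object?
  (stand-in unreadable-object-stand-in))

;; Toy condition type for objects the assumed syntax cannot express.
(define-record-type <unwritable-error>
  (make-unwritable-error object)
  unwritable-error?
  (object unwritable-error-object))

(define (toy-write object port)
  (define (write-items items)        ; space-separated list elements
    (unless (null? items)
      (toy-write (car items) port)
      (unless (null? (cdr items))
        (write-char #\space port)
        (write-items (cdr items)))))
  (define (write-stand-in stand-in)  ; assumed syntax: #[ elements ]
    (write-string "#[" port)
    (write-items stand-in)
    (write-char #\] port))
  (cond ((unreadable-object? object)
         (let ((stand-in (unreadable-object-stand-in object)))
           (if (list? stand-in)
               (write-stand-in stand-in)
               (raise (make-unwritable-error object)))))
        ((procedure? object)         ; generated stand-in (an assumption)
         (write-stand-in '(procedure)))
        ((list? object)
         (write-char #\( port)
         (write-items object)
         (write-char #\) port))
        (else (write object port)))) ; atoms and anything else: plain write

(define (toy-write-string object)
  (let ((port (open-output-string)))
    (toy-write object port)
    (get-output-string port)))

(write-string
 (toy-write-string (list 'a (unreadable-object '(procedure append)) car)))
;; writes (a #[procedure append] #[procedure])
(newline)
```

Every unreadable object in the output is delimited like a list, so a reader that understands the same syntax can skip over it and continue with the rest of the structure.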
(unwritable-error? object) => boolean
Return #t if object is an unwritable object error. Else return #f.
Possible causes for the error include the following.
(unwritable-error-object error) => object
Returns the original object for which no stand-in could be written.
If one or more unwritable objects are nested within an otherwise writable object, it is unspecified whether the object returned by unwritable-error-object is the top-level object or one of the nested unwritable objects.
In implementations with an R6RS-style condition system, it is recommended that the condition types &unreadable and &unwritable be defined.
The implementation should write well-formed unreadable objects instead of ill-formed unreadable data whenever feasible.
The unreadable object syntax should closely resemble the syntax for ordinary lists, for example by using parentheses or square brackets with a suitable prefix.
The stand-in should be a list.
The first element of the list should be a symbol. Known symbols are tracked in the Scheme Registry.
Assume an implementation that reads and writes unreadable objects using the lexical syntax #[...] such that the [...] part represents a list.
Then the following code snippet:
(guard (err ((unreadable-error? err)
             (let* ((top (unreadable-error-object err))
                    (outer (unreadable-object-stand-in (list-ref top 5)))
                    (inner (unreadable-object-stand-in (list-ref outer 2))))
               (define (wr msg obj)
                 (display msg) (write obj) (newline))
               (wr "inner stand-in: " inner)
               (wr "outer stand-in: " outer)
               (wr "top object: " top))))
  (read-from-string "(here is an unreadable object #[1 2 #[3 4 5]])"))
will display the following output:
inner stand-in: (3 4 5)
outer stand-in: (1 2 #[3 4 5])
top object: (here is an unreadable object #[1 2 #[3 4 5]])
Two sample implementations are provided.
In general, the following implementation strategy is recommended.
When the reader encounters an unreadable object, it sets a flag, the stand-in is wrapped in the unreadable-object data type and the wrapped object is stored in the structure being built. Keep in mind that unreadable objects can be nested.
When the reader has finished a top-level object and the flag is set, it raises an unreadable-error exception whose unreadable-error-object is the offending top-level object. If the flag is not set, it returns the object as usual.
When the reader encounters unreadable data that is not an unreadable object, it raises an unreadable-error exception whose unreadable-error-object is #f without trying to read anything more.
Thanks to Marc Nieper-Wißkirchen for a thorough discussion of the problems in parsing #<...>.
The Common Lisp HyperSpec, section 2.4.8.20 (Sharpsign Less-Than-Sign)
© 2022, 2023 Lassi Kortela.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.