Per Bothner
<per@bothner.com>
This SRFI is currently in ``final'' status. To see an explanation of
each status that a SRFI can hold, see here.
To provide input on this SRFI, please mail to <srfi minus 108 at srfi dot schemers dot org>. See instructions here to subscribe to the list. You can access previous messages via the archive of the mailing list. You can access post-finalization messages via the archive of the mailing list.
This specifies an extensible reader syntax for named value constructors. A reader prefix is followed by a tag (an identifier), and then expressions and literal text parameters. The tag can be thought of as a class name, and the expressions and literal text are arguments to an object constructor call. The reader translates &tag{...} to a list ($construct$:tag ...), where $construct$:tag is normally bound to a predefined macro.
This proposal depends on SRFI-109 (extended string quasi-literals) (in spite of having a lower number). It also shares quite a bit of syntax with SRFI-107 (XML reader syntax).
When adding new datatypes it is useful to add new literals of that type, or at least a compact readable notation for creating instances. SRFI-10 provided one solution. Here is an example, which assumes a URI type for representing encoded Uniform Resource Identifiers (URIs - or generalized URLs):
#,(URI "http://example.com/")
SRFI-10 has a number of problems. One issue
is that SRFI-10 conflicts with syntax-case
and R6RS.
More fundamentally SRFI-10 resolves the tag
name to a
constructor function at read time, which requires managing those
names using a distinct mechanism. It seems better to use normal
scope rules (including library import) to manage this mapping.
The reader is also responsible for calling the constructor,
which means the format handled by read
is extensible,
which is good, but you have to be careful about the security
implications, making sure only safe
constructor functions are called.
Finally, SRFI-10 doesn't integrate with quasi-quotation,
and some of us find its syntax a bit ugly.
Instead, this SRFI proposes:
&URI{http://example.com/}
The Scheme reader translates this to:
($construct$:URI "http://example.com/")
The programmer must provide or import a macro or procedure definition of $construct$:URI that creates a URI value.
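As a minimal sketch of what such a definition might look like, the following binds $construct$:URI to an ordinary procedure. The tagged-vector representation and the names make-uri, uri? and uri-text are hypothetical, chosen only for illustration, and the sketch assumes a Scheme in which colon is an ordinary identifier character.

```scheme
;; Hypothetical URI representation: a two-element vector tagged with 'uri.
(define (make-uri text) (vector 'uri text))

(define (uri? obj)
  (and (vector? obj)
       (= (vector-length obj) 2)
       (eq? (vector-ref obj 0) 'uri)))

(define (uri-text uri) (vector-ref uri 1))

;; The binding that the reader-generated form resolves to.
(define ($construct$:URI text) (make-uri text))

;; A conforming reader would turn &URI{http://example.com/} into this call:
(define home ($construct$:URI "http://example.com/"))
```

Because the name is resolved by normal scope rules, importing a different library could supply a different $construct$:URI without any change to the reader.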
As related prior art, Caml p4 has an interesting quotation system.
(define example-host "example.com")
&URI{http://&[example-host]/}
Note that & is used both to introduce the top-level quasi-literal, and as an escape character before the unquoted expression. The above example is read as:
($construct$:URI "http://" $<<$ example-host $>>$ "/")
The symbols $<<$ and $>>$ are special marker symbols bound to unique values. Their use allows the implementation of $construct$:URI to tell if content is part of the literal text or from an enclosed expression - which is sometimes useful to know.
Note that enclosed expressions are commonly strings but not always. This example executes an SQL query with a numeric parameter:
&sql{select * from employees where salary > &[min-salary]}
Even when the parameter is a string, simple string pasting may be wrong - or dangerous. Consider:
&sql{select * from employees where name = '&[my-name]'}
Suppose my-name is constructed by a malicious user and has the value: smith' or ''='. In that case the effective condition would be:
name = 'smith' or ''=''
This evaluates to true, so it would retrieve all employees.
The use of $<<$
and $>>$
enables the $construct$:sql
implementation
to do the necessary escaping of special characters in text
resulting from an evaluated expression.
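The following sketch shows one way a $construct$:sql procedure could use the markers to escape only enclosed values. It is an illustration under stated assumptions, not the behaviour of any real database library: the result is just a string, escaping is limited to doubling single quotes, and the helpers escape-quotes and render-enclosed are hypothetical. Fresh one-element lists stand in for the markers, because whether two empty strings can be told apart with eq? varies between implementations; an implementation of this SRFI supplies the real bindings.

```scheme
;; Stand-ins for the marker values (assumption: any unique objects will do here).
(define $<<$ (list '$<<$))
(define $>>$ (list '$>>$))

;; Double every single quote so the value cannot terminate an SQL string literal.
(define (escape-quotes str)
  (let loop ((chars (string->list str)) (out '()))
    (cond ((null? chars) (list->string (reverse out)))
          ((char=? (car chars) #\')
           (loop (cdr chars) (cons #\' (cons #\' out))))
          (else (loop (cdr chars) (cons (car chars) out))))))

;; Convert an enclosed value to SQL text; only strings and numbers are handled.
(define (render-enclosed value)
  (cond ((string? value) (escape-quotes value))
        ((number? value) (number->string value))
        (else (error "unsupported SQL parameter" value))))

;; Literal text is copied as-is; values between $<<$ and $>>$ are escaped.
(define ($construct$:sql . args)
  (let loop ((args args) (enclosed? #f) (parts '()))
    (cond ((null? args) (apply string-append (reverse parts)))
          ((eq? (car args) $<<$) (loop (cdr args) #t parts))
          ((eq? (car args) $>>$) (loop (cdr args) #f parts))
          (enclosed?
           (loop (cdr args) #t (cons (render-enclosed (car args)) parts)))
          (else (loop (cdr args) #f (cons (car args) parts))))))

;; What &sql{select * from employees where name = '&[my-name]'} is read as:
(define my-name "smith' or ''='")
(define query
  ($construct$:sql "select * from employees where name = '"
                   $<<$ my-name $>>$
                   "'"))
```

With this definition the quotes inside my-name arrive doubled, so the whole value stays inside a single SQL string literal. A production implementation would more likely bind the enclosed values as query parameters instead of producing text.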
Instead of:
&cname{&[exp1 exp2]text}
you can write:
&cname[exp1 exp2]{text}
These are almost the same, but there is a conceptual difference: The latter variant is typically used for options or XML-style attributes. The former variant is used to list components of the result object, or children in the XML sense.
Initial expressions can be used for keyword arguments - or
general non-string arguments.
Here is an example (converted from the Scribble documentation):
&elem[style: 'italic]{Yummy!}
Consider objects that are normally constructed from a string representation.
In that case one might want to concatenate the non-initial enclosed
expressions along with the literal text to yield the string,
while using initial arguments for keywords or non-string arguments.
Therefore the $construct$:cname
implementation
needs to be able to unambiguously select the initial arguments.
To do this, the first example in this section is read as
($construct$:cname $<<$ exp1 exp2 $>>$ "text")
while the second is read as:
($construct$:cname exp1 exp2 $>>$ "text")
i.e. without the initial $<<$ symbol. Commonly both expressions will evaluate to the same value, but that is not required.
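The two readings can be told apart by looking at which marker appears first in the argument list. The helper below, split-initial-args, is a hypothetical name used only for this sketch, and fresh lists again stand in for the real marker values so that eq? can identify them reliably.

```scheme
;; Stand-ins for the marker values (assumption: any unique objects will do here).
(define $<<$ (list '$<<$))
(define $>>$ (list '$>>$))

;; Returns a pair (initial-args . remaining-args).
;; If $>>$ is met before any $<<$, everything before it is an initial argument.
;; If $<<$ comes first, or there are no markers, there are no initial arguments.
(define (split-initial-args args)
  (let loop ((rest args) (seen '()))
    (cond ((null? rest) (cons '() args))
          ((eq? (car rest) $<<$) (cons '() args))
          ((eq? (car rest) $>>$) (cons (reverse seen) (cdr rest)))
          (else (loop (cdr rest) (cons (car rest) seen))))))

;; &cname[1 2]{text} is read with a leading bare $>>$:
(define with-initial (split-initial-args (list 1 2 $>>$ "text")))

;; &cname{&[1 2]text} is read with a leading $<<$:
(define without-initial (split-initial-args (list $<<$ 1 2 $>>$ "text")))
```

Here the car of with-initial holds the two initial arguments, while the car of without-initial is empty and its cdr still carries the markers for later processing.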
Scribble writes such forms as:
@cmd[datum ...]{text-body}
This SRFI uses & instead of @, to be compatible with XML-literals, and also because the proposed syntax is similar to but not fully compatible with Scribble. Non-compatible cases include @{foo bar}, which Scribble reads as ("foo bar"), while SRFI-109 instead defines this as the string "foo bar".
Markup is commonly nested, which suggests that a & in text can be used as an abbreviated extended-datum-literal. Specifically:
&name1[exps]{abc&name2[exps]{klm}xyz}
is syntactic sugar for:
&name1[exps]{abc&[&name2[exps]{klm}]xyz}
This nesting of markup motivates using the same escape character for both top-level and enclosed forms.
As shown, the Scheme reader translates a named quasi-literal to a list, which is then subject to regular macro-expansion and evaluation:
&tag{...}
is read as if it were:
($construct$:tag ...)
One can see this mapping by quoting the form:
'&tag{...} ⟹ ($construct$:tag ...)
The choice of the translation $construct$:tag
is somewhat arbitrary. We want it to be easy for programmers
to write, to be readable, and thus not excessively verbose.
We want the symbol to include the actual tag
as part of the name,
but using just tag
by itself is likely to lead to
awkward name clashes. (Of course it is perfectly
reasonable to implement $construct$:tag
using
a tag
function.)
Using colon to delimit the tag
part seems
readable and clean. Note there may be some complication
in a Scheme variant that uses colon as a package or
namespace separator, as for example Kawa does. However,
the problem is easily solved (at least in Kawa) by defining
$construct$
as a predefined namespace prefix.
When specifying this translation we have two semi-conflicting goals:
Information-preservation: the translation should let the constructor distinguish literal text from enclosed expressions. One example is a constructor that supports format-specifiers. In that case literal content text should get treated as part of the format string, while a string literal in an enclosed expression would be a value argument to format (possibly with a default format specifier in the format string), which matters if argument re-positioning is supported. Another example: The XML data model distinguishes text nodes from atomic string values: literal text would evaluate to text nodes, while enclosed string values are atomic values. Finally, we may want to distinguish initial arguments. For example, one might want to enforce a rule that keyword arguments are only allowed in initial arguments.
Implementation-ease: it should be easy to define $construct$:tag. It should be possible to define $construct$:tag as a function, instead of a macro.
The translation uses a pair of special symbols to mark the start and end of the enclosed expressions:
($construct$:foo "s" $<<$ exp1 exp2 $>>$ "t")
This translation scores highly on information-preservation.
It also scores highly on implementation-ease in the simple case
where we can just ignore which expressions are enclosed and which
are literal. For example
if $construct$:foo
is defined in the simplest way possible:
(define $construct$:foo make-foo)
then the example is equivalent to the call:
(make-foo "s" "" exp1 exp2 "" "t")
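A small sketch makes this simple case concrete. It assumes make-foo just concatenates string arguments, and it binds the markers to plain empty strings, since their identity does not matter when they are ignored. The values given to exp1 and exp2 are arbitrary.

```scheme
;; Assumption: the markers only need to be zero-length strings in this example.
(define $<<$ "")
(define $>>$ "")

;; Hypothetical constructor that simply concatenates its string arguments.
(define (make-foo . parts) (apply string-append parts))

;; The simplest possible definition: alias the constructor.
(define $construct$:foo make-foo)

(define exp1 "x")
(define exp2 "y")

;; What &foo{s&[exp1 exp2]t} is read as:
(define result ($construct$:foo "s" $<<$ exp1 exp2 $>>$ "t"))   ; "sxyt"
```

The empty-string markers vanish during concatenation, which is why a constructor that does not care about the literal/enclosed distinction needs no special handling.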
When you do a more complex translation, you may have to write a macro, and dealing with $<<$ and $>>$ is not completely trivial. Still, this seems a reasonable tradeoff; we later provide a helper macro define-simple-constructor to simplify some common cases.
Note this convention lets us distinguish these cases (if you care):
&foo{_&bar{b}_}
&foo{_&[&bar{b}]_}
because these translate differently:
($construct$:foo "_" ($construct$:bar "b") "_")
($construct$:foo "_" $<<$ ($construct$:bar "b") $>>$ "_")
An earlier draft specified that translating an enclosed expression sequence:
&foo{s&[exp1 exp2]t}
would use a $unquote$ macro to indicate the expressions:
($construct$:foo "s" ($unquote$ exp1 exp2) "t")
This scores well on information-preservation, but poorly on implementation-ease. This is because you can't write a default (library) implementation of $unquote$ as a function or macro in a way that splices the expressions into the $construct$:foo invocation.
We could implement $unquote$ as an identity function, if there was a separate $unquote$ for each expression:
($construct$:foo "s" ($unquote$ exp1) ($unquote$ exp2) "t")
However, this does lose information about how many &[...]-delimiters there were - which might (in rare situations) matter. Also, we would need some convention to distinguish prefix arguments from other enclosed expressions.
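Because we use the same escape prefix & as in SRFI-109 (extended string quasi-literals) it makes sense to allow the same convenience features: character and entity references (for example to write a literal {); the line-continuation escape &-; the indentation marker &|; comments (&#|comment|#); and format specifiers (&~,2f[balance-due]).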
The reader creates $construct$:cname invocations, so the application or library programmer must provide a definition of $construct$:cname. It seems useful to provide some utility functions or syntax to simplify these. As a start, this specification proposes:
(define-simple-constructor cname cname-maker [str-maker])
This has the effect that:
&cname[init-arg ...]{text}
after being read as:
($construct$:cname [init-arg ... $>>$] text-arg ...)
gets evaluated as:
(cname-maker init-arg ... (str-maker text-arg ...))
The default for str-maker is $string$, as specified in SRFI-109. This combines all the non-prefix arguments and treats them as a string quasi-literal. This makes it easy to implement:
&cname[init-exp ...]{abc&[infix-exp1]def&[infix-exp2]...xyz}
as if it were a call to some specified cname-maker function thus:
(cname-maker init-exp ... &{abc&[infix-exp1]def&[infix-exp2]...xyz})
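The evaluation rule above can be approximated without a macro when $construct$:cname is allowed to be a procedure. The sketch below is a procedural analogue, not the define-simple-constructor syntax itself: make-simple-constructor, simple-string and make-elem are hypothetical names, simple-string is a much-reduced stand-in for SRFI-109's $string$, and fresh lists stand in for the marker values so that eq? can identify them.

```scheme
;; Stand-ins for the marker values (assumption: any unique objects will do here).
(define $<<$ (list '$<<$))
(define $>>$ (list '$>>$))

;; Reduced stand-in for $string$: drop markers, concatenate strings and numbers.
(define (simple-string . parts)
  (let loop ((parts parts) (out '()))
    (cond ((null? parts) (apply string-append (reverse out)))
          ((or (eq? (car parts) $<<$) (eq? (car parts) $>>$))
           (loop (cdr parts) out))
          ((string? (car parts))
           (loop (cdr parts) (cons (car parts) out)))
          ((number? (car parts))
           (loop (cdr parts) (cons (number->string (car parts)) out)))
          (else (error "unsupported text argument" (car parts))))))

;; Returns a procedure usable as a $construct$:cname binding.
;; Arguments before a leading bare $>>$ are passed through to cname-maker;
;; everything else is combined by str-maker into one final argument.
(define (make-simple-constructor cname-maker str-maker)
  (lambda args
    (let loop ((rest args) (seen '()))
      (cond ((or (null? rest) (eq? (car rest) $<<$))
             (cname-maker (apply str-maker args)))      ; no initial arguments
            ((eq? (car rest) $>>$)
             (apply cname-maker
                    (append (reverse seen)
                            (list (apply str-maker (cdr rest))))))
            (else (loop (cdr rest) (cons (car rest) seen)))))))

;; Hypothetical element constructor taking a style and a text string.
(define (make-elem style text) (list 'elem style text))
(define $construct$:elem (make-simple-constructor make-elem simple-string))

;; What &elem['italic]{Yummy &[3] times!} is read as:
(define yummy
  ($construct$:elem 'italic $>>$ "Yummy " $<<$ 3 $>>$ " times!"))
```

Splitting at run time keeps the sketch portable, at the cost of scanning the argument list on every call; a macro-based definition could do the same split at expansion time.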
This section discusses some ideas that seem worthwhile, but need more thought, so are deferred for now.
In addition to those mentioned below, consider also special characters, formatting, and user-defined end token from SRFI-109.
A possible extension is to support SRFI-10 style read-time literals in certain restricted cases, when all the expressions are literal, and the transformers are available to the reader. This should probably not be the default (for consistency and because of security concerns), but could be supported in an implementation that has programmable read-tables.
(define args (list e1 e2 ... en))
&foo[@args]
The reader could convert the latter to:
($construct$:foo ($splice$ args) $>>$)
Assuming $construct$:foo is bound to a make-foo function, we want this to be equivalent to:
(apply make-foo args)
Expecting each $construct$:foo implementation to desugar the $splice$ forms is unfriendly, but it could be handled by define-simple-constructor. This seems easy enough when the implementation rewrites to a function call, since we can handle the splicing by rewriting to an apply call.
It gets trickier when macros are involved.
Handling splicing seems cleaner if the Scheme compiler handles splicing natively - i.e. as a general feature of function application. This seems worth exploring, but is obviously beyond the scope of this SRFI.
& as marker/delimiter character.
Alternative marker characters were also considered, and this mostly-historical section explains why we chose &.
The discussion also considers SRFI-109 (extended string quasi-literals), and refers to
the non-terminals defined in the syntax specifications
of both SRFI-108 and SRFI-109.
First let us focus on the escape character in named-literal-part and string-literal-part. After that we will look at characters to use to indicate the start of a top-level extended-string-literal and extended-datum-literal.
Different or same escape characters in literal-text?
There are multiple different escape character roles:
first we have escapes in string-literal-part.
Then in a named-literal-part we have escaped
strings and characters (same as in string-literal-part),
plus we have nested extended-datum-body.
For the latter we prefer a single escape character for both uses,
to avoid a proliferation of escape characters.
Also, for consistency it seems better to use the same
escape character and syntax for string and character
escapes in both string-literal-part
and named-literal-part.
The conclusion seems to be we should use the same escape character in all
roles (at least within a literal-part).
As to which character to use,
the most plausible choices seem to be &
,
@
, or \
.
Use & as escape character: Using & is compatible with XML, HTML, SGML, and also "XML literals" embedded in programming languages, including SRFI-107 (XML reader syntax).
Use \ as escape character: Using \ is of course compatible with standard Scheme string literals. Backslash has also been used as an escape in many languages, for string literals, regular expressions, shells, TeX, and more. If using \ as an escape for SRFI-109 strings, it would be tempting to enhance standard string literals with some of the same features, such as enclosed expressions. However, traditional C-style single-letter escapes, such as \n, cause a problem: You either don't allow them in the literal-part of this specification (in which case the latter is not a super-set of standard string escapes), or you need some non-letter prefix character in front of a cname, which is tedious.
Use @ as escape character: Using @ as the escape character goes back to Scribe, Texinfo, and Scribble. These are all markup languages, not programming languages. However, Scribble allows nested Racket Scheme expressions, and (if you select the at-exp Racket parser) you can also nest Scribble forms in a top-level Scheme program.
Braces vs brackets: The specification uses {curly braces} for quoted (literal) text, and uses [square brackets] to delimit unquoted expressions. This is compatible with Scribble; BRL's use of square brackets; Tcl's use of brackets and braces. On the other hand, JavaFX Script used {curly braces} for escaped expressions. So did Kawa's XML literals. (However Kawa XML literals can support both brackets as well as braces as a deprecated alternative.)
Use braces only: Another option is that instead of a single escape character we just use brackets to enclose expressions, without a prefix character, as in:
&{Here is the average: [(/ sum count)].}
Special characters can be expressed using standard Scheme character or string literals. It is not clear how one would handle a nested extended-datum-body. Special features, like format specifiers and line-paste escapes, are also difficult to express.
Use implicit concatenation instead of enclosed expressions: Finally, it is possible to not have any support for expression escapes, but instead have a more compact format for concatenation. For example a string literal right next to an expression, with no space in between, could be defined as concatenation. Thus:
"Here is the average: "(/ sum count)"."
This is pretty fragile (in terms of unintended whitespace for example) though using different start and end string delimiters (for example curly braces) helps:
{Here is the average: }(/ sum count){.}
Single character to start quasi-literals: Next, when it comes to the Scheme expression level, we need an unambiguous character or sequence of characters to mark the start of a quasi-literal. If we use a single character, it makes sense for that character to match the literal-part escape character, since it eases nested named-literal-part forms.
Using \
as the start character
does not appear to conflict with (draft-)R7RS, but it would be
a conflict for many Scheme implementations that use
\
as a single-escape character
as in Common Lisp.
Using @
as the start character
does not seem to conflict with standard Scheme, because
it is not a valid identifier-start character. However, it
might conflict with implementation extensions.
(For example Kawa uses @
to name
Java-style annotations.)
Using &
as the start character
may cause compatibility problems, since &
is a valid <initial> character in standard Scheme, thus
it might be difficult to disambiguate from an identifier.
Some R6RS-based naming conventions use such names for
record types or exception types.
The sequence & followed by a name followed by brackets or braces is effectively non-conflicting: In a Scheme that defines brackets as equivalent to parentheses, the following is technically well-defined:
&name[form1 form2]
as it could be read as two datum items:
&name (form1 form2)
If such a Scheme were to implement this SRFI, it would change that reading to:
($construct$:name form1 form2 $>>$)
That is why this specification requires a braces-delimited named-literal-part, even when the latter is empty.
Starting quasi-literals with # and a dispatch character:
Starting quasi-literals with #\
conflicts with character literals.
Neither #& nor #@ appears problematic. However, starting a string literal such as #&{text} with 3 delimiter characters is rather ugly and easily mistyped.
expression ::= ... | extended-datum-literal
extended-datum-literal ::= extended-datum-body
extended-datum-body ::= "&" cname "{" initial-ignored? named-literal-part* "}"
    | "&" cname "[" expression* "]" "{" initial-ignored? named-literal-part* "}"
cname ::= tagname
An implementation may allow leaving out the braces if empty, i.e.:
extended-datum-body ::= ... as above ...
    | "&" cname "[" expression* "]"
However, note that according to R6RS &foo[abc] should be read as the symbol &foo followed by a list [abc] - i.e. as if it were &foo (abc).
Implementations may handle this ambiguity differently,
so portable programs should not leave out the empty braces.
For the definition and discussion of tagname see SRFI-109 (tagname).
The non-terminal named-literal-part is the same as string-literal-part in SRFI-109 (extended string quasi-literals), except for the support for a nested extended-datum-body.
named-literal-part ::= any character except "&", "{" or "}"
    | "{" named-literal-part+ "}"
    | char-ref | entity-ref | special-escape | enclosed-part | extended-datum-body
The remaining non-terminals match those of SRFI-109 (extended string quasi-literals).
initial-ignored ::= intraline-whitespace line-ending intraline-whitespace "&|"
special-escape ::= intraline-whitespace "&|"
    | "&" nested-comment
    | "&-" intraline-whitespace line-ending
char-ref ::= "&#" digit+ ";"
    | "&#x" hex-digit+ ";"
entity-ref ::= "&" char-or-entity-name ";"
opt-format-specifier ::= empty
    | "~" format-specifier-after-tilde
    | "%" format-specifier-after-percent
enclosed-part ::= "&" enclosed-modifier "[" expression* "]"
    | "&" enclosed-modifier "(" expression+ ")"
An enclosed-modifier is normally empty, but implementations may support extensions (for example format specifiers); see discussion in SRFI-109.
enclosed-modifier ::= empty
The general form:
&name[exp1 ... expN]{part1...partM}
is translated by the reader to:
($construct$:name exp1 ... expN $>>$ tpart1 ... tpartM)
More precisely:
Tr[&name[expression*]{initial-ignored? content-piece*}] ⟾ ($construct$:name expression* $>>$ TrContent[content-piece]*)
Tr[&name{initial-ignored? content-piece*}] ⟾ ($construct$:name TrContent[content-piece]*)
TrContent is as in SRFI-109, except we add this rule:
TrContent[extended-datum-body] ⟾ Tr[extended-datum-body]
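The Tr and TrContent rules can be sketched as ordinary list processing over an already-parsed form. The parsed representation used here is an assumption made for illustration only: a content piece is either a string, a list (enclosed expr ...) for an &[...] part, or a list (datum name initial piece ...) for a nested extended-datum-body, where initial is #f when no bracketed expressions were written. Only those three kinds of piece are covered; the remaining SRFI-109 rules are omitted.

```scheme
;; Build the symbol $construct$:name from a tag symbol.
(define (construct-symbol name)
  (string->symbol (string-append "$construct$:" (symbol->string name))))

;; TrContent: translate one content piece into a list of datums to splice in.
(define (tr-content piece)
  (cond ((string? piece) (list piece))
        ((eq? (car piece) 'enclosed)
         ;; &[exp ...] becomes $<<$ exp ... $>>$
         (append '($<<$) (cdr piece) '($>>$)))
        ((eq? (car piece) 'datum)
         ;; A nested extended-datum-body becomes a single nested form.
         (list (tr (cadr piece) (caddr piece) (cdddr piece))))
        (else (error "unknown content piece" piece))))

;; Tr: initial is a list of expressions, or #f if no [...] part was written.
(define (tr name initial pieces)
  (append (list (construct-symbol name))
          (if initial (append initial '($>>$)) '())
          (apply append (map tr-content pieces))))

;; &URI{http://&[example-host]/}
(define uri-form
  (tr 'URI #f '("http://" (enclosed example-host) "/")))

;; &cname[exp1 exp2]{text}
(define cname-form (tr 'cname '(exp1 exp2) '("text")))

;; &foo{_&bar{b}_}
(define nested-form (tr 'foo #f '("_" (datum bar #f "b") "_")))
```

Note that an empty bracket list is represented as the empty list, not #f, so &name[]{...} still yields a leading $>>$, matching the first rule.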
(define-simple-constructor cname cname-maker [str-maker])
The default for str-maker is $string$, as specified in SRFI-109. This provides a syntax binding for $construct$:cname such that
($construct$:cname [init-arg ... $>>$] text-arg ...)
gets evaluated as:
(cname-maker init-arg ... (str-maker text-arg ...))
$<<$ and $>>$ are bound to unique zero-length strings, as in SRFI-109.
Since this specification changes the reader format, and there is no standard Scheme way to do that, there is no portable implementation. However, this specification is being implemented in Kawa. (Check out the development version using Subversion.)
Copyright (C) Per Bothner 2013
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.