Per Bothner
<per@bothner.com>
This SRFI is currently in ``draft'' status. To see an explanation of
each status that a SRFI can hold, see here.
To provide input on this SRFI, please
mail to <srfi minus 108 at srfi dot schemers dot org>.
See instructions here to
subscribe to the list. You can access previous messages via
the archive of the mailing list.
This specifies an extensible reader syntax for
named value constructors.
A reader prefix is followed by a tag
(a symbol),
and then expressions and literal text parameters.
The tag can be thought of as a class name, and the expressions and
literal text are arguments to an object constructor call.
The reader translates the form to a list
whose first element is $quasi-value$,
which is normally bound to a predefined macro.
This proposal is related to SRFI-109 (extended string quasi-literals) and SRFI-107 (XML reader syntax), as they share quite a bit of syntax.
Note the section Discussion: Delimiter options
discusses alternative delimiter characters.
The syntax examples show two plausible syntax choices:
what I call XML-style (red in the styled version of this document),
or Scribble-style (green).
Where both are shown, the XML-style form comes first.
When adding new datatypes it is useful to add new literals of that type, or at least a compact readable notation for creating instances. SRFI-10 provided one solution. Here is an example, which assumes a URI type for representing encoded Uniform Resource Identifiers (URIs - or generalized URLs):
#,(URI "http://example.com/")
SRFI-10 has a number of problems. One issue
is that SRFI-10 conflicts with syntax-case
and R6RS.
More fundamentally, SRFI-10 resolves the tag
name to a
constructor function at read time, which requires managing those
names using a distinct mechanism. It seems better to use normal
scope rules to manage this mapping.
The reader is also responsible for calling the constructor,
which means the format handled by read
is extensible,
which is good, but you have to be careful about the security
implications, making sure only safe
constructor functions are called.
Finally, SRFI-10 doesn't integrate with quasi-quotation,
and some of us find its syntax a bit ugly.
Instead, this SRFI proposes:
#&URI[http://example.com/]
or (if the consensus ends up on Scribble-style syntax):
@URI{http://example.com/}
The Scheme reader translates this to:
($quasi-value$ URI "http://example.com/")
This expression will normally (i.e. if not quoted)
invoke the predefined macro $quasi-value$,
which expands to a suitable constructor invocation,
as specified below.
As related prior art, Camlp4 has an interesting quotation system.
The literal text can contain enclosed (unquoted) expressions:
(define example-host "example.com")
#&URI[http://&{example-host}/]
(define example-host "example.com")
@URI{http://@[example-host]/}
(Note Scribble-style switches braces and brackets, as discussed later.)
Note the use of & for two different purposes: the top-level quasi-literal,
and before the unquoted expression. This will be motivated later.
The above example is read as:
($quasi-value$ URI "http://" example-host "/")
Note that enclosed expressions commonly evaluate to strings, but not always. This example executes an SQL query with a numeric parameter:
#&sql[select * from employees where salary > &{min-salary}]
@sql{select * from employees where salary > @[min-salary]}
Even when the parameter is a string, simple string pasting may be wrong, or dangerous. Consider:
#&sql[select * from employees where name = '&{my-name}']
@sql{select * from employees where name = '@[my-name]'}
Consider what happens if my-name is constructed by a malicious
user and has the value: smith' or ''='.
In that case the effective query would be:
#&sql[select * from employees where name = 'smith' or ''='']
@sql{select * from employees where name = 'smith' or ''=''}
This would retrieve all employees.
This specification distinguishes literal text from text that
comes from an evaluated expression, so an SQL library can do the
necessary quoting of special characters.
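As a minimal sketch of how a library might use that distinction, the following models the arguments of a read ($quasi-value$ sql ...) form as a list of data: string data are literal text and are kept, while every other argument is wrapped in a call that quotes its value. The names sql-value->text and wrap-sql-arguments are hypothetical and not part of this proposal; a real SQL library would more likely pass the values as bound parameters.

```scheme
;; Hypothetical helper: render one evaluated value as SQL text.
;; Strings get every single quote doubled; numbers are printed as-is.
(define (sql-value->text value)
  (cond ((number? value) (number->string value))
        ((string? value)
         (let loop ((chars (string->list value)) (acc '()))
           (cond ((null? chars) (list->string (reverse acc)))
                 ((char=? (car chars) #\')
                  (loop (cdr chars) (cons #\' (cons #\' acc))))
                 (else (loop (cdr chars) (cons (car chars) acc))))))
        (else (error "unsupported SQL value" value))))

;; Hypothetical rewrite of the arguments of a read ($quasi-value$ sql ...)
;; form: literal text (a string datum) is kept, anything else is wrapped.
(define (wrap-sql-arguments args)
  (map (lambda (arg)
         (if (string? arg) arg (list 'sql-value->text arg)))
       args))

;; The enclosed expression my-name becomes (sql-value->text my-name):
(wrap-sql-arguments '("select * from employees where name = '" my-name "'"))

;; The malicious value can no longer close the SQL string literal early:
(sql-value->text "smith' or ''='")   ; => "smith'' or ''''=''"
```

Because the reader keeps literal text and enclosed expressions as separate arguments, the rewrite never has to guess which quotes were written by the programmer.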
An example with enclosed expressions:
#&cname[text1&{exp1}text2&{exp2}text3]
@cname{text1@[exp1]text2@[exp2]text3}
is translated by the reader to:
($quasi-value$ cname "text1" exp1 "text2" exp2 "text3")
Instead of:
#&cname[&{exp1 exp2}text]
@cname{@[exp1 exp2]text}
you can write:
#&cname{exp1 exp2}[text]
@cname[exp1 exp2]{text}
These are almost the same, except that the former is read as
($quasi-value$ cname "" exp1 exp2 "text")
while the latter is read as:
($quasi-value$ cname exp1 exp2 "text")
i.e. without the extra empty string argument. In most cases both expressions will evaluate to the same value, but this is not required.
The initial expression can be especially useful for initial keyword parameters. EXAMPLE NEEDED.
For comparison, a Scribble form has the shape:
@cmd[datum ...]{text-body}
This SRFI (if using XML-style syntax) switches the roles of
{} and [],
and uses & instead of @, to
be compatible with XML-literals, and also because
{} is more commonly used for anti-quotation.
Markup is commonly nested, which suggests that a
& in text can be used
as an abbreviated extended-datum-literal.
Specifically:
#&name1{exps}[abc&name2{exps}[klm]xyz]
@name1[exps]{abc@name2[exps]{klm}xyz}
is syntactic sugar for:
#&name1{exps}[abc&{#&name2{exps}[klm]}xyz]
@name1[exps]{abc@[@name2[exps]{klm}]xyz}
This nesting of markup justifies why we use the same escape character for both top-level and enclosed forms.
The use of the same escape prefix & as in
SRFI-109 (extended string quasi-literals) suggests allowing the same convenience features,
including character escapes and indentation handling.
See the syntax specification below for details, and
see SRFI-109 for examples and motivation.
The reader creates a $quasi-value$ invocation,
but we need to define what this invocation does. We could
refrain here from specifying $quasi-value$ except to require that
it be in scope when used. However, that seems unfriendly.
There should at least be a default definition. Consider
#&cname{pre-exp ...}[abc&{infix-exp1}def&{infix-exp2}...xyz]
@cname[pre-exp ...]{abc@[infix-exp1]def@[infix-exp2]...xyz}
which is read as:
($quasi-value$ cname pre-exp ... "abc" infix-exp1 "def" infix-exp2 ... "xyz")
By default this is equivalent to:
(cname pre-exp ... #&[abc&{infix-exp1}def&{infix-exp2}...xyz])
How does the implementation of $quasi-value$ know
what forms are in the square brackets? By searching for the first
string literal among the macro arguments; that argument and all remaining
ones are evaluated as a string.
Note a complication: one of the pre-exp or infix-exp
arguments might be a literal string. To disambiguate this case, as a special
case the reader wraps those literals in a quote form:
#&cname{"A" 123 "B" 456 }[abc&{"K"}xyz]
@cname["A" 123 "B" 456 ]{abc@["K"]xyz}
is read as:
($quasi-value$ cname (quote "A") 123 (quote "B") 456 "abc" (quote "K") "xyz")
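The default behavior described above can be sketched at the datum level, treating the read form as an ordinary list rather than writing a real macro. The procedure names are hypothetical, and text-parts->string is only a stand-in for whatever constructs the string from the literal parts (in the text above, a SRFI-109 string quasi-literal). The behavior when no string literal is present is an assumption of this sketch.

```scheme
;; Split ARGS at the first string datum.
;; Returns (prefix . rest), where REST starts with that string (or is empty).
(define (split-at-first-string args)
  (let loop ((rest args) (prefix '()))
    (if (or (null? rest) (string? (car rest)))
        (cons (reverse prefix) rest)
        (loop (cdr rest) (cons (car rest) prefix)))))

;; FORM is the datum ($quasi-value$ cname arg ...).
;; Arguments before the first string literal are passed through unchanged;
;; that literal and everything after it become one string-building call.
(define (expand-default form)
  (let* ((cname (cadr form))
         (parts (split-at-first-string (cddr form)))
         (pre (car parts))
         (text (cdr parts)))
    (if (null? text)
        (cons cname pre)               ; assumption: no literal text at all
        (append (list cname)
                pre
                (list (cons 'text-parts->string text))))))

(expand-default
 '($quasi-value$ cname (quote "A") 123 (quote "B") 456 "abc" (quote "K") "xyz"))
;; => (cname (quote "A") 123 (quote "B") 456
;;           (text-parts->string "abc" (quote "K") "xyz"))
```

The quote wrappers are what keep "A", "B" and "K" from being mistaken for the start of the literal text: as data they are lists, not strings.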
This implementation seems OK as a default, but in many cases
it will not do the right thing. For example one common Scheme convention
is to use make-cname as the name of the procedure for
constructing cname objects. You might want to
re-arrange the arguments. You might want the enclosed expressions to
not be string-ified. So we need a mechanism for programmers to
write custom expansions of $quasi-value$. The problem
arises when multiple libraries try to define custom expansions of
$quasi-value$. We want these to co-exist without clashing.
A solution is that for a given cname
we look for a binding for a special name
$quasi-value-transformer$:cname.
I.e. the standard binding for $quasi-value$
will first look for this binding before doing the above default action.
($quasi-value$ cname form ...)
will map to:
($quasi-value-transformer$:cname form ...)
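A sketch of this lookup, again at the datum level. Since portable Scheme has no way to ask whether an identifier is bound, the set of bound transformer names is passed in explicitly as a list of symbols; that list, the procedure names, and the default-expand argument are all assumptions of this sketch, not part of the proposal.

```scheme
;; Build the compound name $quasi-value-transformer$:cname as a symbol.
(define (transformer-name cname)
  (string->symbol
   (string-append "$quasi-value-transformer$:" (symbol->string cname))))

;; FORM is the datum ($quasi-value$ cname arg ...).
;; BOUND-NAMES models the transformer bindings that are in scope.
;; DEFAULT-EXPAND is called on FORM when no transformer is bound.
(define (expand-quasi-value form bound-names default-expand)
  (let ((tname (transformer-name (cadr form))))
    (if (memq tname bound-names)
        (cons tname (cddr form))
        (default-expand form))))

;; Only a transformer for sql is "bound" here; the stand-in default
;; simply drops the $quasi-value$ head.
(define bound (list (transformer-name 'sql)))
(expand-quasi-value '($quasi-value$ sql "select 1") bound cdr)
(expand-quasi-value '($quasi-value$ uri "x") bound cdr)   ; => (uri "x")
```

The first call yields a form headed by the transformer symbol for sql; the second falls back to the default because no transformer for uri is listed.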
Using a compound name
with a colon should hopefully be
portable both to Scheme implementations where colon is a regular
identifier constituent, and to Scheme implementations where colon
is used for Common Lisp-style prefixed symbols.
This search mechanism may require some implementation-specific hacks,
since $quasi-value$ has to first look for
$quasi-value-transformer$:cname
and then fall back
to a function cname.
There may not be a portable way to write this with R6RS syntax-case,
though implementations can provide a way to check if there is a lexical binding.
A possible extension is to support SRFI-10 style read-time literals in certain restricted cases, when all the expressions are literal, and the transformers are available to the reader. This should probably not be the default (for consistency and because of security concerns), but could be supported in an implementation that has programmable read-tables.
There are various choices for delimiter characters in place of &,
and in this section we'll discuss some possibilities.
It is reasonable to consider both SRFI-108 (this specification) and
SRFI-109 (extended string quasi-literals) in conjunction with each other.
(This discussion refers to the non-terminals defined
in the syntax specifications of both SRFI-108 and SRFI-109.)
First let us focus on the escape character in literal-text. After that we will look at characters to use to indicate the start of a top-level extended-string-literal and extended-datum-literal.
Different or same escape characters in literal-text?
There are multiple different escape character roles:
first we have escapes in string-literal-part.
Then in a named-literal-part we have escaped
strings and characters (same as in string-literal-part),
plus we have nested extended-datum-body.
We presumably want to use a single escape character
for all the roles in a named-literal-part,
to avoid a proliferation of escape characters.
Also, for consistency it seems better to use the same
escape character and syntax for string and character
escapes in both string-literal-part
and named-literal-part.
The conclusion seems to be we should use the same escape character in all
roles (at least within a literal-part).
As to which character to use,
the most plausible choices seem to be &, @, or \.
Use & as escape character:
Using & is compatible with
XML, HTML, SGML, and also "XML literals" embedded in programming
languages, including SRFI-107 (XML reader syntax).
Use \
as escape character:
Using \
is of course
compatible with standard Scheme string literals. Backslash has also
been used as an escape in many languages, for string literals,
regular expressions, shells, TeX, and more.
If using \
as an escape for
SRFI-109 strings, it would be tempting to enhance standard
string literals with some of the same features, such as enclosed
expressions. However, traditional C-style single-letter escapes,
such as \n
cause a problem:
You either don't allow them in the literal-part of this specification
(in which case the latter is not a super-set of standard string escapes),
or you need some non-letter prefix character
in front of a cname, which is tedious.
Use @ as escape character:
Using @ as the escape character
goes back to Scribe, Texinfo, and Scribble. These are all markup
languages, not programming languages. However, Scribble allows nested
Racket Scheme expressions, and
(if you select the at-exp Racket parser)
you can also nest Scribble forms in a top-level Scheme program.
If we use @ as the escape character
then we might want to switch square and curly braces for
better Scribble compatibility.
Braces vs brackets:
For extended-string-literal should we use
{curly braces} or [square brackets]?
I personally think curly braces seem nicer and perhaps more common,
but I might be biased by my experience with
JavaFX Script.
Furthermore, curly braces are used in
SRFI-107 (XML reader syntax),
which is already more-or-less implemented in Kawa.
On the other hand, Scribble uses square brackets.
Whichever one of brackets or braces is used for unquoted expressions, the other one should be used to enclose a literal-part.
Use braces only: Another option is that instead of a single escape character we use some kind of brackets to enclose expressions, as in:
#&[Here is the average: {(/ sum count)}.]
Special characters can be expressed using standard Scheme character or string literals. It is not clear how one would handle a nested extended-datum-body. Special features, like format specifiers and line-paste escapes, are also difficult to express.
Use implicit concatenation instead of enclosed expressions: Finally, it is possible to not have any support for expression escapes, but instead have a more compact format for concatenation. For example a string literal right next to an expression, with no space in between, could be defined as concatenation. Thus:
"Here is the average: "(/ sum count)"."
This is pretty fragile (in terms of unintended whitespace, for example), though using different start and end string delimiters (for example square brackets) helps:
[Here is the average: ](/ sum count)[.]
Single character to start quasi-literals:
Next, when it comes to the Scheme expression level,
we need an unambiguous
character or sequence of characters to mark the start of a quasi-literal.
If we use a single character, it makes sense for that character
to match the literal-part escape character,
since it eases nested named-literal-part forms.
Using &
as the start character
may cause compatibility problems, since &
is a valid <initial> character in standard Scheme, thus
it might be difficult to disambiguate from an identifier.
However, starting an identifier with &
is likely to be rare.
Using \
as the start character
does not appear to conflict with (draft-)R7RS, but it would be
a conflict for many Scheme implementations that use
\
as a single-escape character
as in Common Lisp.
Using @
as the start character
does not seem to conflict with standard Scheme, because
it is not a valid identifier-start character. However, it
might conflict with implementation extensions.
(For example Kawa uses @
to name
Java-style annotations.)
Starting quasi-literals with # and a dispatch character:
Starting quasi-literals with #\
conflicts with character literals.
Neither #& nor #@ appears problematic.
Recommendation: Either:
XML style, as in red examples, and most of the write-up; or:
Scribble style, as in green examples. However, note that while I call this
Scribble style, what I recommend is only Scribble-compatible to a limited extent. Most importantly:
@{text}
is (in this family of proposals) used for string quasi-literals, and evaluates to a string. I think this is a much more useful result than in Scribble. Also, quoted forms yield different results than in Scribble.
Using @ as the escape character still leaves open
when to use braces and when to use brackets.
I would prefer to use curly braces for enclosed expressions and
square brackets for literal-parts, mainly because it is compatible
with SRFI-107 (XML reader syntax) as currently implemented in Kawa.
However, Scribble uses them the other way round, but
extended-datum-literal will be only approximately
compatible with Scribble anyway, so that shouldn't be the deciding factor.
In any case the Scribble-style example code in these SRFIs does use brackets and braces as in Scribble, if nothing else to show how it looks.
expression ::= ... | extended-datum-literal
extended-datum-literal ::= # extended-datum-body
extended-datum-body ::= & cname [ named-literal-part... ]
    | & cname { expression... }
    | & cname { expression... } [ named-literal-part... ]
cname ::= identifier
named-literal-part ::= any character except &, [ or ]
    | [ named-literal-part... ]
    | char-or-entity-ref | special-escape | & enclosed-part | extended-datum-body
The non-terminal named-literal-part is the same as string-literal-part in SRFI-109 (extended string quasi-literals), except for the support for a nested extended-datum-body.
The remaining non-terminals match those of SRFI-109 (extended string quasi-literals).
special-escape ::= ignored-whitespace &|
    | TBD (at least line pasting and comments)
char-or-entity-ref ::= & char-or-entity-name ;
    | &# digits ;
    | &#x hex-digits ;
opt-format-specifier ::= empty
    | ~ format-specifier-after-tilde
    | % format-specifier-after-percent
enclosed-part ::= & opt-format-specifier { expression ... }
    | & opt-format-specifier ( expression... )
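As an illustration of the two numeric alternatives of char-or-entity-ref, here is a minimal sketch of a decoder. It assumes the XML meaning of these forms (a decimal or hexadecimal code point); this text does not itself define their meaning, and named entities are left out because their names are defined elsewhere. The procedure name decode-numeric-ref is hypothetical.

```scheme
;; REF is the text between "&" and ";", for example "#65" or "#x41".
;; Returns a one-character string, or #f if REF is not a numeric reference.
(define (decode-numeric-ref ref)
  (let* ((len (string-length ref))
         (n (and (> len 1)
                 (char=? (string-ref ref 0) #\#)
                 (if (char=? (string-ref ref 1) #\x)
                     (string->number (substring ref 2 len) 16)
                     (string->number (substring ref 1 len) 10)))))
    (and n
         (integer? n)
         (exact? n)
         (>= n 0)
         (string (integer->char n)))))

(decode-numeric-ref "#65")    ; => "A"
(decode-numeric-ref "#x41")   ; => "A"
(decode-numeric-ref "amp")    ; => #f (a named entity, not handled here)
```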
This is an alternative syntax, if we decide to use Scribble-style
@ escapes.
This is not a true extension/superset of
Racket's at-exp mode, but it is the same in most cases.
Non-compatible cases include @{foo bar}
which Scribble reads as ("foo bar")
but I suggest reading as a string "foo bar".
expression ::= ... | extended-datum-literal
(In this case, extended-datum-literal
and extended-datum-body have the
same syntax, so this could be simplified: An @
-form
is allowed both as an initial character in Scheme forms,
and as an escape character in literal parts.)
extended-datum-literal ::= extended-datum-body
extended-datum-body ::= @ cname { named-literal-part... }
    | @ cname [ expression... ]
    | @ cname [ expression... ] { named-literal-part... }
cname ::= identifier
named-literal-part ::= any character except @, { or }
    | { named-literal-part... }
    | char-or-entity-ref | special-escape | @ enclosed-part | extended-datum-body
The non-terminal named-literal-part is the same as string-literal-part in SRFI-109 (extended string quasi-literals), except for the support for a nested extended-datum-body.
The remaining non-terminals match those of SRFI-109 (extended string quasi-literals).
special-escape ::= ignored-whitespace @|
    | TBD (at least line pasting and comments)
char-or-entity-ref ::= @ char-or-entity-name ;
    ;;should probably change this
    | @# digits ;
    ;;should probably change this
    | @#x hex-digits ;
opt-format-specifier ::= empty
    | ~ format-specifier-after-tilde
    | % format-specifier-after-percent
enclosed-part ::= @ opt-format-specifier [ expression ... ]
    | @ opt-format-specifier ( expression... )
The general form:
#&name{exp1 ... expN}[part1...partM]
@name[exp1 ... expN]{part1...partM}
is translated by the reader to:
($quasi-value$ name exp1 ... expN tpart1 ... tpartM)
More precisely:
Tr[#&name{expression...}[content-piece...]] ⟾ ($quasi-value$ name expression... TrContent[content-piece]...)
Tr[#&name[content-piece...]] ⟾ ($quasi-value$ name TrContent[content-piece]...)
TrContent[&name{expression...}[content-piece...]] ⟾ ($quasi-value$ name expression... TrContent[content-piece]...)
TrContent[&name[content-piece...]] ⟾ ($quasi-value$ name TrContent[content-piece]...)
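The translation rules can be tried out with a small reader sketch for the XML-style syntax. It works on a string rather than a port, and all procedure names are hypothetical. It is deliberately incomplete: there are no character or entity references, format specifiers or special escapes, braces may not be nested inside enclosed expressions, square brackets may not be nested inside literal text, and malformed input is not diagnosed. Following the examples earlier in this document, literal strings in expression position are wrapped in quote, and a bracketed part that does not start with text contributes a leading empty string; applying that last rule to every bracketed part is an assumption of this sketch.

```scheme
;; Index of the first character at or after I satisfying PRED,
;; or the length of STR if there is none.
(define (scan str i pred)
  (let loop ((i i))
    (if (or (>= i (string-length str)) (pred (string-ref str i)))
        i
        (loop (+ i 1)))))

;; Read every datum in TEXT with the ordinary reader.  Literal strings are
;; wrapped in (quote ...) so they cannot be mistaken for literal text.
(define (read-expressions text)
  (let ((port (open-input-string text)))
    (let loop ((acc '()))
      (let ((datum (read port)))
        (cond ((eof-object? datum) (reverse acc))
              ((string? datum) (loop (cons (list 'quote datum) acc)))
              (else (loop (cons datum acc))))))))

;; Parse "{expression...}" whose "{" is at index I.
;; Returns (expressions . index-after-closing-brace).
(define (parse-braces str i)
  (let ((close (scan str (+ i 1) (lambda (c) (char=? c #\})))))
    (cons (read-expressions (substring str (+ i 1) close)) (+ close 1))))

;; TrContent: parse literal text starting just after "[" at index I.
;; Returns (pieces . index-after-closing-bracket).
(define (parse-content str i)
  (let loop ((i i) (pieces '()))
    (let ((c (string-ref str i)))
      (cond ((char=? c #\]) (cons (reverse pieces) (+ i 1)))
            ((char=? c #\&)
             (if (char=? (string-ref str (+ i 1)) #\{)
                 ;; &{expression...}: splice in the enclosed expressions
                 (let ((r (parse-braces str (+ i 1))))
                   (loop (cdr r) (append (reverse (car r)) pieces)))
                 ;; &name...: a nested form becomes a single piece
                 (let ((r (parse-body str i)))
                   (loop (cdr r) (cons (car r) pieces)))))
            (else
             (let ((end (scan str i (lambda (c)
                                      (or (char=? c #\&) (char=? c #\]))))))
               (loop end (cons (substring str i end) pieces))))))))

;; Make sure the literal part starts with a string.
(define (ensure-leading-string pieces)
  (if (and (pair? pieces) (string? (car pieces)))
      pieces
      (cons "" pieces)))

;; Tr: parse "&name{expression...}[content...]" whose "&" is at index I.
;; Returns (form . index-after-form).
(define (parse-body str i)
  (let* ((name-end (scan str (+ i 1)
                         (lambda (c) (or (char=? c #\{) (char=? c #\[)))))
         (name (string->symbol (substring str (+ i 1) name-end)))
         (exps (if (char=? (string-ref str name-end) #\{)
                   (parse-braces str name-end)
                   (cons '() name-end)))
         (j (cdr exps))
         (content (if (and (< j (string-length str))
                           (char=? (string-ref str j) #\[))
                      (let ((r (parse-content str (+ j 1))))
                        (cons (ensure-leading-string (car r)) (cdr r)))
                      (cons '() j))))
    (cons (append (list '$quasi-value$ name) (car exps) (car content))
          (cdr content))))

;; STR must start with "#&".
(define (read-quasi-value str)
  (car (parse-body str 1)))

(read-quasi-value "#&cname[text1&{exp1}text2&{exp2}text3]")
;; => ($quasi-value$ cname "text1" exp1 "text2" exp2 "text3")
```

A nested form such as #&a[x&b[y]z] comes out as ($quasi-value$ a "x" ($quasi-value$ b "y") "z"), which is the TrContent case of the rules above.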
$quasi-value$,
even if not fully portable.
Copyright (C) Per Bothner 2012
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.