by Marc Nieper-Wißkirchen
This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-241@nospamsrfi.schemers.org
. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.
This SRFI describes a simple pattern matcher based on one originally devised by Kent Dybvig, Dan Friedman, and Eric Hilsdale, which has a catamorphism feature to perform recursion automatically.
Pattern matching to destructure expressions is a common task in data-oriented programming. Consequently, a number of pattern-matching libraries for Scheme have found more widespread use. Among them are the three mentioned in the (withdrawn) SRFI 200, namely Racket's pattern matcher, the pattern matcher of Andrew Wright and Robert Cartwright with extensions by Alex Shinn, and a pattern matcher that has been described by R. Kent Dybvig for the use with his Chez Scheme implementation.
The last-mentioned matcher, which is appealing also from a
theoretical point of view, is finally the basis of the matcher
described in this SRFI. Contrary to the other two matchers, it
has a simple pattern language. Its strength lies in its
catamorphism feature to perform recursion automatically. It is
called match
.
While one use of pattern matching is to bind variables, it can
also be used to recursively destructure a value in the sense of
R^{6}RS's
or SRFI
1's fold-right
. In high-brow category
language, such procedures destructuring inductive values are
called catamorphisms.
In some sense, match
syntactically expresses the
general catamorphism on Scheme datums. For example, the
aforementioned fold-right
procedure (restricted to
a single list argument) can be implemented as follows:
(define (fold-right kons knil lis) (match lis [(,x . ,[x*]) (kons x x*)] [() knil]))
The syntax employed by the pattern matcher by Dybvig, Friedman, and Hilsdale, on which the one described here is based, is also the basis of the syntax of The Nanopass Framework.
A match
expression evaluates an input expression
producing an input value. This value is then matched against
the patterns of a match
expression much as the
input of a case
expression is matched against
datums.
The general form of a match expression is
(match ⟨ ⟩ ⟨ ⟩ …)
where ⟨
is the input expression to match and
each ⟩⟨
has one of the following two forms: ⟩
[⟨guard
⟨ ⟩ …) ⟨ ⟩]
⟩ (
[⟨
⟩ ⟨ ⟩]
As with case
, the input expression is evaluated to
produce the input value and the first clause that the input value
matches, if any, is selected. The body
of the
selected clause is evaluated, and the values of the last
expression in the body are returned. An input value matches a
clause if it fits the clause's pattern and the
⟨
, if any, evaluate to a
true value. Patterns may contain symbolic constants, which must
match exactly, and pattern variables, which match any
input. Pattern variables are prefixed by commas; symbolic
constants are not.
⟩
Note that in R^{6}RS matching brackets may be used in place of matching parentheses and vice versa, mostly for readability. We use brackets in this specification to document a recommended style. Users of Scheme systems not supporting brackets (and users who don't like brackets) can replace them with parentheses throughout.
(match '(a 17 37)
[(a ,x) 1]
[(b ,x ,y) 2]
[(a ,x ,y) 3]) ⟹ 3
The first clause fails to match because there are three items
in the input list, and the pattern has only two. The second
clause fails because b
does not match a
.
In the output expressions, the values of the pattern variables are bound to the corresponding pieces of input.
(match '(a 17 37)
[(a ,x) (- x)]
[(b ,x ,y) (+ x y)]
[(a ,x ,y) (* x y)]) ⟹ 629
When followed by an ellipsis (...
), a pattern
variable represents a sequence of input values.
(match '(a 17 37) [(a ,x* ...) x*]) ⟹ (17 37)
Ellipses can follow a structured pattern containing one or more pattern variables.
(match '(begin (1 5) (2 6) (3 7) (4 8)) [(begin (,x* ,y*) ...) (append x* y*)]) ⟹ (1 2 3 4 5 6 7 8)
Ellipses can be nested, producing sequences of sequences of values.
(match '((a b c d) (e f g) (h i) (j)) [((,x* ,y** ...) ...) (list x* y**)]) ⟹ ((a e h j) ((b c d) (f g) (i) ()))
Recursion is frequently required while processing an input
value with match
. Here, a procedure returning the
length of a list is defined.
(letrec ([len (lambda (lst) (match lst [() 0] [(,x ,x* ...) (+ 1 (len x*))]))]) (len '(a b c d))) ⟹ 4
A simpler version of the above uses the catamorphism feature of
match
. If a pattern variable is written as
,[var]
, match
recurs on the
matching subpart of the input before evaluating the output
expressions of the clause.
(let ([len (lambda (lst) (match lst [() 0] [(,x . ,[y]) (+ 1 y)]))]) (len '(a b c d))) ⟹ 4
In some cases, match
will need to return multiple
values. The catamorphism syntax can be used to receive multiple
values. When making implicit recursive calls using the
catamorphism syntax, zero or more variables between the
parentheses can be included, each representing one of the
expected return values.
(let ([split (lambda (lis) (match lis [() (values '() '())] [(,x) (values `(,x) '())] [(,x ,y . ,[odds evens]) (values `(,x . ,odds) `(,y . ,evens))]))]) (split '(a b c d e f))) ⟹ (a c e) (b d f)
This example is equivalent to the following one:
(let ([split (lambda (lis) (match lis [() (values '() '())] [(,x) (values `(,x) '())] [(,x ,y . ,[split -> odds evens]) (values `(,x . ,odds) `(,y . ,evens))]))]) (split '(a b c d e f))) ⟹ (a c e) (b d f)
In this case, the operator to which recursive calls are being
made is explicitly named in the catamorphism subpattern, which
is ,[split -> odds even]
in this case.
In general, ,[⟨
recurs to the top of the current
match while ⟩
…],[⟨-> ⟨ ⟩
…]
recurs to (the result of
evaluating) ⟩ ⟨
. ⟩⟨
must evaluate to a procedure that accepts one argument, the
matched value, and returns as many values as there are
identifiers following the ⟩->
.
Explicitly naming a catamorphism operator is useful when a parser has to defer parsing of a sub-input to a different parser or when special parsing parameters have to be active when parsing a sub-input.
Here is an example that illustrates the use of guards and how to achieve the effect of a catch-all clause.
(let ([simple-eval (lambda (x) (match x [,i (guard (integer? i)) i] [(+ ,[x*] ...) (apply + x*)] [(* ,[x*] ...) (apply * x*)] [(- ,[x] ,[y]) (- x y)] [(/ ,[x] ,[y]) (/ x y)] [,x (assertion-violation 'simple-eval "invalid expression" x))))]) (simple-eval '(+ (- 0 1) (+ 2 3)))) ⟹ 4
The match
form extends quasiquote
in
the ⟨
to an ellipsis-aware version
that allows ellipses to be used in place
of ⟩unquote-splicing
, which often leads to more
readable code. Consider the following transformer
of let
forms in a Scheme-like language:
(define translate (lambda (x) (match x [(let ([,var* ,expr*] ...) ,body ,body* ...) `((lambda ,var* ,body ,@body*) ,@expr*)] [,x (assertion-violation 'translate "invalid expression" x)])))
This can be rewritten as follows:
(define translate (lambda (x) (match x [(let ((,var* ,expr*) ...) ,body ,body* ...) `((lambda ,var* ,body ,body* ...) ,expr* ...)] [,x (assertion-violation 'translate "invalid expression" x)])))
The better readability is, in particular, illustrated by
the following procedure with nested ellipses. It converts
unnamed let
expressions into direct lambda
applications, where the let
has been generalized to
allow an implicit begin
in each right-hand-side
expression.
(define (f x) (match x [(let ([,x ,e1 ...] ...) ,b1 ,b2 ...) `((lambda (,x ...) ,b1 ,b2 ...) (begin ,e1 ...) ...)]))
The basic usage of (the ellipsis-aware) quasiquote
is the same as in R^{6}RS:
`(list ,(+ 1 2) 4) ⟹ (list 3 4)
Lists can still be spliced into sequences.
`(a ,(+ 1 2) ,@(map abs '(4 -5 6)) b) ⟹ (a 3 4 5 6 b)
The extension to quasiquote
allows ellipses to be
used in place of unquote-splicing
(,@
)
to piece together the output form.
`(a ,(+ 1 2) ,(map abs '(4 -5 6)) ... b) ⟹ (a 3 4 5 6 b)
Within each subform followed by an ellipsis, each comma-prefixed item must be a list and all such items within the same subform must have the same length.
`((,'(1 2 3) . ,'(a b c)) ...) ⟹ ((1 . a) (2 . b) (3 . c))
A subform followed by an ellipsis may be contained within a larger subform that is also followed by an ellipsis. In this case, each comma-prefixed item must be a list of lists, each such item must have the same length, and the corresponding sublists of each such item must have the same lengths. This requirement generalizes to comma-prefixed items nested within more than two levels of ellipsis-followed subforms.
`(((a ,'((x 1) (x 2) (x 3))) ...) ...) ⟹ (((a x) (a 1)) ((a x) (a 2)) ((a x) (a 3)))
In the output, a subform may be followed directly by two or more ellipses; the requirements are the same as for nested ellipses, but the result is flattened rather than nested.
`((a ,'((x 1) (x 2) (x 3))) ... ...) ⟹ ((a x) (a 1) (a x) (a 2) (a x) (a 3))
Ellipses can also follow subforms containing items prefixed by unquote-splicing
(,@
).
`((a ,@'((x 1) (x 2) (x 3))) ...) ⟹ ((a x 1) (a x 2) (a x 3))
A subform of the form (... ⟨ ⟩)
is
identical to ⟨
, except that ellipses
in the subform have no special meaning. ⟩
`(... (,'(1 2 3) ...)) ⟹ ((1 2 3) ...)
Substitutions are made only for unquoted components appearing at the same nesting level as the outermost quasiquote.
`(a `(b ,(list 1 2) ... ,(foo ,(list 1 3) ... d) e) f) ⟹ (a `(b ,(list 1 2) ... ,(foo 1 3 d) e) f)
Both the (withdrawn) SRFI 204 and this SRFI describe a pattern matcher and suggest that the respective one becomes included in Scheme implementations and a future Scheme standard. Yet they are not in competition with each other. Quite the contrary, it makes a lot of sense that a Scheme system supports both matchers as they apply to different areas.
While a number of basic tasks can be solved equally well with the matcher described in SRFI 204 and the matcher described here, some problems are better solved with one or the other. The matcher of SRFI 204 mirrors most of Scheme's binding constructs. For these binding constructs, the catamorphism feature in this SRFI does not make sense, so this SRFI's matcher lacks it. On the other hand, the matcher described here distinguishes itself due to its catamorphism feature but, therefore, cannot support binding constructs.
SRFI 204 has a complex pattern language and allows one to write complex matchers with just a single pattern. If this is not needed, the pattern language of this SRFI excels due to its simplicity and exceptionally clear semantics.
For historical reasons, both the matcher described here and in
SRFI 204 are called match
. This specification does
not change the established name. Fortunately, when reading
code, there is virtually no danger of confusion. Pattern
variables in the matcher described here are all unquoted and
there is no quasiquote, while in patterns of SRFI 204's
match
,
unquoted datums usually only appear in quasiquotations.
If more than one matcher is needed in a program, one can easily use the rename facility of the Scheme implementation's module system.
Remark: The author of this SRFI suggests that any reviving attempt of SRFI 204 should make the resulting pattern matcher extensible so that there are only few primitives with clear semantics allowing the rest of the specification to be defined in terms of them.
Even better than reviving SRFI 204 would be specifying a facility to define type-safe matchers as in the Nanopass framework.
The match form uses a pattern language akin
to syntax-rules
and syntax-case
, which
is described here.
A ⟨
has one of the following forms: ⟩
(⟨
⟩ ⟨ ⟩ . ⟨ ⟩)
(⟨
⟩ . ⟨ ⟩)
()
#(⟨
⟩ …)
#(⟨
⟩ … ⟨ ⟩ ⟨ ⟩ ⟨ ⟩ …)
,_
,⟨
⟩
⟨
⟩
⟨
⟩
⟨
⟩
where a ⟨
has one of the following forms: ⟩
,[⟨-> ⟨ ⟩ …]
⟩
,[⟨
⟩ …]
and where the ⟨
is the
ellipsis ⟩...
(in the sense
of free-identifier=?
), ⟨
is an expression, and ⟩⟨
and ⟩⟨
are identifiers. ⟩
In each ⟨
,
the ⟩⟨
and ⟩⟨
must be pairwise
disjoint (in the sense
of ⟩bound-identifier=?
). ⟨
must not
be ⟩...
, _
, or unquote
(in
the sense of free-identifier=?)
.
Patterns match Scheme values by the following rules:
A pattern of the form (⟨
matches a
dotted list if ⟩
⟨ ⟩ . ⟨ ⟩)⟨
matches each proper list element
and ⟩⟨
matches the tail of the dotted list. ⟩
A pattern of the form (⟨
matches a
pair if ⟩
. ⟨ ⟩)⟨
matches the ⟩car
and ⟨
matches the ⟩cdr
of the pair.
A pattern of the form ()
matches the empty list.
A pattern of the
form #(⟨
matches a vector of as many elements as there
are ⟩
…)⟨
if
each element matches the
corresponding ⟨
.
⟩
A pattern of the form #(⟨
matches a vector of n or more elements whose first k elements match
⟩ …
⟨ ⟩ ⟨ ⟩
⟨ ⟩ ⟨ ⟩ …
⟨ ⟩)⟨
through ⟩⟨
,
whose next m−k elements each match
⟩⟨
, and whose remaining n−m elements match
⟩⟨
through ⟩⟨
.
⟩
A pattern of the form ,_
matches an arbitrary
Scheme value.
A pattern of the form ,⟨
matches an arbitrary Scheme value. ⟩
A ⟨
matches an arbitrary Scheme value. ⟩
A pattern of the form ⟨
matches a symbol
with the same name as ⟩⟨
. ⟩
A pattern of the form ⟨
match a value
that is equal (in the sense of ⟩equal?
) to the
literal datum ⟨
. ⟩
The following syntax, together with the auxiliary
syntax unquote
, unquote-splicing
,
...
, _
, guard
, and ->
, is
exported by the libraries (srfi :241 match)
and (srfi :241)
. The auxiliary
syntax unquote
, unquote-splicing
,
...
, and _
is identical to the
auxiliary syntax exported by (rnrs base)
,
and guard
is identical to the syntax exported
by (rnrs exceptions)
.
(match ⟨ ⟩ ⟨ ⟩ …)
Syntax: ⟨
can be any
expression. The ⟩⟨
take one of two forms, either
⟩
(⟨
⟩ ⟨ ⟩)
or
(⟨guard ⟨ ⟩ …) ⟨ ⟩)
⟩ (
where the ⟨
can be
any expressions. ⟩
Semantics: A match
expression is evaluated
as follows:
First, the ⟨
is
evaluated to yield an input value. ⟩
Then, the first ⟨
for which
the input value matches its ⟩⟨
is selected.
⟩
The environment in which
the match
expression is evaluated is extended
by binding the pattern variables of
the ⟨
to fresh locations, and
the values that were matched against these pattern variables
are stored in those locations.
⟩
If ⟨
are present in the selected clause,
they are evaluated in left-to-right order in the extended
environment until one evaluates to ⟩#f
. In this
case, the rest of the ⟨
is
skipped and pattern matching proceeds with the
next ⟩⟨
.
⟩
If
no ⟨
evaluates
to ⟩#f
, or if there is no ⟨
present, in unspecified order,
the ⟩⟨
of
the ⟩⟨
in
the ⟩⟨
of the
selected ⟩⟨
are evaluated in no
specific order in the extended environment to yield cata
procedures, which are then invoked on the value that was
matched against the ⟩⟨
.
The environment is extended a second time by binding
the ⟩⟨
of
these ⟩⟨
to fresh
locations, and the values resulting from invoking the cata
procedures are stored in left-to-right order in the
corresponding locations. ⟩
Finally, the twice-extended
environment is extended a third time by
binding quasiquote
to ellipsis-aware
quasiquotation syntax (see below),
the ⟨
of the
selected ⟩⟨
is evaluated in
this thrice-extended environment, and the resulting values
from its last expression are returned.
⟩
If a ⟨
clause is of the
second form, the missing ⟩⟨
defaults to an expression that evaluates to a procedure that, when
called with one argument, evaluates the ⟩match
expression with the input value replaced by the argument value in
tail position.
If no pattern of any clause is matched, an exception
of type &assertion-violation
is raised.
If the match
expression is in tail context,
the ⟨
are ⟩⟨
.
⟩
Note: A pattern variable in the sense of this SRFI is an
ordinary variable, not a pattern variable in the sense of
the syntax-case
system of R^{6}RS.
Remark: The pattern language makes use
of unquote
without a
corresponding quasiquote
. It has been argued
that this leads to an “unbalanced” quasiquotation
syntax (there is an unquote
without
a surrounding quasiquote
) and it has been suggested to add
the “implicit” quasiquote
explicitly so
that the first example
(define (fold-right kons knil lis) (match lis [(,x . ,[x*]) (kons x x*)] [() knil]))
would become:
(define (fold-right kons knil lis) (match lis [`(,x . ,[x*]) (kons x x*)] [`() knil]))
We don't subscribe to the view that there is an
“implicit” quasiquote
in
the match
syntax described in this SRFI. In
fact, quasiquote
is not auxiliary syntax
for match
, and if there were an
implicit quasiquote
, a nested one should cancel
an unquote
, which it doesn't. The same fallacious argument was
made in SRFI 200. Moreover, following the unquote
is not an expression, but either a ⟨
or a ⟩⟨
. ⟩
The following syntax together with the auxiliary
syntax unquote
, unquote-splicing
,
and ...
is exported by the library (srfi
:241 match quasiquote)
. The auxiliary syntax is
identical to the auxiliary syntax exported by (rnrs
base)
.
(quasiquote ⟨
⟩)
If no instances of the ellipsis ...
appear within
the ⟨
at the
same nesting level as the outermost quasiquote, the result of
evaluating the ellipsis-aware ⟩quasiquote
is the
same as if the quasiquote
syntax of R^{6}RS
were used. A subtemplate of the
form (⟨
is identical to ⟩ ⟨ ⟩)⟨
, except
that ellipses within ⟩⟨
have
no special meaning. If a subtemplate is followed by one or more
instances of the ⟩⟨
, the
expressions following a comma have to evaluate into nested lists
of (at least) the same nesting depths. They are replaced in the
output by all of the elements they match in the input,
distributed as indicated. It is an error if the output cannot
be built up as specified. An ⟩(⟨
form is equivalent to
⟩ …)(⟨
. ⟩ ⟨ ⟩ …)
⟨ ⟩
The auxiliary syntax quasiquote
within
an ⟨
is the
ellipsis-aware ⟩quasiquote
.
A portable implementation for R^{6}RS systems is in this SRFI's repository.
A portable implementation for R^{7}RS systems that
support syntax-case
is possible as well. For an
R^{7}RS implementation, the library names should be
derived from the given
(SRFI
97-conformant) R^{6}RS library names as usual.
The pattern matcher defined here has originally been described by R. Kent Dybvig. The original program was originally designed and implemented by Dan Friedman. It was later redesigned and reimplemented by Erik Hilsdale. The sample implementation of this SRFI shares no code with Dybvig's, Friedman's and Hilsdale's version.
The example section shamelessly reuses text from R. Kent Dybvig's original description.
© 2022 Marc Nieper-Wißkirchen.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.