This SRFI is currently in draft status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-262@nospamsrfi.schemers.org
. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.
A pattern matching form which can operate on arbitrary Scheme values is defined. It conforms to the following design principles.
The syntax of patterns is declarative. The syntax is extensible in the same way that Scheme’s procedural syntax is extensible by macros.
For most use cases, the use of the pattern matcher should produce code which is essentially as efficient at run time as equivalent procedural Scheme code would be, assuming a moderately optimizing Scheme implementation. This applies only when the equivalent code is also equivalently correct in terms of handling of error cases (i.e. including all type checks done automatically by the pattern matcher). However, using extension syntax should not cause this principle to be violated (provided the extension syntax is appropriately implemented).
This section is non-normative.
The following table lists the syntax, procedures, and condition type made available in Scheme code by the pattern matching library. The names of the syntactic tokens shown here are mnemonic only – consult the entry for each form to see the full grammar.
Scheme syntax | Scheme procedures | Condition type |
---|---|---|
(match expr clause …) (match-values expr mv clause …) (match-let let-like body) (match-let* let-like body) (match-let-values
mv let-like body) (match-let*-values
mv let-like body) (match-letrec let-like body) (match-letrec*
let-like body) (if-match let-like expr1
expr2) (match-define pat expr) (match-define-values
(pat …)
expr) (match-lambda mv clause …) (define-pattern-syntax
id tx
expr) |
(match-ellipsis?
obj) (make-match-violation) (match-violation?
obj) |
&match |
The following grammar summarizes the syntax of the pattern-matching sublanguage, including basic and derived pattern syntax which is included with the pattern-matching library itself.
Pattern syntax | |||
---|---|---|---|
a pattern ... | ... matches: | ||
pat | ::= | _ |
; anything |
| | id | ; anything, and binds subject to id |
|
| | self-quoting datum | ; a datum which is equal? |
|
| | 'datum |
; a datum which is equal? |
|
| | (quote datum) |
||
| | (cons patcar patcdr) |
; a pair |
|
| | (list seq
pat …) |
; a list |
|
| | (cons* seq pat … pat) |
; a possibly improper list |
|
| | (lset pat …) |
; a list with items in any
order |
|
| | (lset pat … pat dots) |
||
| | (vector seq
pat …) |
; a vector |
|
| | (? expr pat …) |
; if (expr subj)
and every pat matches
subj |
|
| | (apply expr pat …) |
; if values of (expr
subj) match the pats |
|
| | (and pat …) |
; if all of the pats
match |
|
| | (or pat …) |
; if any of the pats
match |
|
| | (not pat) |
; if pat doesn’t
match |
|
| | (seq iter
spec seq pat …) |
; a sequence |
|
| | (seq* iter
spec seq pat … pat) |
; a possibly improper sequence |
|
| | (seq/unordered
iter spec pat …) |
; a sequence with items in any
order |
|
| | (seq/unordered
iter spec pat … pat dots) |
||
| | `quasipat |
; according to
quasipattern |
|
| | (quasiquote quasipat) |
||
| | (keyword datum …) |
; anything you want :-) |
|
| | (keyword datum …
. datum) |
||
seq pat | ::= | pat | ; a single item from a
sequence |
| | pat dotty |
; multiple items from a sequence |
|
dotty | ::= | dots | ; zero or more,
greedily |
| | (dots n) |
; exactly that many |
|
| | (dots n
m) |
; between n and m, greedily |
|
| | (dots n
#t) |
; at least n, greedily |
|
dots | ::= | the identifier
... |
|
quasipat | ::= | self-quoting datum | ; a datum which is
equal? |
| | id | ; a symbol with the same name |
|
| | () |
; the empty list |
|
| | (seq quasipat …) |
; a proper list |
|
| | (seq quasipat … . quasipat) |
; an improper list |
|
| | #(seq quasipat …) |
; a vector |
|
| | ,pat |
; pattern |
|
| | (unquote pat) |
||
seq quasipat | ::= | quasipat |
; as
quasipattern |
| | ,@pat |
; any number of pattern |
|
| | (unquote-splicing pat) |
||
| | quasipat dotty |
; multiple items matching quasipattern |
Scheme’s lack of standard facilities for pattern matching on general Scheme values has been criticized by functional programmers of other languages for at least 35 years (Wadler 1987a). Generally, pattern matching enables a more declarative style of programming closer to mathematical notation, which is easier to reason about and which can handle error cases without having to write explicit error-checking code.
The very early history of structural pattern matching is tied to
its use in early computer algebra systems to find mathematical
expressions and equations having particular forms. (Moses 1967) Pattern matching as a programming language
feature in its recognizable, modern form – as a means of
deconstructing records using syntax mirroring that of their
constructors – was first proposed by Burstall (1969), building on work by McCarthy and Painter
(1967) in the use of structural
induction to prove the correctness of a compilation technique.
Burstall proposed pattern matching as an extension to Landin
(1966)’s proposed programming language
ISWIM, which was never actually implemented. The first
implemented generalized pattern matcher for a programming
language was by McBride (1970), who
modified the Lisp 1.5 metacircular evaluator in the process of
developing his own computer algebra system, adding the capability
to recognize structures of any kind built with cons
cells.
It was another decade before pattern matching in the style
proposed by Burstall became a part of the experimental language
Hope (Burstall, MacQueen, and Sannella 1980), and subsequently a standard feature of
the descendant ML (1973; see MacQueen, Harper, and Reppy 2020) and Miranda/Haskell (1985/1990
respectively; see Hudak et al. 2007) families of programming languages. In
Lisp, declarative destructuring was added to MacLisp with the
introduction of the now-familiar LET
form in 1978
(Steele et al. 1974–82), but in
the initial version of Common Lisp this feature survived only in
the DEFMACRO
argument list; LET
was
simplified to allow basic variable binding only, as we know it
today. (Steele 1984) Early implementations
of Common Lisp reintroduced a destructuring binding construct under
the name DESTRUCTURING-BIND
, which was adopted into
the ANSI standard. (Pitman and Moon 1989) In Scheme, a destructure
macro providing similar functionality was included in the T
implementation from around 1983.1 However, these
special forms did not allow the distinctive ML-style use of pattern
matching to test an input value against a series of successive
patterns and take action according to which pattern the value
conforms to; Lisp programmers have generally preferred the use of a
series of explicit conditionals in imperative style for this
purpose. Moreover, no RnRS report has ever standardized a
destructure
-like syntactic form.
However, prompted by Wadler (1987a)’s remarks, early discussions of what became the R4RS briefly included proposals for pattern matching. The majority of mailing list participants rejected pattern matching as a core language feature, and the proposal was relegated to a planned section of the report entitled the ‘Yellow Pages’ (containing portable libraries). The idea was then summarily dismissed at the June 1987 meeting of the RnRS report authors, and the Yellow Pages were never produced.
Schemers have nonetheless developed a variety of pattern
matching macros using pre-hygienic and hygienic macro systems. The
most popular pattern-matching syntax was originally developed by
Bruce F. Duba in the mid-1980s for Scheme 84.2 It became more widely known after
Andrew K. Wright adapted, documented, and distributed it for
several other Scheme implementations from around 1993 (Wright
1994, 1996). At
least three independent re-implementations of the same syntax have
followed: Bruce Hauman wrote a version using PLT Scheme’s
syntax-case
which was
integrated into the main distribution in 2003; Alex Shinn
implemented it in portable syntax-rules
(Shinn
2006) and later made further extensions; and
Adam Nelson has implemented it in syntax-rules
again
as part of his Schemepunk project.
This pattern matcher has been included in very many Scheme
implementations, both in the original defmacro
-based
implementation released by Wright, and in Shinn’s later hygienic
version which has now almost universally replaced it. Under the
name of the Wright–Cartwright–Shinn (WCS) matcher,3 an attempt at giving it a
specification was made in SRFI 204.
Other pattern matchers in Scheme include Oleg Kiselyov’s minimal
pmatch
in syntax-rules
and the
Chez Scheme match
macro implemented using hygienic
syntax-case
by Dan Friedman, Erik Hilsdale, and Kent
Dybvig with contributions from others, which was described in
SRFI 241.
Another implementation worth mentioning is
Bryan O’Sullivan’s simple pattern-case
macro,
probably the first ever pattern matcher written in portable
syntax-rules
, which was still an optional appendix to
the R4RS at the time. None of these are nearly as widely applicable
as Wright-style pattern matchers, however: many patterns which can
be expressed in Wright’s syntax would require falling back on
writing explicit procedural destructuring code in these systems.
All of these examples are limited because their basic pattern forms
support only the built-in Scheme datum types. Even in Wright’s
pattern matcher, the use of the lexical syntax of datum types to
specify deconstruction naturally limits their applicability to
Scheme object types which lack lexical syntax. Since the widespread
adoption of standard record type systems in Scheme code, and
especially their use to implement collections types beyond the
standard Scheme lists and vectors, this has become an increasingly
pertinent problem.
An early example which shows the potential of a less intrinsically limited pattern matcher is Christian Queinnec’s ‘Pattern-match compiler for non-linear second-order patterns on S-expressions’ (Queinnec 1990a, 1990b). In implementations in both Scheme and Common Lisp, Queinnec demonstrated the use of macros to extend the syntax of patterns, similarly to how the syntax of normal Lisp code can be extended; the core pattern language is small and not necessarily very tractable by humans, but can be expanded into from more easily comprehensible higher-level pattern forms. The use of macros to define new deconstructors is an adaptation to the Scheme context of the idea of extensible patterns, which is conventionally credited to Wadler (1987b)’s introduction of ‘views’ to the Miranda language, although McBride’s original Lisp 1.5 pattern matcher already featured a version of extensible patterns.
Hauman’s PLT Scheme implementation of Wright’s pattern matching interface also introduced an alternative interface, which cleanly separated the lexical syntax of Scheme from the syntax of deconstructors. In 2005, Sam Tobin-Hochstadt took advantage of this to allow users to define new deconstructors using hygienic macros, an independent reinvention of Queinnec’s concept. This was followed by a clean re-implementation in 2008 using PLT Scheme’s advanced hygienic macro system.4 The resulting pattern matcher – now part of PLT Scheme’s successor, Racket – is described by its author in Tobin-Hochstadt (2011).
Extensibility, combined with other primitives provided by
Wright’s original pattern matcher (in particular, pattern types for
calling arbitrary Scheme predicate and accessor procedures), allows
data types defined by programmers to receive their own first-class
support by an existing pattern matcher. Racket’s match
inspired the design of a similar extensible pattern matcher in
Emacs Lisp called pcase
(Monnier and Sperber 2020), as well as the Common Lisp
libraries Optima and
Trivia.
The pattern matcher proposed here is likewise in the lineage of, and very closely inspired by, the Racket pattern matcher. Its main adaptation is in the use of identifier properties (SRFI 213) as the means of extensibility. The clean extensibility of the pattern matcher avoids the ‘kitchen sink of features’ impression which the featureful but non-extensible syntax of Wright-style matchers makes: the core of the matcher is kept very small with only a handful of basic primitives, but the matcher as a whole is comparably powerful due to the ability to combine these primitives into new matching abstractions.
Compared to Wright-style pattern matchers, the pattern matcher proposed here is essentially equivalent in basic computational power, omitting only the following (mis)features:
The get!
and set!
matchers which
maintain a reference within a data structure and mutate it. This
does not compose well as a feature with the extensibility provided
by this pattern matcher, and it is probably a mistake to encourage
a matching clause to mutate the data it matched.
Support for ‘non-linear’ patterns in which one variable may be
named multiple times, requiring the values matched by the
respective subpatterns to be the same (in the sense of
equal?
). This is a new feature added by Hauman and
Shinn: it was not supported by Wright’s original implementation. As
explained below,
this feature does not compose well with several other features
which are used more commonly and which I feel to be more
worthwhile. Such patterns also appear to be rarely used in
practice, adding significant semantic complication for little
benefit.
Early binding: in Shinn’s implementation, although not in any
other including Wright’s original, variables are bound
during the pattern matching process, not after it has
completed. This means the bindings of variables are visible during
the evaluation of subsequent expressions embedded within patterns.
This makes the semantics of matching less declarative, which has
more than merely theoretical consequences: it requires establishing
a fixed order of evaluation for all pattern types, which hinders
the application of many optimized implementation strategies for
matching. It has also required piling misfeature upon top of
misfeature: Shinn’s implementation of match-letrec
has
to use inherently fragile Petrofsky extraction (Petrofsky 2001).
In terms of expressive power, however, this pattern
matcher is significantly better than Wright-style pattern matchers
because any Scheme programmer can define new types of pattern
rules. These pattern rules can, for example, provide matching
functionality for new data types, or improve or tailor the
semantics of the matchers included for Scheme’s built-in data
types. They are written much like regular macro transformers: in
particular, they can usually be entirely specified with the
familiar syntax-rules
system, but any other means of
creating macro transformers may be used. They compose transparently
with, and appear (in both syntax and generally in semantics)
identical to the pattern rules included with the pattern matcher
itself, in the same way normal Scheme macros compose with
procedural Scheme code.
The new front end removed fenders in favour of making clause contents bodies, but there’s still no means of explicit fall-through to give up on a clause and continue matching. Some possible solutions, by no means mutually exclusive:
Trivia provides
guard patterns which combine a subpattern, which may bind
pattern variables, with some Lisp code, which has access to those
variables (only) and which must evaluate true for the guard pattern
to match. An entire pattern wrapped in a guard pattern would be the
equivalent of a fender, but you could also wrap an individual
subpattern in a guard
if needed – which would
help bail out earlier for performance improvements (especially when
giving up early on expensive patterns like list patterns).
This may be moderately tricky to implement efficiently.
The name would need to be bikeshedded: guard
already means something else in Scheme.
Racket provides a variety of strategies for escaping from a
match clause, including both classical fenders introduced by a
#:when
keyword, and a =>
keyword which
binds an identifier to an escape procedure (returning early) to
give up on this clause and moves on to the next one.
#:when
fenders can also be combined with
#:do
fenders, in any number and order respectively.
#:do
fenders add definitions which can be used by
subsequent #:when
fenders.
Probably my preferred option right now would be a
match-next-clause
syntax parameter (name also maybe
needs bikeshedding – inspired by CL
CALL-NEXT-METHOD
). For match
and
match-values
(and maybe
match-lambda
but not any others), invoking this syntax
parameter would continue by trying the next matching clause, but
would not return early. It’s not possible to do that without
capturing a continuation in every match clause, even ones that
don’t use match-next-clause
, and Scheme
implementations seemingly do not optimize that away when the
continuation is never invoked.
Because it wouldn’t escape, it could only be used ‘safely’ in
tail position of a clause. This seems like a usability hazard. Note
that Racket actually has this (failure-cont
), but
pushes it deep down in the documentation page so people won’t be
tempted to use it incorrectly, and will use safer
=>
escape procedures instead. With continuation
marks I think it might at least be possible to detect and warn at
run time when match-next-clause
is used outside of
tail position – but I’m not sure of the cost of installing a
continuation mark, which might never be used, around every matching
clause.
Nonetheless, I like this, because compared to regular fenders,
it means clause bodies are still bodies, and it doesn’t require an
explicit opt in for each clause that uses it; but because it’s a
syntax parameter, the code to fall through will still only be
generated when someone actually writes a clause that uses
match-next-clause
.
Should there be a primitive besides ?
which exposes
an and
-like pattern whose subpatterns may be tested in
any order?
Should there be a means to declare explicitly which predicate identifiers used in patterns are complementary (mutually exclusive) and which imply one another (usually because of subtyping)? This would be useful to help implementations generate more efficient matching code, but if the declarations are wrong it will cause undefined behaviour.
Should the pattern matcher be allowed to eliminate
apply
patterns whose subpatterns are irrefutable
(cannot affect whether a pattern matches) and which do not bind any
variables? Or should it instead depend on the dead code elimination
of the Scheme compiler to do that optimization?
Nested quasiquote
patterns work the naïve way. This
is the same as in Wright’s implementation, Nelson’s implementation,
and Racket’s match
, but not Shinn’s implementation,
where nesting works as for regular Scheme quasiquote
.
What’s the Right Thing here? Emacs recently had a long
discussion about this for pcase
.
I don’t understand what the Right Thing for
unquote-splicing
in quasiquote
patterns
is supposed to be. Wright’s original version and the Racket pattern
matcher both actually splice the unquote-splicing
list
subpatterns into the surrounding list pattern, including (in the
case of Racket’s matcher) the ability to do funky things with
list-no-order
(Racket’s equivalent of my
lset
):
(match '(1 2 3 4) (`(1 ,@(list 2 3) 4) #t)) ;=> #t
(match '(1 2 3 4) (`(1 ,@'(2 3) 4) #t)) ;=> #t
(match '(1 2 3 4) (`(1 ,@(list-no-order 2 3) 4) #t)) ;=> #t
(match '(1 3 2 4) (`(1 ,@(list-no-order 2 3) 4) #t)) ;=> #t
(Of these, only the equivalent of the first one – spelled
`(1 ,@(2 3) 4)
– has a working equivalent in
Wright’s original implementation.)
While the composition with list-no-order
is cool
and these deconstruction semantics do theoretically correspond to
the construction semantics of regular quasiquote
, the
immediate utility of this is unclear to me. (I can’t find any
examples in real Racket code with GitHub Code Search, although it’s
hard to look for such examples with a pure line-based regular
expression search.) It would also be very difficult to compose
correctly and fully at least with the lower-level seq
family of patterns, and would be an ugly hack of some variety even
just to support the list
family of subpatterns. (Note
that SRFI 257
also supports the Wright/Racket-style semantics, and can do so
easily because of its backtracking ~append
pattern.)
By contrast, Shinn’s implementation simply treats
unquote-splicing
as an alias for an ellipsized
pattern:
(match '(1 (2 3) 4) (`(1 ,@(2 3) 4) #t)) ;=> #t
(match '(1 (2 3) (2 3) 4) (`(1 ,@(2 3) 4) #t)) ;=> #t
(match '(1 4) (`(1 ,@(2 3) 4) #t)) ;=> #t
This is also how this pattern is currently specified in this
pattern matcher, but the examples above are not consistent with
normal Scheme quasiquote
.
The one case in which the two behaviours are reliably the same
as one another and consistent with constructive
quasiquote
is when the subpattern of
unquote-splicing
is a simple identifier (a pattern
variable to be bound):
(match '(1 2) (`(1 ,@x 2) x)) ;=> ()
(match '(1 2 3) (`(1 ,@x 3) x)) ;=> (2)
(match '(1 2 3 4) (`(1 ,@x 4) x)) ;=> (2 3)
This is an expand-time error in Wright’s original implementation, but Racket and Shinn’s implementation both agree here. (The above examples return identical results in both.)
Racket’s behaviour better matches the intention of pattern
matching – that deconstruction mirrors construction – but
given the difficulties of implementation and composition, perhaps
the Right Thing for a conservative SRFI specification is simply to
allow only an identifier as the subpattern of
unquote-splicing
.
I’m not a fan of the extended
ellipsis
syntax. It should be bikeshedded a little to
be more palatable.
It’s not clear to me how to describe the correct semantics for
seq/unordered
and lset
so that they pull
out leftmost values from their input sequences when possible, given
the interaction between the order of the subpatterns and the order
of values from the input.
[Added by editor: More examples should be added.]
In the below specification text, the phrase ‘matching failure is
signalled’ means that an exception with condition types
&match
(see below)
and &irritants
is raised. The field of the
&irritants
conditions is set to a list of the
values which were supposed to match, as specified in each
section.
(match input
expression
clause …)
Syntax: Input expression
must be an expression.
Each clause
has the form
(pattern body)
.
A pattern
has the following
grammar:
pattern | ::= | _ |
| | identifier | |
| | pattern syntax | |
| | pattern datum |
Pattern syntax
has the form
(keyword datum
…)
or (keyword datum … . datum)
.
Pattern datum
is any value
which is a self-evaluating datum in normal Scheme code.
Semantics: The
input expression
is evaluated
to produce a value, and clause
s
are checked in turn from left to right, until one is found whose
pattern
s matches the value.
Note: In the literature on pattern matching, a distinction is sometimes made between ‘top-to-bottom’ and ‘left-to-right’ pattern matching. The former refers to the relative priority of patterns with respect to one another, and is one of a number of possible strategies for disambiguating between multiple patterns which match the same input value. Under the top-to-bottom rule, which is applied here (and here called ‘left-to-right’ in deference to the usual terminology for Scheme and Lisp forms), disambiguation depends entirely on the order in which patterns appear in the code. (See Hudak et al. 2007 section 5.2 for references to other possible pattern disambiguation strategies.) ‘Left-to-right’, in the terminology of those who make the distinction, refers to the order in which parts of an individual pattern are tested against parts of the input term. This is a practical necessity in lazily evaluated languages, where a non-matching earlier part of a term might prevent the pattern matcher from having to examine a later part whose evaluation might not terminate; but in strict languages such as Scheme and ML there is little practical benefit to making a guarantee on the order in which parts of the input value are tested. Indeed, in order to enable more optimized compilation, this pattern matcher does not make a general ‘left-to-right’ guarantee on the order in which parts of a pattern are tested against input: only certain types of patterns guarantee the order of testing.
When a clause
with a
matching pattern
is found, the
body
of that clause is
evaluated and its result returned. The body
is evaluated in tail position if the
match
expression as a whole is in tail position.
The testing of whether pattern
s match against input values is formally
defined as follows.
The value against which a pattern is tested is called the
subject of that pattern. Whether a pattern matches against a
particular subject is a boolean property. For example, a pattern
which is an underscore (the auxiliary syntax keyword
_
) matches any subject, whereas a pattern datum
will match if its subject is the
same (in the sense of equal?
) as the pattern datum
itself, and will not match
otherwise.
In addition, the matching of a pattern against a subject can
cause an identifier to be bound as a variable within the evaluation
of the body
. A pattern which is
an identifier other than _
will, like _
,
match any subject, but will also cause that identifier to be bound
as a variable to the subject of the pattern. Variables bound in
this way are also called pattern variables, although they are
normal Scheme variable bindings and not pattern variables in the
sense of syntax-case
.
Note: The
auxiliary syntax keyword else
has no special function
in match
expressions. A pattern consisting only of the
_
identifier can be used as a catch-all to handle the
case when no previous pattern matches. Attempting to use the
else
keyword for this purpose will result in a
variable being bound within the body of the corresponding clause,
shadowing the else
keyword and potentially causing
unexpected results when attempting to use that keyword in another
cond
expression within the clause’s body.
A pattern which is an instance of pattern
syntax
can contain subpatterns (recursively nested
instances of pattern
) which
have different subjects to the pattern
syntax
itself; for example, the pattern syntax
might match a record instance as
its subject, while a subpattern of it might match one of the field
values of that record. The keyword
of an instance of pattern syntax
uniquely determines a particular
type of pattern; each such type of pattern can have its own syntax
and semantics. An instance of pattern
syntax
usually matches its subject if some combination
of subpatterns contained within it matches a corresponding
combination of values derived from the subject, and may bind
variables for some or all of these derived values. It is a syntax
violation if the keyword
of an
instance of pattern syntax
is
not lexically bound to an identifier for a valid type of pattern
syntax (see define-pattern-syntax
).
It is a syntax violation if any pattern which is an identifier
other than _
appears more than once in any
pattern
, including occurrences
within different subpatterns. The exception is identifiers within
subpatterns of or
patterns,
which must appear in each such subpattern in order to be bound as
variables within the evaluation of the body
(see the description under the entry for
that pattern syntax).
If none of the pattern
s
match the result of evaluating the input
expression
, matching failure is signalled with a
one-item list containing the result of the input expression
evaluation.
(match-values input expression
clause …)
Syntax: Input expression
must be an expression.
Each clause
has the form
((pattern …)
body)
.
Semantics:
Match-values
is similar to match
, except
that the input expression
can
evaluate to multiple values, and each value is tested against the
respective pattern
in an
unspecified order. Each clause
can have a different number of pattern
s; only those clauses with the correct
number of pattern
s for the
number of values returned from evaluating the input expression
will be matched against.
If none of the clause
s’
pattern
s match all of the
corresponding values, matching failure is signalled with a list of
the values produced by the input
expression
.
(match-let ((pattern expression) …)
body)
Match-let
matches the results of evaluating all the
expression
s against the
corresponding pattern
s. The
pattern
s must all match the
results of the respective values obtained by this evaluation,
otherwise matching failure is signalled with a list of the values
of the expression
s. The
evaluation of the expression
s
takes place in an unspecified order and out of the scope of any of
the pattern variables from the pattern
s; the order in which the resulting
values are tested against the corresponding patterns is also
unspecified. The body
is then
evaluated with the pattern variables resulting from the matches
being bound to their respective subjects. The body
is evaluated in tail position if the
match-let
expression as a whole is in tail
position.
Match-let
could be implemented by:
(define-syntax match-let
(syntax-rules ()
((_ ((pat init) ...) body_0 body_1 ...)
(match-values (values init ...)
((pat ...)
body_0 body_1 ...)))))
(match-let* ((pattern expression) …)
body)
Match-let*
is like match-let
, except
that the expression
s are
evaluated from left to right, each with the pattern variables bound
by the pattern
s corresponding
to the expression
s to the left
of it in scope.
If any of the pattern
s do
not match the result of evaluating the corresponding
expression
, matching failure is
signalled with a one-item list containing the value which failed to
match the corresponding pattern.
Match-let*
could be implemented by:
(define-syntax match-let*
(syntax-rules ()
((_ () body_0 body_1 ...)
(let ()
body_0 body_1 ...))
((_ ((pat init) more ...) body_0 body_1 ...)
(match-let ((pat init))
(match-let* (more ...)
body_0 body_1 ...)))))
(match-let-values
(((pattern …)
expression) …)
body)
Match-let-values
is like match-let
,
except that the expression
s can
evaluate to multiple values and must match against the
corresponding number of pattern
s. The order in which the values from the
expression
s are tested against
the pattern
s is
unspecified.
If any of the groups of pattern
s do not match the result of evaluating
the corresponding expression
,
matching failure is signalled with a concatenated list of all of
the values of all of the expression
s.
Match-let-values
could be implemented by:
(define-syntax match-let-values
(lambda (stx)
(syntax-case stx ()
((_ (((pat ...) init) ...) body_0 body_1 ...)
(with-syntax
((((temp ...) ...)
(map generate-temporaries #'((pat ...) ...))))
#'(let-values
(((temp ...) init) ...)
((match-lambda
((pat ... ...) body_0 body_1 ...))
temp ... ...)))))))
(match-let*-values
(((pattern …)
expression) …)
body)
Match-let*-values
is like
match-let-values
, except that the expression
s are evaluated from left to right,
each with the pattern variables bound by the pattern
s corresponding to the expression
s to the left of it in scope. The
order in which the values from each expression
is tested against the corresponding
pattern
s is unspecified.
If any of the groups of pattern
s do not match the result of evaluating
the corresponding expression
,
matching failure is signalled with a list containing the values
which failed to match the corresponding patterns.
Match-let*-values
could be implemented by:
(define-syntax match-let*-values
(syntax-rules ()
((_ () body_0 body_1 ...)
(let ()
body_0 body_1 ...))
((_ (((pat ...) init) more ...) body_0 body_1 ...)
(match-let-values (((pat ...) init))
(match-let*-values (more ...)
body_0 body_1 ...)))))
(match-letrec ((pattern expression) …)
body)
Match-letrec
is like match-let
, except
that the expression
s can
recursively refer to the variables bound by the pattern
s, in the manner of standard Scheme
letrec
. It is an error if one of the expression
s tries to access or assign the value
of any of the variables during its evaluation.
Note: The test
suite tries to detect whether the Scheme implementation it is
running on enforces the letrec
restriction by
signalling an error, and will automatically skip the test which
ensures it is enforced for match-letrec
if not. This
check assumes that calling eval
on a form that
violates the letrec
restriction will raise some kind
of error, which is guaranteed by R6RS but not by R7RS small. If
letrec
restriction violations do not cause an error to
be signalled but rather some other kind of (undefined) behaviour,
it may be better to comment out the test when porting an
implementation of this SRFI to a new Scheme implementation, rather
than relying on this check.
(match-letrec* ((pattern expression) …)
body)
Match-letrec*
is like match-let*
,
except that the expression
s can
recursively refer to the variables bound by the pattern
s, and can access and assign the values
of pattern variables bound in patterns to the left of their own
pattern
, in the manner of
standard Scheme letrec*
.
Match-letrec*
could be implemented by:
(define-syntax match-letrec*
(syntax-rules ()
((_ ((pat init) ...) body_0 body_1 ...)
(let ()
(match-define pat init) ...
(let ()
body_0 body_1 ...)))))
(match-define pattern expression)
Match-define
defines the pattern variables
specified within the pattern
to
the corresponding values obtained from matching the pattern against
the result of evaluating the expression
. If the pattern
does not match the result of evaluating
the expression
, matching
failure is signalled with a one-item list containing the value of
the expression
.
(match-define-values
(pattern …)
expression)
Match-define-values
is like
match-define
, except that the expression
can return multiple values, which are
matched against the corresponding pattern
s. The order in which the values are
matched against the corresponding patterns is unspecified. If the
pattern
s do not match the
values produced by evaluating the expression
, matching failure is signalled with a
list of the values.
(if-match ((pattern expression) …)
consequent
alternate)
Syntax: Consequent
and alternate
must be expressions.
Semantics:
If-match
is like match-let
, except that
if all the pattern
s match the
values obtained by evaluating their corresponding expression
s, the consequent
expression is evaluated with all the
pattern variables in scope; otherwise, the alternate
expression is evaluated with none of
the pattern variables in scope. The consequent
or alternate
is evaluated in tail position if the
if-match
expression as a whole is in tail
position.
If-match
could be implemented by:
(define-syntax if-match
(lambda (stx)
(syntax-case stx ()
((_ ((pat init) ...) conseq alter)
(with-syntax ((else (make-list (length #'(pat ...)) #'_)))
#'(match-values (values init ...)
((pat ...) conseq)
(else alter)))))))
(match-lambda
clause …)
Syntax: Each
clause
has the form
((pattern …)
body)
.
Semantics:
Match-lambda
is the fundamental pattern matching form.
It evaluates to a procedure which takes arguments according to the
number of pattern
s in each of
the clause
s. When the procedure
is called, it matches its arguments against each of the
clause
s which have the same
number of pattern
s from left to
right, testing each argument against the corresponding
pattern
. The order in which the
arguments are matched against the pattern
s of each individual clause is
unspecified. The body
of the
leftmost matching clause is then evaluated with the pattern
variables from the pattern in scope. If no clause
s have the right number of
pattern
s, or if none of the
pattern
s match all of the
respective arguments, matching failure is signalled with a list of
the arguments to the procedure call. The body
is evaluated in tail position if the call
to the returned procedure as a whole is in tail position.
The following instances of pattern
syntax
are exported by the pattern matching library.
Where the keywords are existing bindings within the base Scheme
library, the pattern syntax transformers are attached to those
bindings; otherwise, they are exported as auxiliary syntax
keywords.
(quote datum)
A quote
pattern matches if its subject is the same
as the datum
in the sense of
equal?
, and does not match otherwise. No variable is
bound.
Note: For most
possible values of datum
apart
from identifiers (symbols), list-structured forms (including the
empty list), and (on strict implementations of the R6RS) vectors, a
quote
pattern is exactly equivalent to the unquoted
datum
used as a pattern.
Example:
(define (null-or-something-else obj)
(match obj
('() 'null)
(_ 'something-else)))
(null-or-something-else '()) ;=> null
(null-or-something-else 'nil) ;=> something-else
(? procedure expression pattern
…)
The procedure expression
is
evaluated to produce a procedure of one argument and one return
value. A ?
pattern tests whether applying this
procedure to its subject returns a true value. It is an error if
application of the procedure returns zero values or more than one
value. If the result of the test is false, the ?
pattern does not match.
Otherwise, if there are no pattern
s, the ?
pattern matches its
subject and no variable is bound. If there are any
pattern
s, those patterns are
matched in an unspecified order against the subject and the
?
pattern matches if all of them match the subject of
the ?
pattern. All of the variables bound by the
subpatterns are bound by the match.
Note: It is
unspecified when and how many times the procedure expression
is evaluated, and when and
how many times the resulting procedure is called. In particular, it
may be called multiple times on the same subject. The behaviour is
undefined if the evaluation of the procedure
expression
or the subsequent call to the procedure
assigns to the binding of any variable used in an expression within
any pattern.
If some of the subpatterns need to be checked before others, use
an and
pattern to enforce the ordering.
Example:
(define (integer-or-symbol val)
(match val
((? integer?) 'integer)
((? symbol?) 'symbol)))
(integer-or-symbol 24) ;=> integer
(integer-or-symbol 'x) ;=> symbol
(integer-or-symbol "x") ; (exception raised)
(apply procedure expression pattern
…)
The procedure expression
is
evaluated to produce a procedure of one argument and as many return
values as there are pattern
s.
An apply
pattern matches if and only if the values
resulting from the application of this procedure to its subject
match each of the corresponding pattern
s: each value becomes the subject of the
respective subpattern. The order in which the values are tested
against the pattern
s is
unspecified. The behaviour is undefined if application of the
procedure to the subject returns more or fewer values than the
number of pattern
s. All of the
variables bound by all of the subpatterns are bound by the
match.
Note: It is
unspecified when and how many times the procedure expression
is evaluated, and when and
how many times the resulting procedure is called. In particular, it
may be called multiple times on the same subject. The behaviour is
undefined if the evaluation of the procedure
expression
or the subsequent call to the procedure
assigns to the binding of any variable used in an expression within
any pattern.
Example:
(define (fizz? n)
(match n
((apply (lambda (x) (floor/ x 3)) _) #t)
(_ #f)))
(fizz? 1) ;=> #f
(fizz? 3) ;=> #t
(fizz? 5) ;=> #f
(fizz? 21) ;=> #t
(and pattern …)
An and
pattern matches if and only if its subject
matches all of its subpatterns. All of the variables bound by all
of the subpatterns are bound by the match.
Unlike the subpatterns of ?
and apply
patterns, the subpatterns of an and
pattern are
matched from left to right and short-circuit. It is therefore safe
for later subpatterns to depend on the matching of previous
subpatterns for type-checking purposes.
Example:
(define (fizzbuzz n)
(match n
((and (? fizz?) (? buzz?)) 'fizzbuzz)
((? fizz?) 'fizz)
((? buzz?) 'buzz)
(_ n)))
(map fizzbuzz (iota 16))
;=> (fizzbuzz 1 2 fizz 4 buzz fizz 7 8 fizz buzz 11 fizz 13 14 fizzbuzz)
(or pattern …)
An or
pattern matches if its subject matches any of
its subpatterns. The variables that are bound to variables
correspond to the values from the leftmost subpattern which
matches. It is a syntax violation if the code associated with any
pattern refers to a variable bound by one or more, but not all, of
the subpatterns of an instance of the or
pattern
syntax.
The subpatterns are matched from left to right and short-circuit. It is therefore safe for later subpatterns to depend on the matching of previous subpatterns for type-checking purposes.
Example:
(define (arithmetic-operation x)
(match x
((list (and operator (or '+ '- '* '/))
(and operands (? number?)) ...)
(list operator operands))))
(arithmetic-operation '(+ 2 2)) ;=> (+ (2 2))
(arithmetic-operation '(/ 42 7)) ;=> (/ (42 7))
(not pattern)
A not
pattern matches if and only if its subject
does not match the given subpattern. No variable is
bound.
Example:
(define nonzero?
(match-lambda
(((? zero?)) #t)
((_) #f)))
(seq name ((var init step) …)
termination
condition
reference
expression
sequence pattern …)
Syntax: Name
and all the var
s must be identifiers. All the
init
s, step
s, and the termination condition
and reference expression
must be expressions.
Sequence pattern
has the
following grammar:
sequence pattern | ::= | pattern |
| | pattern extended ellipsis |
extended ellipsis | ::= | ellipsis |
| | (ellipsis n) |
|
| | (ellipsis min max) |
n | ::= | an exact nonnegative integer literal |
min | ::= | an exact nonnegative integer literal |
max | ::= | an exact nonnegative integer literal greater than or equal to min |
| | #t |
Ellipsis
refers to the
...
identifier from the base Scheme library.
Semantics: A
seq
pattern matches a series of values from an
iterable sequence.
A sequence pattern
which is
a single pattern
with no
ellipsis following matches one item within the sequence if the
corresponding pattern
matches
that item. Pattern variables within the pattern are bound.
When a pattern is followed by an extended
ellipsis
it matches multiple items within the
sequence, and pattern variables within the pattern list are bound
to lists of the values which matched within the input sequence. A
pattern followed by single ellipsis
can match any number of input forms,
including zero. A pattern followed by (ellipsis n)
must match
exactly n
items in the input
sequence. A pattern followed by (ellipsis min max)
must match at least min
items within the input sequence and up to
max
items, or an unbounded
number of items greater than or equal to min
items if max
is #t
.
Ellipsized patterns match against input values greedily –
that is, they do not stop matching at the first value which matches
the pattern immediately following them; rather, each sequence
pattern attempts to consume as much input that matches itself as
possible. When multiple ellipsized patterns are included within the
list of sequence pattern
s, the
standard leftmost priority rule applies: patterns which are further
left in the series of sequence
pattern
s match the largest number of items in the list
they can while still satisfying the seq
pattern as a
whole.
In order to access items from the sequence, the pattern matcher
establishes state variables called var
by first binding them to the results of
evaluating the respective init
expressions in an unspecified order. At each stage of the
iteration, the pattern matcher first checks to see if the
termination condition
evaluates
to a true value: if so, the sequence is deemed to have completed
and the seq
pattern matches only if the complete input
sequence matches the series of sequence
pattern
s. Otherwise, the pattern matcher evaluates the
reference expression
and checks
the result of the evaluation against one or more of the
sequence pattern
s as
appropriate to its own current state. Then it continues the
iteration by evaluating the step
expressions, binding the results of these
evaluations to the respective var
s, and beginning a new stage of the
iteration.
Within the evaluation of the init
s, step
s, the termination
condition
, and the reference
expression
, the identifier named by name
is bound as a variable to the subject of
the seq
pattern. Within the evaluation of the
step
s, the termination condition
, and the reference expression
, the var
s are bound to their respective current
values. Neither the name
nor
the var
s are bound as pattern
variables if the match succeeds, nor are they visible as variables
or pattern variables within the sequence
pattern
s.
Note that the pattern matcher may choose to iterate over the
input sequence in multiple instances in parallel, to backtrack and
consider the same item in the input sequence multiple times, or to
terminate the iteration(s) early after concluding that the pattern
will never match. The values of the var
s must encapsulate the entirety of the state
required to access items from the iterable sequence: evaluation of
the init
, step
, reference
expression
, and termination
condition
expressions must not mutate the internal
state of the seq
pattern’s
subject such that a parallel evaluation of the same expressions on
the sequence would fail, and must not mutate the values of the
var
s such that restoring
previous values in order to backtrack and re-examine an
already-examined part of the sequence would not work. As with the
expressions embedded in ?
and apply
patterns, it is undefined behaviour if evaluation of any of these
expressions assigns to a variable used in any expression within any
pattern.
Note: The
structure of an iteration with seq
is fundamentally
similar the patterns familiar to Scheme programmers from both
‘named let
’ and do
. A seq
iteration can be thought of as expanding to
(let loop ((var init) …)
(if termination condition
(return-match-or-fail)
(begin
(test-current-subpattern-for-input reference expression)
(loop step …))))
or to
(do ((var init step) …)
(termination condition (return-match-or-fail))
(test-current-subpattern-for-input reference expression))
Example: Given a type called a ‘vektor’ which is otherwise similar to Scheme’s built-in vector type, the following pattern would match items from this type.
(seq vek ((idx 0 (+ idx 1)))
(>= idx (vektor-length vek))
(vektor-ref vek idx)
#;sequence-subpatterns...)
See also the sample implementation for the vector
pattern syntax and the
examples under that entry.
(seq* name ((var init step) …)
termination
condition
reference
expression
sequence pattern … pattern)
Seq*
is similar to seq
, with a
difference: after the termination
condition
is satisfied, the reference expression
(which should usually
simply be one of the var
s) is
evaluated one more time and its result tested against the final
pattern
; and if the final
pattern
has been reached and
matched by some previous iteration over the sequence, and that
final test fails, the result of that previous match will be the
result of matching the whole seq*
pattern.
This is the primitive for matching sequences comparable to
Scheme’s built-in lists built of pairs which do not guarantee
‘properness’, as well as for matching structures such as series of
pairs terminated by a pattern which may itself match a series of
pairs. Note, however, that due to the greediness of the ellipsis in
sequence pattern
s, neither
seq
nor seq*
is well-suited to matching
sequences from streams or other potentially infinite sequences.
Example: Given the ‘pare’ definition from section 5.5 of the R7RS small report, the following example shows how to match proper ‘lysts’ of ‘pares’, terminated by Scheme’s standard empty list value.
(seq* lyst ((curr list (kdr curr)))
(not (pare? curr))
curr
(apply kar #;subpattern) #;...
'())
See also the sample implementation for the cons*
and list
pattern syntax and the examples
under those entries.
(seq/unordered
name ((var
init step)
…)
termination
condition
reference
expression
pattern …)
(seq/unordered name ((var init step) …)
termination
condition
reference
expression
pattern … rest pattern ellipsis)
Syntax: Rest pattern
, if given, is a pattern.
Semantics:
Seq/unordered
is similar to seq
, but the
values which match the subpatterns may appear in the input sequence
in any order. Also, unlike a seq
pattern, there can
only be one subpattern which matches more than one item from the
input sequence: it must be the last subpattern in the
seq/unordered
pattern, and it must be unbounded.
Seq/unordered
patterns are useful for matching
set-like or association list-like structures which are not, or
which are only partially, inherently ordered. Because of their
intended usefulness for association lists, each pattern
matches the earliest value in the input
sequence which satisfies it while still allowing all of the
pattern
s in the sequence to
match if possible. This maintains the historical property of
association lists, whereby the leftmost value for a particular key
is considered the current value for that key, and further values
with the same key are older values which have been overwritten by
subsequent consing onto the beginning of the association list.
If the rest pattern
is
given, it must match any remaining values in the list which do not
match any of the given pattern
s.
All the pattern variables bound by all of the pattern
s are bound by the match. All of the
variables bound by the rest
pattern
are bound to lists of the corresponding values
from the input sequence, in the order they appeared in the original
input sequence.
Note: While it
is possible to write an implementation of
seq/unordered
which has good asymptotic performance
for the majority of real-world use cases, in general the number of
possible orderings of subpatterns and of input values is inherently
factorial, and this will be reflected in the time and/or memory
space required for some uses. For good performance, as many of the
pattern
s as possible should be
unambiguous with respect to one another (i.e. so that no value
in the sequence could ever match more than one of the
pattern
s, even if multiple
values in the sequence could match the same pattern). The
rest pattern
may safely be a
pattern which is ambiguous with respect to any or all of the other
subpatterns, because it is only tried when none of the other
subpatterns match.
Examples: See the sample
implementation for the lset
pattern syntax and the examples under that entry.
(cons pattern1 pattern2)
A cons
pattern matches a pair whose car matches
pattern1
and whose
cdr matches pattern2
. The order in which the car
and cdr are tested against the corresponding patterns is
unspecified. All pattern variables in both subpatterns are bound by
the match. If the subject of the cons
pattern is not a
pair or either of the pattern
s
does not match, the cons
pattern does not match.
Example: The
fold
example using list-case
from
SRFI 239
can be reformulated in terms of match
as follows.
(define fold
(lambda (proc seed ls)
(let f ((acc seed) (ls ls))
(match ls
((cons h t) (f (proc h acc) t))
('() acc)
(_ (assertion-violation 'fold "not a list" ls))))))
Cons
could be implemented as pattern syntax by:
(define-pattern-syntax cons
(syntax-rules ()
((_ car-pat cdr-pat)
(? pair?
(apply car car-pat)
(apply cdr cdr-pat)))))
(list sequence pattern …)
A list
pattern matches a proper list of Scheme
values whose elements match the sequence
pattern
s.
Examples:
(match '(1 2 3)
((list a b c) (+ a b c))) ;=> 6
(match '(1 2 3 4)
((list a b c) (+ a b c))) ; (exception raised)
(match '(1 2 3 . 4)
((list a b c) (+ a b c))) ; (exception raised)
(match '(tagged 1 x 2 y)
((list 'tagged n ...) n)) ;=> (1 x 2 y)
(match '(x y z 10 11 12)
((list (and (? symbol?) syms) ...
(and (? number?) nums) ...)
(values syms nums))) ;=> (x y z) (1 2 3)
(match '(1 2 3 split 4 5 6)
((list before ... 'split after ...)
(values before after))) ; (1 2 3) (4 5 6)
(match '(1 2 split 3 4 split 5 6)
((list before ... 'split after ...)
(values before after))) ; (1 2 split 3 4) (5 6)
List
could be implemented as pattern syntax by:
(define-pattern-syntax list
(syntax-rules ()
((_ subpat ...)
(cons* subpat ... '()))))
(cons* sequence pattern … pattern)
A cons*
pattern matches a proper or improper list
of Scheme values. The final pattern
matches the tail of the list after all
sequence pattern
s. If the tail
pattern
matches pairs, it will
match the shortest possible proper or improper list; the
sequence pattern
s may be tested
for matching against some items within the pairs which actually
ultimately form part of the match for the pattern
s.
Examples:
(match '(1 2 3 . 4)
((cons* a b c d) (+ a b c d))) ;=> 10
(match '(1 2 3 4 . 5)
((cons* x ... y) (cons y x))) ;=> (5 1 2 3 4)
Cons*
could be implemented as pattern syntax
by:
(define-pattern-syntax cons*
(syntax-rules ()
((_ seq-pat ... tail-pat)
(seq* ls ((curr ls (cdr curr)))
(not (pair? curr))
curr
(apply car seq-pat) ...
tail-pat))))
(vector sequence pattern …)
A vector
pattern matches a vector whose elements
match the sequence
pattern
s.
Examples:
(match '#(1 2 3)
((vector a b c) (list a b c))) ;=> (1 2 3)
(match '#(record 1 x 2 y)
((vector 'record n ...) n)) ;=> (1 x 2 y)
Vector
could be implemented as pattern syntax
by:
(define-pattern-syntax vector
(syntax-rules ()
((_ subpat ...)
(and (? vector?)
(seq vec ((idx 0 (+ idx 1)))
(>= idx (vector-length vec))
(vector-ref vec idx)
subpat ...)))))
(quasiquote
quasipattern)
Syntax: Quasipattern
has the following grammar:
quasipattern | ::= | identifier |
| | pattern datum | |
| | (sequence
quasipattern …) |
|
| | (sequence
quasipattern … . quasipattern) |
|
| | #(sequence
quasipattern …) |
|
| | (unquote pattern) |
sequence quasipattern | ::= | quasipattern |
| | quasipattern extended ellipsis | |
| | (unquote pattern …) |
|
| | (unquote-splicing pattern …) |
Semantics: Within a
quasipattern
, all
identifier
s match symbols
except outside of unquoted (unquote pattern)
escapes. In addition, list-structured
quasipattern
s match literal
proper or improper lists rather than being interpreted as
pattern syntax
. Within a list-
or vector-structured quasipattern
, unquote-splicing
or a
following ellipsis
can be used
to match multiple items from within the list or vector, analogously
to a sequence pattern
.
(lset pattern …)
(lset
pattern …
rest pattern ellipsis)
An lset
pattern matches a proper list of Scheme
values, testing that items which match the pattern
s appear in any order within its subject.
If the rest pattern
is given,
it must match any remaining values in the list which do not match
the set of pattern
s.
Note: See the
note on performance under seq/unordered
.
Examples:
(match '(1 2 3)
((lset (? even? x)
(? odd? y)
(? odd? z))
(list x y z))) ;=> (2 1 3)
(match '((a . 1) (b . 2) (c . 3))
((lset (cons 'b val) _ ...) val)) ;=> 2
(match '(1 x 2 y)
((lset (? symbol? s) more ...)
(values s more))) ;=> x (1 2 y)
Lset
could be implemented by:
(define-syntax lset (syntax-rules ()))
(define-pattern-syntax lset
(syntax-rules ()
((_ subpat ...)
(and (? list?)
(seq/unordered ls ((more ls (cdr more)))
(null? more)
(car more)
subpat ...)))))
(define-pattern-syntax
identifier transformer
expression)
Syntax: Identifier
must be an identifier which already
has some kind of binding associated with it. Transformer expression
is as for the right hand
side of the base Scheme define-syntax
.
Semantics: Associates the
transformer resulting from evaluating the transformer expression
with the
identifier
. When a
pattern syntax
use whose
keyword
is the given
identifier
appears within a
pattern
, the matcher will
replace that pattern with the result of calling the transformer on
the use of pattern syntax. This process is analogous to the normal
process of Scheme macro expansion.
Note: The
sample implementation only supports transformer procedures as the
right-hand side of define-pattern-syntax
. Since it is
unlikely that anyone will want to use a variable transformer as a
pattern syntax transformer, and all known syntax-object-based
expander implementations return an ordinary transformer procedure
from syntax-rules
expressions (albeit this is not
guaranteed by the R6RS specification, though it is guaranteed by
the current draft of R7RS Large), this is not a problem in practice
for any known expander, but future adaptations of the sample
implementation may need to specially treat transformer types which
are not procedures of the signature Syntax → Syntax.
The behaviour of pattern syntax associations by
define-pattern-syntax
is equivalent to the behaviour
of an identifier property. For example, it is possible to define
pattern syntax on an identifier which is not exported from the
library where it is defined, in which case the pattern syntax will
only be visible inside the library. It is also possible to locally
re-define the pattern syntax associated with imported bindings
within a library, and this will not affect uses of the pattern
syntax in other libraries. Likewise, if pattern syntax is defined
or redefined within a local-binding context, the pattern syntax
will only be visible within that context. The pattern syntax
associated with an identifier is exported and made available under
that identifier when the identifier itself is exported from a
library. It is an error to import an identifier twice with the same
binding from different libraries if they have different pattern
syntax associated with them.
Examples: See the examples under the entries in the section on derived pattern syntax.
In general, pattern syntax specialized for matching instances of
a record type can be expressed as a combination with
and
of a ?
pattern for that record type’s
predicate and apply
patterns for each of that record
type’s field accessor procedures. The following definition of a
two-dimensional point record type will be used as an example.
(define-record-type point
(make-point x y) point?
(x point-x)
(y point-y))
Given this definition, a pattern to match instances of
point
can be expressed directly as follows:
(match my-location
((? point?
(apply point-x my-location-x)
(apply point-y my-location-y))
(show #t
"I am at X coord: " my-location-x ", Y coord:" my-location-y)))
But this can be more concisely expressed by defining pattern syntax to expand into this form of pattern, as follows:
(define-pattern-syntax point
(syntax-rules ()
((_ x-pat y-pat)
(? point?
(apply point-x x-pat)
(apply point-y y-pat)))))
allowing the previous example to be rewritten more concisely and more expressively:
(match my-location
((point my-location-x my-location-y)
(show #t
"I am at X coord: " my-location-x ", Y coord:" my-location-y)))
This new pattern can then of course be composed with other pattern syntax:
(define (point-quadrant pt)
(match pt
((point (? positive?) (? positive?)) 'upper-right)
((point (? positive?) (? negative?)) 'lower-right)
((point (? negative?) (? positive?)) 'upper-left)
((point (? negative?) (? negative?)) 'lower-left)
((or (point (? zero?) _)
(point _ (? zero?)))
(assertion-violation 'point-quadrant "point has a zero coordinate" pt))))
Record types in Scheme are used as part of the public interface of libraries in at least two ways.
The first is to create true records, representing data about
some entity that is relevant to the program’s domain of operation.
In this case I suggest that the record type name (in the sense of
R6RS and R7RS) should be used as the name of the pattern syntax.
The following define-record-type+pattern-syntax
macro
defines both a record type (in R7RS small style) and pattern syntax
for that record type.
(define-syntax define-record-type+pattern-syntax
(syntax-rules ()
((_ name constructor-spec predicate
(field-name accessor . maybe-setter) ...)
(begin
(define-record-type name constructor-spec predicate
(field-name accessor . maybe-setter) ...)
(define-pattern-syntax name
(syntax-rules ()
((_ field-name ...)
(? predicate
(apply accessor field-name) ...))))))))
Example usage:
(define-record-type+pattern-syntax measure
(make-measure magnitude unit) measure?
(magnitude measure-magnitude)
(unit measure-unit))
(define (fff->si m)
(match m
((measure n 'furlong)
(make-measure (* n #e201.168) 'metre))
((measure n 'firkin)
(make-measure (* n #e40.8233133) 'kilogram))
((measure n 'fortnight)
(make-measure (* n 1209600) 'second))))
Further, it may be desirable, at least in some cases, to be able to use a kind of ‘keyword subpattern’ to match the fields of records. The following example extends the above to allow this.
(define-syntax define-record-type+pattern-syntax/keyword
(syntax-rules ::: ()
((_ name constructor-spec predicate
(field-name accessor . maybe-setter) :::)
(begin
(define-record-type name constructor-spec predicate
(field-name accessor . maybe-setter) :::)
(define-pattern-syntax name
(lambda (stx)
(syntax-case stx ()
((_ subpat ...)
#`(? predicate
#,@(map
(lambda (subpat)
(syntax-case subpat (accessor :::)
((accessor field-pat)
#'(apply accessor field-pat)) :::))
#'(subpat ...)))))))))))
Example usage:
(define-record-type+pattern-syntax/keyword journal
(make-journal title abbreviation) journal?
(title journal-title)
(abbreviation journal-abbreviation))
(define journals
(list (make-journal "Journal of Indo-European Studies"
"JIES")
(make-journal "Zeitschrift für vergleichende Sprachforschung"
"KZ")
(make-journal "Notes and Queries"
"NQ")))
(define journal-abbreviations
(match journals
((list (journal (journal-abbreviation abbr)) ...)
abbr)))
;; journal-abbreviations => ("JIES" "KZ" "NQ")
The other main use of record types in Scheme is simply as a
means of creating a disjoint type in order to represent some kind
of composite data structure. Recall the existing Scheme convention
that the name of the data type itself denotes a constructor which
pre-fills a data structure with values passed to the constructor
(as opposed to constructors whose name starts with
make-
, which merely pre-initialize the structure with
a certain number of values). The name of this constructor procedure
should (as in the examples list
, cons
,
vector
, etc.) be associated with pattern syntax which
deconstructs the input using syntax essentially similar to that
accepted by the constructor procedure itself to maximize symmetry
between construction and deconstruction. In some cases it may be
necessary to define multiple types of pattern syntax for one data
type to match against input in different styles (as in the example
of list
and cons
). It is not possible to
give a general example as every data structure’s needs may be
different.
Programmers defining pattern syntax for record types which do not fall cleanly into these categories, or some other form of pattern syntax entirely, can simply devise their own conventions.
In general, evaluation of expressions in pattern syntax (the
procedure expression
s of
?
and apply
patterns as well as calls to
the resulting procedures, and the expressions used in a
seq
pattern) should have well-defined behaviour, and
should never cause an error to be signalled during matching. A
pattern should either match its subject or not. The evaluation of
expressions which may either cause an error to be signalled or
cause undefined behaviour should be guarded by a predicate using a
?
pattern and the left-to-right guarantee of the
and
pattern.
For example, the following would not be a good implementation of
a cons
pattern:
(define-pattern-syntax bad-cons
(syntax-rules ()
((_ car-pat cdr-pat)
(and (apply car car-pat)
(apply cdr cdr-pat)))))
This pattern will misbehave if the pattern matcher attempts to
call it on a subject which is not a pair. If the match
user intends for an error to be signalled on non-pair input, they
should decide this for themselves by allowing the matcher to run
out of patterns or providing an explicit case for subjects which
are not pairs. The above pattern definition makes uses such as the
following impossible, because the pair procedures car
and cdr
will signal an error and cause the whole
match
expression to fail before the cases for the
empty list and other possible inputs are tried.
(match x
((bad-cons a b) 'make-progress)
('() 'end-of-proper-list)
(_ 'improper-list))
Signalling an error during expansion of a pattern if the pattern use does not conform to the syntax of the pattern is of course normal and encouraged.
(match-ellipsis?
obj)
Returns #t
if obj
is a
syntax object representing an extended
ellipsis
as used in a sequence
pattern
; raises a syntax violation if
obj
is a syntax object which uses the
...
identifier incorrectly for use in a
sequence pattern
; or returns
#f
otherwise.
Rationale: Pattern syntax
definitions which build upon seq
and its relatives may
need to pre-process the series of subpattern forms they are given,
for example if their own syntax for subpatterns is something other
than a simple pattern
. In this
case, the syntax definition will need to pre-process the series of
subpatterns it is given to convert its own special syntax into the
syntax of seq
, but in doing so
should generally treat the extended
ellipsis
in the series of input subforms specially,
passing it directly through to seq
.
Such pattern definitions should use match-ellipsis?
to detect the extended
ellipsis
, since the syntax and semantics of
extended ellipsis
may be
extended in the future.
Example: The example of matching a ‘lyst’ of ‘pares’ from above can be expressed as a pattern syntax definition as follows.
(define-pattern-syntax lyst
(lambda (stx)
(syntax-case stx ()
((_ subpat ...)
(with-syntax (((seq-subpat ...)
(map
(lambda (subpat)
(if (match-ellipsis? subpat)
subpat
#`(apply kar #,subpat)))
#'(subpat ...))))
#'(seq* ls ((curr ls (kdr curr)))
(not (pare? curr))
curr
seq-subpat ... '()))))))
&match
(make-match-violation)
(match-violation? obj)
This condition type is a subtype of the standard condition type
&assertion
and has no fields. It could be defined
by
(define-condition-type &match &assertion
make-match-violation match-violation?)
This pattern matcher does not include by default a pattern type which can match records by their field values without defining new pattern syntax for their record type. This feature is not portable to systems without a record procedural/inspection layer, and may have less than ideal run-time performance in any case. Furthermore, the use of such pattern types can be regarded as an anti-pattern, because they inherently make the field structure of records, which ought to be regarded as an implementation detail of the record type, into part of their public interface (see SRFI 256). It is generally preferable to define specialized pattern syntax for each record type, as demonstrated above.
On Schemes with an R6RS-compatible record system, the
$
pattern of Wright-style matching libraries can be
brought back roughly as follows:
(define-syntax record (syntax-rules ())) ; dummy binding to hold only
; pattern syntax
(define-pattern-syntax record
(lambda (stx)
(syntax-case stx ()
((_ type field-pat ...)
(with-syntax ((rtd #'(record-type-descriptor type))
((field-n ...) (iota (length #'(field-pat ...)))))
#'(? (record-predicate rtd)
(apply (record-accessor rtd field-n) field-pat) ...))))))
Example usage:
(define-record-type point (fields x y))
(define (distance-from-origin pt)
(match pt
((record point x-pos y-pos)
(sqrt (+ (* x-pos x-pos)
(* y-pos y-pos))))))
(distance-from-origin (make-point 3 4)) ;=> 5
(distance-from-origin (make-point 4 5)) ;=> 6.4031242...
Aside from the general design issue noted above, the run-time
efficiency of this particular pattern syntax is not as good as if
the corresponding type had been given its own pattern syntax
directly, as e.g. in the
define-record-type+pattern-syntax
example above,
because of the need to invoke record type inspection procedures at
run time. A means of expand-time record type inspection, such as
that proposed in SRFI 136,
could reduce or eliminate this overhead.
The record
pattern example also only allows
matching fields from one layer of a type hierarchy, following the
design of the R6RS record system. Fields from multiple layers have
to be extracted by addressing each layer specifically:
(define-record-type point-3d (parent point) (fields z))
(define (point-3d-translate pt dx dy dz)
(match pt
((and (record point x y)
(record point-3d z))
(make-point-3d (+ x dx)
(+ y dy)
(+ z dz)))))
This somewhat reduces the exposure of the field structure of
each individual level of the type hierarchy, as does the ability to
omit field patterns from the right of the record
pattern (so the implementer of record types can still safely add
new fields at the right). Still, it is better in general to
separate the definition of a record type from the definition of its
matching interface.
As noted in the entry on
match
, a pattern matcher designed for a
strictly evaluated language with strict data structures (such as
this one) can, in many contexts, safely use any order for testing
patterns and subpatterns in order to most efficiently rule out
non-matching clauses and select the appropriate matching clause.
However, Scheme’s first-class procedures and macro system also
provide the means to implement lazy data structures such as the
streams of SRFI 41.
For data structures like these, it is preferable to require
left-to-right pattern matching so that the pattern matcher will not
look ‘too far ahead’ in a sequence to a point where evaluation will
diverge or signal an error.
Achieving this for patterns matching individual instances is
simple, though: pattern syntax for a lazy record or data structure
must only use the and
pattern with its guarantee of
left-to-right matching; avoid the subpattern functionality of the
?
primitive which does not make any such guarantee; be
careful about its use of multiple subpatterns of the
apply
primitive; and ensure all values are forced as
soon as they are accessed by apply
patterns. The
functionality of the stream-match
form of SRFI 41
can be simulated with pattern syntax for the
stream-cons
, stream-null
, and
stream
constructors, plus (for convenience) a new
stream-cons*
constructor, as follows:
(define-pattern-syntax stream-cons
(syntax-rules ()
((_ car-pat cdr-pat)
(and (? stream-pair?)
(apply stream-car car-pat)
(apply stream-cdr cdr-pat)))))
(define-pattern-syntax stream-null
(syntax-rules ()
((_) (? stream-null?))))
(define-syntax stream-cons*
(syntax-rules ()
((_ elt) elt)
((_ elt more ...)
(stream-cons elt (stream-cons* more ...)))))
(define-pattern-syntax stream-cons*
(syntax-rules ()
((_ pat) pat)
((_ pat more ...)
(stream-cons pat (stream-cons* more ...)))))
(define-pattern-syntax stream
(syntax-rules ()
((_ pat ...)
(stream-cons* pat ... (stream-null)))))
A disadvantage is that the deconstructor for the empty stream
must be spelled (stream-null)
, while the constructor
is spelled stream-null
. This could be fixed by a
corresponding change in the syntax of SRFI 41, or simply by
making the empty stream object the same as the standard Scheme
empty list object and eliminating the stream-null
constructor and deconstructor entirely.
(define (len strm)
(match strm
((stream-null) 0)
((stream-cons head tail) (+ 1 (len tail)))))
(define-stream (stream-merge lt? . strms)
(define-stream (merge xs ys)
(match-values (values xs ys)
(((stream-null) ys) ys)
((xs (stream-null)) xs)
(((stream-cons x xs*) (stream-cons y ys*))
(if (lt? y x)
(stream-cons y (merge xs ys*))
(stream-cons x (merge xs* ys))))))
(stream-let loop ((strms strms))
(match strms
('() stream-null)
((list strm) strm)
((cons strm strms)
(merge strm (loop strms))))))
Because the ellipsis form of the seq*
pattern is
greedy, it cannot safely be used with streams or other
potentially infinite data structures without risking divergence. If
this is not a concern, the following alternative definition of the
stream-cons*
pattern syntax may also be used.
(define-pattern-syntax stream-cons*
(syntax-rules ()
((_ pat) pat)
((_ pat ... tail-pat)
(seq* strm ((curr strm (stream-cdr strm)))
(not (stream-pair? strm))
curr
(apply stream-car pat) ... tail-pat))))
What cannot be safely accommodated with this pattern matcher is
situations with match-values
or
match-lambda
where one input value holds information
which, if examined earlier than some other input value, might
prevent the erroneous or divergent forcing of some part of the
other input value. The above example of match-values
in the merge procedure is safe because it must examine both values
anyway, but in general these forms must be used with particular
care in combination with lazy data.
The following example of a convenience form for defining new
pattern syntax is based on a design originally by Richard Cobbe and
found in the test suite of Racket’s pattern matcher. It provides a
concise notation for a ?
test pattern over a series of
apply
patterns.
(define-syntax define-view
(lambda (stx)
(syntax-case stx ()
((_ name test (selector ...))
(with-syntax (((selector-id ...)
(generate-temporaries #'(selector ...))))
#'(begin
(define-syntax name (syntax-rules ()))
(define-pattern-syntax name
(syntax-rules ()
((_ selector-id ...)
(? test
(apply selector selector-id) ...))))))))))
The following examples are adapted from those of Wadler (1987b).
;; 2 Viewing an integer as zero or a successor
(define (sub1 n) (- n 1))
(define-view zero zero? ())
(define-view succ integer? (sub1))
(define power
(match-lambda
((x (zero)) 1)
((x (succ n)) (* x (power x n)))))
(define fib
(match-lambda
(((zero)) 0)
(((succ (zero))) 1)
(((succ (succ n))) (+ (fib n) (fib (+ n 1))))))
;; 3 Another view of integers
(define (div2 n) (/ n 2))
(define-view even even? (div2))
(define-view odd odd? ((compose sub1 div2)))
(define power
(match-lambda
((x (zero)) 1)
((x (even n)) (power (* x x) n))
((x (odd n)) (* x (power (* x x) n)))))
;; 4 Viewing a complex number in cartesian and polar coordinates
(define-view cart complex? (real-part imag-part))
(define-view pole complex? (magnitude angle))
(define add
(match-lambda
(((cart x y) (cart x* y*))
(make-rectangular (+ x x*) (+ y y*)))))
(define mult
(match-lambda
(((pole r θ) (pole r* θ*))
(make-polar (* r r*) (+ θ θ*)))))
This pattern matcher also does not include by default any
equivalent of the ***
tree pattern or ‘two-dimensional
ellipsis’ which originated as one of Alex Shinn’s extensions to
Wright’s pattern syntax. The following example shows how a custom
pattern can provide this functionality. Note that this specific
matcher is exactly as limited as the original ***
matcher, i.e. it can search only trees based on proper lists
and will never search for the pattern on its right hand side within
the first element of each sublist.
(define-syntax tree (syntax-rules ()))
(define *not-found* (cons '*not-found* '()))
(define (not-found) (values *not-found* #f))
(define (found? x) (not (eq? x *not-found*)))
(define-pattern-syntax tree
(syntax-rules ()
((_ path-pat item-pat)
(apply (lambda (initial-val)
(let match-tree ((path '())
(val initial-val))
(match val
(item-pat
(let ((path (reverse path)))
(match path
(path-pat
(values path val))
(_ (not-found)))))
((cons name subvals)
(let loop ((more subvals))
(match more
((cons subval more)
(call-with-values
(lambda ()
(match-tree (cons name path) subval))
(lambda (a b)
(if (found? a)
(values a b)
(loop more)))))
('() (not-found)))))
(_ (not-found)))))
(and (? found?) path-pat)
item-pat))))
Potential uses of such a pattern include as an alternative to Oleg Kiselyov’s SXPath in some cases:
(match '(a (b c) (b d) (c c)) ((tree x 'd) x)) ; equivalent to XPath
; //node()[.='d']
;=> (a b)
(match '(a (b c) (b d) (c c)) ((tree (and x (list _ ... 'c)) 'c) x))
; equivalent to XPath
; //c[.='c']
;=> (a c)
Note, though, that this pattern has poor performance
characteristics, similar to a naïve string search algorithm. (It is
no worse than Shinn’s implementation in this regard, however, which
uses the exact same algorithm.) A small (non-asymptotic)
performance gain for certain subpatterns would be to use
not
to suppress variable bindings the first time the
matcher tries to see if a tree element matches the target pattern.
A pattern matcher with ideal performance on this kind of pattern
would use a completely new approach incompatible with the design
decisions made here to optimize for ‘one-dimensional’ patterns.
The question of whether to use constructor-style or datum-style
syntax for deconstruction arose during the pre-R4RS discussion of
pattern matching. Chris Haynes
presented an early version of his own pattern matching library,
which used datum syntax for pairs and lists, but constructor syntax
for vectors. Jonathan Rees
pointed out the inconsistency and proposed a resolution, noting
that a consistent constructor-style syntax would allow the pattern
matching syntax to be extended cleanly. He proposed deconstructors
as an extra procedure property on constructors, similar to how the
T setter
functionality worked (later specified by
SRFI 17).
In a subsequent thread covering several questions inherent in
the design of the would-be R4RS pattern matching library, Will
Clinger
noted the problems of both approaches. Namely, datum-style
deconstructor syntax is, in most cases (including in Wright-style
syntax), not an exact mirror of Scheme’s lexical syntax when
matching of literal symbols is needed, because patterns such as
('x 1 y)
do not correspond to the datum syntax of an
actual matching list such as (x 1 "why")
; it would, in
any case, be a new sublanguage. (Syntax-rules
, of
course, later resolved this with its separate list of literal
identifiers.) The constructor syntax, meanwhile, has a closer
resemblance between pattern matching input (under evaluation,
rather than under read
) and individual patterns, but
seemed to pose problems in interaction with the scoping of
identifiers in Scheme code outside of match clauses: what, for
example, is the meaning of list
when used as a
deconstructor within a block such as (let ((list vector))
...)
?
Under Rees’s proposal for dynamic extensibility, using special
properties on procedure objects, the answer is in fact quite clear:
in that example, list
will refer lexically to a
variable which, at run time, contains the same procedure –
including the same deconstructor – as the ordinary Scheme
vector
procedure: therefore that procedure will both
(in regular Scheme code) construct, and (on the left-hand side of
match clauses) deconstruct vectors, rather than lists. This seems
intuitive. A similar approach, using properties of OO classes, is
the means of extension for the match
statement in
Python 3.10 and above; also, at time of writing, a variant of this
idea using properties of JavaScript functions is part of the
ECMAScript TC39
pattern matching proposal.
On the other hand, research into efficient compilation of
pattern matching has focussed on cases where the semantics of
patterns are completely knowable during ahead-of-time compilation,
which is not always the case when the semantics depend on run-time
procedure calls. If the ECMAScript proposal is adopted, it is
likely that it will lead to more research into techniques for
efficient compilation of patterns when this method of extension is
used, perhaps using JIT-like adaptive optimization techniques, but
this has not happened yet. Moreover, Scheme macros, which are
designed to support full ahead-of-time compilation, may not be an
appropriate tool for implementing this kind of optimization.
Lastly, it is not clear how syntactic operators such as
quote
can be used as deconstructors orthogonally to
how procedures are used. Using macros as the means of pattern
extension makes it possible to use optimization techniques that are
already well understood, to implement pattern matching in an
ordinary Scheme macro, and (potentially) to use any Scheme
identifier to signify a deconstructor regardless of the type of its
binding in ordinary code; but it also implies the problem of
conflicting namespaces which Clinger was concerned about.
Queinnec’s original proposal for macro-extensible patterns
seemingly resolved this issue by establishing a new naming
convention for identifiers: his patterns consistently begin with an
*
character. Thus, the consistency and extensibility
advantages of ‘constructor-style’ patterns can be retained, but the
use of different identifiers for deconstructors means
deconstruction of values does not exactly mirror construction.
(This might be considered a pro or con, depending on your view of
whether it’s a good thing for identifiers to have context-sensitive
semantics.) SRFI 257
revives this idea, placing transformers for deconstructors within
Scheme’s single namespace, with a convention of prefixing them with
~
. Under this proposal, list
and
vector
, as ordinary Scheme variables and not pattern
transformers, are simply not allowed as the names of deconstructors
on the left-hand side of a pattern matching clause. The
quote
family of operators is an exception to this
naming convention, presumably because of the ergonomic advantage of
being able to use the lexical abbreviations for them.
Identifier properties, used by this SRFI, offer another
approach. In this view, the context-specific semantics of a
constructor identifier have to do with its statically knowable
binding as an identifier, and not with its actual value as a
variable which can only be known at run time. The answer to
Clinger’s question is clear: in (let ((list vector))
...)
, the binding of list
has been shadowed;
because pattern syntax belongs to bindings and not to values, new
pattern syntax would have to be defined for the list
identifier within the let
block, otherwise an attempt
to use list
as the name of a deconstructor within that
block would be a syntax violation. This mirrors the semantics of
auxiliary syntax keywords in ordinary, non-extensible Scheme
macros; the set of auxiliary syntax keywords within match clauses
is simply arbitrarily extensible.
Identifier properties, in combination with macros as the means of extensibility, thus offer the benefits of deconstructor syntax matching constructor syntax, an extensible set of deconstructors, and efficient ahead-of-time compilation within an implementation as an ordinary Scheme macro requiring no special support from the underlying compiler. While in perverse cases such as Clinger’s example it is possible that the results of shadowing a deconstructor’s name may be seen as unintuitive, the semantics in fact match those already familiar to Scheme programmers from the use of (auxiliary) syntax keywords in general; in the majority of cases of shadowing, both the Rees-style dynamic extensibility and identifier property-based extensibility behave identically, respecting Scheme’s lexical scoping and its single namespace for all identifiers in all contexts.
One final approach should be considered, which is that taken by
Racket’s match
library. In this view, ‘match
expanders’ are ordinary syntax keywords within Scheme’s single
namespace, defined using a special variant of
define-syntax
. By default, the keyword for a custom
match expander may only be used within the left-hand side of a
matching clause, but an optional second transformer may be given
which works as a normal Scheme macro transformer when the
identifier is used outside of a pattern. This allows the same
identifier to be used in both patterns and in normal Scheme code,
but has the disadvantage that an identifier can only be used in
these two specific contexts and can’t be extended by additional
properties which might be used by other embedded sublanguages;
whereas identifier properties intrinsically allow any number of
macros to define and use their own extra information about any
identifier.
Non-linear patterns were first added to a Wright-style pattern matcher in Bruce Hauman’s PLT Scheme implementation, then copied by Alex Shinn and Sam Tobin-Hochstadt into their respective implementations. After thorough consideration, I have decided that support for such patterns, while perhaps occasionally useful, is a misfeature in the context of this kind of pattern matcher, because the composition behaviour of non-linear uses of variables with some other features offered by this pattern matcher is poor.
Most of the problems ultimately arise, in essence, from the root
issue of deciding which instance of the variable in the pattern
should be considered to bind the single comparand against which
other values which (potentially) match the same pattern variable
can be tested. The intuitive choice is probably the leftmost
instance, but this may in the general case put unenforceable and
equally opaque restrictions on the expansions of pattern
syntax – another feature I find considerably more valuable
than general support for non-linear patterns. In other words,
pattern syntax such as (foo a b)
may wish to expand
into a form such as (and (baz b) (bar a))
for
performance reasons (note the inversion of variables in the
expansion compared to the original form), but this expansion would
cause the leftmost instances of a non-linear variable within
b
to be considered the leftmost overall, rather than
the leftmost instances within a
and therefore within
the original source.
While this difference may appear to be abstract in the sense
that all bound values are ‘the same’ anyway, concrete issues arise
in composition with, for example, not
patterns. As of
writing, the Racket documentation attempts to explain the behaviour
by saying that ‘instances of an id
[i.e. a
variable] in different or
and not
sub-patterns are independent’, but this is not entirely correct and
depends on whether the id
has appeared outside of a
not
or or
pattern ‘before’ its appearance
within one. A pattern such as (list a (not a) (not a))
will match a three-item list and require that the second two
elements are distinct from the first (but may be the same as one
another); a pattern such as (cons a (not a))
will
match a pair whose car and cdr are distinct. This, at least, seems
intuitively correct. On the other hand, though, a pattern such as
(cons (not a) a)
will never match, because the first
a
(of (not a)
) is considered to
be independent from the second, meaning that (not a)
is effectively (not _)
, causing all matches to fail.
To reiterate, if the (cons a b)
were for some reason
expanded to (and (? pair?) (apply cdr b) (apply car
a))
(note again the inversion of a
and
b
), the inverse would apply, and (cons (not a)
a)
would function as probably expected. The interaction of
pattern syntax with non-linear variables is therefore dependent on
implementation details of each form of pattern syntax. This is not
transparent to the user of pattern syntax.
An informal rule imposed on pattern syntax definitions that
subpatterns should appear in the same order in the expansion as in
the input form may seem sufficient to cure this problem, if one is
willing to accept that the rule cannot actually be enforced by the
pattern expander in any way. Another, more robust solution might be
to require occurrences of variables within or
and
not
clauses to be truly independent of one another and
of variables outside of or
and not
, so
that (cons a (not a))
would not match any value, or
would simply be disallowed; the pattern would have to be expressed
e.g. (and (cons a b) (not (cons x x)))
.
It is also possible that choosing a different set of primitives to those used in this pattern matcher could help alleviate this issue. This should be investigated further.
There is another problem, though: non-linear uses of variables
in patterns also compose poorly with ellipsized sequence
subpatterns. Here the problem of choosing which instance to use to
bind the primary comparand compounds with the problem that a
variable appearing within an ellipsized subpattern has (at least)
two semantics. Within the ellipsized subpattern, the variable
conceptually refers to one value within a sequence of related
values; within the body of a matching clause, it refers to a list
of all values in that sequence.5 For the purposes of
choosing a comparand for an instance of the variable within the
larger pattern but outside of the ellipsized subpattern, both
Racket’s and Shinn’s match
choose the latter
interpretation, but differ in which occurrence of the variable is
used to find the value used as a comparand. Shinn’s implementation
appears to consistently pick the leftmost occurrence, meaning that
patterns such as (the equivalent of) (list x x ...)
match a sequence of identical items of which there must be at least
one, and x
is bound to the first item in the list.
Racket, on the other hand, picks the ellipsized instance no matter
where it appears in the list, so that (list x x ...)
matches lists of the form ((a ...) a ...)
where there
are the same number of equal a
s in both the sublist
and the outer list. The behaviour chosen by Shinn may initially
appear more intuitive here, until the patterns are reversed: with a
pattern equivalent to (list x ... x)
, it will require
a list of the form (a ... (a ...))
(with the same
conditions on the equality and number of a
s). Racket’s
version’s behaviour on this pattern is the same as on
Shinn’s, but it issues the warning ‘non-linear pattern used in
match
’.
Note: I have not tested the behaviour of both pattern matchers extensively to determine if the rules listed above, based on observation of a relatively limited number of test cases, hold for all non-linear patterns in both versions.
The right thing may be for all non-linear occurrences of an
ellipsized pattern variable to have the opposite meaning as in
Racket: they should refer to one item in the sequence which
matched, rather than the entire sequence. Even this does not fully
solve the problem, if for no other reason than that it makes it
non-trivial to elegantly express a pattern such as (list
(complex-subpattern x) ...)
with the additional constraint
that all the x
s must be the same. Further, the
performance impact of non-linear patterns in combination with the
repetition operators in sequence subpatterns may be significant.
The full semantic and performance implications of such a change
should be investigated first.
In summary, while it may be possible to find solutions to the
problems of composability of non-linear patterns with other
features of the pattern matcher, the solutions would necessarily
involve significant innovations compared to the semantics of
non-linear patterns in existing Scheme pattern matchers. Finding
the ‘correct’ solutions to these issues, and verifying their
correctness, is outside of the scope of the current version of this
pattern matcher. This version therefore simply does not allow
multiple occurrences of the same variable within a pattern, as with
syntax-rules
and syntax-case
.
Those interested in more discussion of the semantic issues of non-linear patterns may find this mailing list thread about the reasons Haskell does not support them relevant.
The apply
pattern syntax does not offer direct
support for checking the number of return values from the procedure
it calls: it is undefined behaviour if the number of values
returned is not the same as the number of subpatterns. Ideally, the
behaviour would be well-defined in this case, namely that the
apply
pattern should not match if the procedure
returns the wrong number of values. This would make it easier to
use varying numbers of return values as a cheap kind of sum type,
especially as an option type.
Unfortunately, this would result in a small ‘peanut butter’-type
performance overhead for the vast majority of uses of
apply
patterns with procedures which always return the
same number of values: all of these uses would need to compile to
include an explicit check on the number of return values, not only
the uses where alternative clauses check for a different number of
values. Scheme implementations are not very good at optimizing away
such checks (primitively expressed as call-with-values
over case-lambda
): as of writing, only Chez Scheme
version 10 is known to be able to do this optimization, and only
for procedures of one return value. It may also make it harder to
keep compile times for optimizing implementations reasonable, since
it would increase the number of primitive pattern types which are
directly refutable on their own terms (and not only refutable on
the basis of some subpattern).
The workaround is to use a procedure argument to the
apply
pattern which converts multiple values into a
single container (such as a multiple-value box from SRFI 195)
and write a pattern checking its contents.
If Scheme implementations’ ability to optimize this case for
procedures of known co-arity improves significantly, then a future
revision of this SRFI may change the semantics of
apply
patterns to the correct ones.
Disjointed variable is the name I have given to any variable
which occurs in some subpattern(s) of an or
pattern
(also formally called a disjunctive pattern, whence the name), but
not in all of the subpatterns.
Here also, the behaviours of the existing pattern matchers
differ. Wright’s original implementation and the Racket pattern
matcher signal an error during expansion upon encountering
or
patterns containing any disjointed variables.
Shinn’s implementation binds the disjointed variables to an
unspecified value when the matching subpattern does not contain
them. Hauman’s implementation causes an undefined identifier error
to be signalled when a disjointed variable’s value is accessed and
the matching subpattern does not contain it. Nelson’s
implementation behaves quite strangely and I cannot easily tell
what it is doing.
This pattern matcher takes a slightly new approach: it is not a
syntax error to merely use an or
pattern with
disjointed variables, only to subsequently attempt to (statically)
refer to a disjointed variable in the code for the matching clause.
In practice, this is almost indistinguishable from the behaviour of
Wright’s original and the Racket pattern matchers, but with one
difference which is important in the context of macro-extensible
pattern matching: it allows non-hygienic patterns to be used in the
definition of hygienic patterns without ruining the ability of the
hygienic patterns to be used as subpatterns of an or
pattern.
Consider, for example, a pattern regexp
which tests
an input string against a regular expression and, as a convenience,
unhygienically binds pattern variables $1
,
$2
, $3
etc. to the substrings which
matched subgroups in the regular expression. A higher-level pattern
correct-looking-string
may wish to build on
regexp
to create a pattern matching strings having a
certain form, but its use of regexp
is purely an
implementation detail that should not be exposed. Fortunately,
syntax object-based expanders already handle this case correctly,
and the $1
, $2
, $3
variables
will be created with marks corresponding to the use of
regexp
within correct-looking-string
, not
the use of correct-looking-string
directly. However,
with the Racket/Wright semantics, any attempt to use
correct-looking-string
within only one clause of an
or
pattern will cause an immediate syntax error
because of the $1
, $2
, $3
variables which, due to their marks, are not even accessible within
the matching clause code. Binding these disjointed variables
instead to syntax transformers which always signal a syntax
violation addresses the real problem with disjointed
variables with no run time overhead.
An alternative would be an approach more similar to Hauman’s,
also comparable to the approach used by the letrec
and
letrec*
forms in case of violations of their
respective restrictions on variable value access and assignment. In
this approach, disjointed variables would instead be bound to
syntax transformers which expand into code signalling a run-time
error if the matching subpattern did not bind the corresponding
variable, and returning the variable’s value otherwise. I think
this would only encourage bad programming styles, though – in
particular, where checks already performed by the matcher are
repeated within the matching code in order to determine if some
variable can be safely accessed – and people should instead
use different matching clauses.
The pattern matcher includes no facilities for ensuring exhaustiveness – that is, statically (at expand time) checking that a set of patterns covers all possible inputs in a particular context. Given Scheme’s dynamic typing, full checking of exhaustiveness is not possible in any case. I considered how to provide such checking for more limited cases, such as providing forms that would ensure that all record types within a given family of related types are handled. Eventually, I rejected this as out of scope for this library. Tobin-Hochstadt (2011) also contains a brief discussion of the problems of exhaustiveness checking in a dynamically typed pattern matcher.
Users interested in pattern matching within a closed world of potential input values may be interested to look at the Nanopass compiler framework (Sarkar et al. 2005, Sarkar 2008, Keep 2012) for guidance and inspiration. Of course, the extensible pattern matcher presented here may be a good back-end target for a pattern matching library with this feature, since it already implements the optimizations needed for good run-time performance. A pattern matcher with coverage checking may also be able to make good use of custom pattern syntax within its implementation.
The sample implementation is provided by the author’s extensible-match
library. Version (- 1 (expt 2 (- n)))
is the sample
implementation corresponding to draft n
of the SRFI;
when the SRFI is finalized, version 1 will be permanently archived
in the SRFI repository. The implementation should run on any R6RS
implementation with SRFIs 1 (list library), 133 (vectors library),
151 (bitwise operations), 213 (identifier properties), and 244
(define-values
), with the additional caveat (as
mentioned under the entry for define-pattern-syntax
)
that syntax-rules
expressions must return an ordinary
transformer procedure and not any implementation-specific kind of
transformer.6 At time of writing, this amounts
only to Chez Scheme with Aaron Hsu’s chez-srfi
package, but I hope to submit a patch to Guile for identifier
property support soon, and Andrew Whatson’s work on porting
Unsyntax (which supports SRFI 213, but does not support much of
R6RS nor the other needed SRFIs) to other Scheme implementations
may also make it available there soon.
The sample implementation is designed to be well-optimizing. On
implementations which support the guaranteed-optimization clause of
the macro-writer’s bill of rights (Dybvig 2004), the code it produces for any given
match
expression should generally be as efficient as
the equivalent hand-written conditionals and bindings.
I would be interested in an alternative, mildly optimizing
implementation in terms of SRFI 148
which made the means of pattern syntax transformer lookup and
application pluggable. This would allow Scheme implementations
which do not natively provide syntax-case
and whose
maintainers don’t wish to provide full identifier property support
to take advantage of macro-extensible pattern matching.
This section of the SRFI gives hints that might be useful to anyone coming up with a re-implementation of this pattern matcher.
The implementation of pattern expansion is quite subtle. Each
individual expansion of pattern syntax must obey the hygiene
condition. In order to achieve this portably, the sample
implementation actually returns to the Scheme expander after each
step of its expansion, so the expander can apply its
marks/colours/timestamps as needed. An implementation targeting a
specific Scheme implementation may be able to do better if it can
access the internals of the expander to actually do identifier
renaming itself, as in e.g. Racket’s
syntax-local-apply-transformer
. As Ballantyne et
al. (2020) point out, a system
of primitives such as this would be useful for extensible embedded
languages in general, so a future SRFI should provide a way for a
transformer to re-enter the expander on part of its input without
using CPS-style expansions. A mechanism such as Racket’s
make-syntax-introducer
7
could be a useful primitive for this, which would also provide an
elegant solution to the problems addressed by SRFIs 258 and
260.
The test suite for the sample implementation tests publically
exposed behaviour only. It is thus suitable for testing that your
implementation is SRFI-conformant – although the SRFI
specification text is normative, and errors and omissions in the
test suite are possible. The ordering of tests in the test suite is
such that fixing earlier tests is likely to fix numerous later
tests which implicitly depend on the same behaviour as well. See
also the note on the test for
match-letrec
under that entry.
For implementing seq
and seq*
patterns, in the general case, you will need to implement a finite
automaton simulation, either using backtracking, an NFA simulation
keeping track of multiple states, or a DFA transformation. For
simple seq
patterns you can avoid this. A performant
implementation should probably have multiple strategies on hand
depending on the structure of the seq
subpatterns.
There are a number of possible strategies for implementing
seq/unordered
patterns. The sample implementation uses
a backtracking approach, which offers near-linear performance for
the majority of patterns. Another option is Brzozowski
derivatives,8 but a naïve implementation will
require exponentially growing time and space for growing numbers of
irrefutable subpatterns, or subpatterns which are ambiguous with
respect to one another. One of the tests in the test suite is
deliberately designed to take a long time (albeit without failing)
on implementations which fail to optimize for the case of a large
number of irrefutable subpatterns. Unfortunately, this test case is
inherently racing against Moore’s law, so if you are using the test
suite 20 years from now, it’s possible it will no longer appear
quite so slow even if your implementation has bad asymptotic
performance.
It is also moderately tricky to statically enforce the ban on
non-linear patterns (which implementations should do, unless they
have solved the problems mentioned in the design notes and have
good semantics for supporting them as a local extension to this
specification). An attempt to use a pattern variable non-linearly
might be local to one subpattern of or
, or appear
within a not
pattern where binding is suppressed, but
the attempt should nonetheless not be allowed.
I stand on the shoulders of the designers and implementers of
many previous Scheme pattern matching forms. Thanks to all of them.
We all collectively stand on the shoulders of the pioneers of
pattern matching in Lisp and in other languages, whose
contributions are described in the rationale and listed in the
bibliography. Thanks in particular to Sam Tobin-Hochstadt, the
author of Racket’s match
, and Ryan Culpepper, the
author of Racket’s syntax-parse
, and the rest of the
Racket community both for the direct inspiration, and for answering
my questions about the history, design, and implementation of
pattern matching in Racket.
Thanks also to all those who insisted that R7RS Large should include some kind of pattern matcher, which inspired me to work on a solution which is (in my opinion) superior to the popular matcher of Duba, Wright, and Shinn.
Thanks to Stefan Ljungstrand for discussion of possible features, and to him and also Wolfgang Corcoran-Mathe for working through the issues with non-linear patterns together with me.
Andrew Whatson pointed me to Frederick McBride’s
Ph.D. dissertation, and to T’s destructure
special form. These were both very useful pointers to help
establish the early history of pattern matching in programming
languages, and in Lisp in particular.
Marc Nieper-Wißkirchen’s
implementation of R7RS syntax-rules
in R6RS
allowed me to review the post-optimization expansion of Alex
Shinn’s match
implementation in Chez Scheme, which
helped me identify optimizations to make in my own
implementation.
Jeronimo Pellegrini found an error in an example.
The term ‘subject’ was adopted based on a suggestion made by Ron Buckton in the context of the ECMAScript TC39 pattern matching proposal.
Ballantyne, Michael, King, Alexis, and Felleisen, Matthias (2020) ‘Macros for Domain-Specific Languages’ in Proceedings of the ACM on Programming Languages 4 (OOPSLA): 229
Burstall, R. M. (1969) ‘Proving properties of programs by structural induction’ in The Computer Journal 12 (1): 41–48
Burstall, R. M., MacQueen, D. B., and Sannella, D. T. (1980) ‘Hope: An Experimental Applicative Language’ in LFP ’80: Proceedings of the 1980 ACM Conference on LISP and Functional Programming 136–43
Dybvig, R. Kent (2004) ‘The Guaranteed Optimization Clause of the Macro-Writer’s Bill of Rights’ at Daniel P. Friedman: A Celebration (recorded conference paper, 3–4 December; slides)
Hauman, Bruce (c2004) ‘Pattern Matching in Scheme’ (Internet Archive WayBack Machine, 26 June 2007)
Hudak, Paul, Hughes, John, Peyton Jones, Simon, and Wadler, Philip (2007) ‘A History of Haskell: Being Lazy With Class’ in HOPL III: Proceedings of the Third ACM SIGPLAN Conference on History of Programming Languages 12-1 – 12-55 (post-print on the website of Simon Peyton Jones)
Keep, Andrew W. (2012) ‘A Nanopass Framework For Commercial Compiler Development’ (Ph.D. dissertation, Indiana University)
Landin, P. J. (1966) ‘The Next 700 Programming Languages’ in Communications of the ACM 9 (3): 157–66
McBride, Frederick Valentine (1970) ‘Computer Aided Manipulation of Symbols’ (Ph.D. dissertation, Queen’s University of Belfast)
McCarthy, John and Painter, James (1967) ‘Correctness of a compiler for arithmetic expressions’ in Proceedings of Symposia in Applied Mathematics 19 (Mathematical Aspects of Computer Science): 33–41 (post-print on the website of the author)
MacQueen, David, Harper, Robert, and Reppy, John (2020) ‘The History of Standard ML’ in Proceedings of the ACM on Programming Languages (History of Programming Languages 4)
Moses, Joel (1967) ‘Symbolic Integration’ (Ph.D. dissertation, Massachusetts Institute of Technology)
Monnier, Stefan and Sperber, Michael (2020) ‘Evolution of Emacs Lisp’ in Proceedings of the ACM on Programming Languages (History of Programming Languages 4)
Petrofsky, Al (2001) ‘How to write seemingly unhygienic macros using syntax-rules’ in comp.lang.scheme (Usenet posting, 19 November)
Pitman, Kent M. and Moon, David (1989) ‘DESTRUCTURING-BIND’ (memorandum to ANSI committee X3J13)
Queinnec, Christian (1990a) ‘Compilation of non-linear, second order patterns on S-expressions’ in Lecture Notes in Computer Science 456 (PILIP 90): 340–57 (post-print on the website of the author)
Queinnec, Christian (1990b) Le Filtrage : une application de (et pour) Lisp. Paris: InterÉditions (ebook on the website of the author)
Sarkar, Dipanwita (2008) ‘Nanopass Compiler Instrastructure’ (Ph.D. dissertation, Indiana University)
Sarkar, Dipanwita, Waddell, Oscar, and Dybvig, R. Kent (2005) ‘Educational Pearl: A Nanopass Framework for Compiler Education’ in Journal of Functional Programming 15 (5): 653–67
Shinn, Alex (2006) ‘portable hygienic pattern matching’ in comp.lang.scheme (Usenet posting, 29 November)
Steele, Guy L., Jr., White, Jon L., Cannon, Howard I. and Kerns, Bob (1974–82) ‘LISP.NEWS’ (computer file)
Steele, Guy L., Jr. (1984) Common LISP: The Language (1st edition) Bedford, Mass., USA: Digital Press
Tobin-Hochstadt, Sam (2011) ‘Extensible Pattern Matching in an Extensible Language’ (pre-print)
Wadler, Philip (1987a) ‘A critique of Abelson and Sussman –or– Why calculating is better than scheming’ in ACM SIGPLAN Notices 22 (3): 83–94 (post-print on the website of the author)
Wadler, Philip (1987b) ‘Views: A way for pattern matching to cohabit with data abstraction’ in POPL ’87: Proceedings of the 14th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages 307–18 (revised version on the website of the author)
Wright, Andrew K. (1994) ‘Pattern Matching Syntactic Extensions for Scheme’ version 1.07 for Chez Scheme (software release)
Wright, Andrew K. (1996) ‘Pattern Matching for Scheme’ (memorandum, Rice University)
Wright, Andrew K. and Cartwright, Robert (1997) ‘A Practical Soft Type System for Scheme’ in ACM Transactions on Programming Languages and Systems 19 (1): 87–152
It is included in the sys/let.t
source for T 2.9
(released 1984); the file has a copyright date of
1983–4. ↩︎
In Tobin-Hochstadt (2011), Matthias Felleisen claims to have been the original author of this system, but there is no historical evidence to support his claim, and some circumstantial evidence against it. ↩︎
The name ‘Wright–Cartwright–Shinn’ originated in SRFI 200, adding the name of Robert Cartwright those of the two people usually credited with the most widespread implementations of this pattern matcher. Cartwright did co-author a paper with Wright (Wright and Cartwright 1997) which contained an elementary description of the pattern matcher as a prelude to explaining the use of a subset of its syntax within a type-inferred implementation of Scheme. There is no evidence that Cartwright was involved in the design or implementation of the widely distributed original version of the matcher (running on standard implementations of Scheme): he was simply a co-author of another paper with Wright on the subject of a program which incorporated a version of it. It should have been called the ‘Duba–Wright–Shinn’ pattern matcher, after the three people whose contributions to the popular implementations of it are verifiable. ↩︎
This version still includes a compatibility layer to support Wright-style patterns, and thus it could be considered yet another independent re-implementation of Wright’s matcher. ↩︎
If one ellipsized pattern appears within another ellipsized pattern, it could also refer to a subsequence at any level of nesting, but this possibility will not be considered here. ↩︎
Due to R6RS’s lack of a bound-identifier-hash
procedure, for optimum expansion times with the sample
implementation, the generate-temporaries
procedure
should always return identifiers with distinct symbolic names, so
that (symbol-hash (syntax->datum id))
is a
reasonably effective hash function for each id
in a
set of generated identifiers. ↩︎
See also the documentation for the previous, mark-based implementation. ↩︎
The use of Brzozowski derivatives to match disordered sequences
was pioneered by Mark Hopkins in his regular expression library,
and by Joe English and James Clark for SGML/XML validation in the
presence of the DTD/RELAX NG interleaving operator. (See ‘How to validate
XML’, ‘The
Design of RELAX NG’, and ‘Notes on implementing
RELAX NG’) However, both DTDs and RELAX NG put restrictions on
the use of the interleaving operator which make it easy to write a
validator which always runs in linear time (essentially reducing to
a simple anagram problem). Since subpatterns in
seq/unordered
can evaluate arbitrary Scheme code, such
a restriction is not possible here. ↩︎
© 2025 Daphne Preston-Kendal
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the ‘Software’), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.
The software is provided ‘as is’, without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the software or the use or other dealings in the software.