by Marc Nieper-Wißkirchen

This SRFI is currently in *draft* status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to `srfi-241@nospamsrfi.schemers.org`

. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

- Received: 2022-11-09
- 60-day deadline: 2023-01-09
- Draft #1 published: 2022-11-10
- Marc's personal
Git repo for this SRFI for reference while the SRFI is in
*draft*status (preview)

This SRFI describes a simple pattern matcher based on one originally devised by Kent Dybvig, Dan Friedman, and Eric Hilsdale, which has a catamorphism feature to perform recursion automatically.

None at present.

Pattern matching to destructure expressions is a common task in data-oriented programming. Consequently, a number of pattern-matching libraries for Scheme have found more widespread use. Among them are the three mentioned in the (withdrawn) SRFI 200, namely Racket's pattern matcher, the pattern matcher of Andrew Wright and Robert Cartwright with extensions by Alex Shinn, and a pattern matcher that has been described by R. Kent Dybvig for the use with his Chez Scheme implementation.

The last-mentioned matcher matcher, which is appealing
also from a theoretical point of view, is finally the basis of
the matcher described in this SRFI. Contrary to the other two
matchers, it has a simple pattern language. It's strength lies
in its catamorphism feature to perform recursion automatically.
It is called `match`

.

While one use of pattern matching is to bind variables, it can
also be used to recursively destructure a value in the sense of
R^{6}RS's
or SRFI
1's `fold-right`

. In high-brow category
language, such procedures destructuring inductive values are
called *catamorphisms*.

In some sense, `match`

syntactically expresses the
general catamorphism on Scheme datums. For example, the
aforementioned `fold-right`

procedure (restricted to
a single list argument) can be implemented as follows:

(define (fold-right kons knil lis) (match lis [(,x . ,[x*]) (kons x x*)] [() knil]))

The syntax employed by the pattern matcher by Dybvig, Friedman, and Hilsdale, on which the one described here is based, is also the basis of the syntax of The Nanopass Framework.

A `match`

expression evaluates an input expression
producing an input value. This value is then matched against
the patterns of a `match`

expression much as the
input of a `case`

expression is matched against
datums.

The general form of a match expression is

`(match ⟨expr⟩ ⟨clause⟩ …)`

where `⟨expr⟩`

is the input expression to match and
each clause has one of the following two forms:

```
[⟨pattern⟩ (guard
⟨guard expression⟩ …) ⟨body⟩]
```

`[⟨pattern⟩ ⟨body⟩]`

As with `case`

, the input expression is evaluated to
produce the input value and the first clause the input value
matches, if any, is selected. The `body`

of the
selected clause is evaluated, and the values of the last
expression in the body are returned. An input value matches a
clause if it fits the clause's pattern and the
`⟨guard expressions⟩`

, if any, evaluate to a
true value. Patterns may contain symbolic constants, which must
match exactly, and pattern variables, which match any
input. Pattern variables are prefixed by commas; symbolic
constants are not.

Note that in R^{6}RS matching brackets may be used in
place of matching parentheses and vice versa, mostly for
readability. We use brackets in this specification to document
a recommended style. Users of Scheme systems not supporting
brackets (or users who don't like brackets) can replace them
with parentheses throughout.

(match '(a 17 37) [(a ,x) 1] [(b ,x ,y) 2] [(a ,x ,y) 3]) ⟹ 3

The first clause fails to match because there are three items
in the input list, and the pattern has only two. The second
clause fails because `b`

does not match `a`

.

In the output expressions, the values of the pattern variables are bound to the corresponding pieces of input.

(match '(a 17 37) [(a ,x) (- x)] [(b ,x ,y) (+ x y)] [(a ,x ,y) (* x y)]) ⟹ 629

When followed by an ellipsis (`...`

), a pattern
variable represents a sequence of input values.

(match '(a 17 37) [(a ,x* ...) x*]) ⟹ (17 37)

Ellipses can follow a structured pattern containing one or more pattern variables.

(match '(begin (1 5) (2 6) (3 7) (4 8)) [(begin (,x* ,y*) ...) (append x* y*)]) ⟹ (1 2 3 4 5 6 7 8)

Ellipses can be nested, producing sequences of sequences of values.

(match '((a b c d) (e f g) (h i) (j)) [((,x* ,y** ...) ...) (list x* y**)]) ⟹ ((a e h j) ((b c d) (f g) (i) ()))

Recursion is frequently required while processing an input
value with `match`

. Here, a procedure returning the
length of a list is defined.

(letrec ([len (lambda (lst) (match lst [() 0] [(,x ,x* ...) (+ 1 (len x*))]))]) (len '(a b c d))) ⟹ 4

A simpler version of the above uses the catamorphism feature of
`match`

. If a pattern variable is written as
`,[`

, `var`]`match`

recurs on the
matching subpart of the input before evaluating the output
expressions of the clause.

(let ([len (lambda (lst) (match lst [() 0] [(,x . ,[y]) (+ 1 y)]))]) (len '(a b c d))) ⟹ 4

In some cases, `match`

will need to return multiple
values. The catamorphism syntax can be used to receive multiple
values. When making implicit recursive calls using the
catamorphism syntax, zero or more variables between the
parentheses can be included, each representing one of the
expected return values.

(let ([split (lambda (lis) (match lis [() (values '() '())] [(,x) (values `(,x) '())] [(,x ,y . ,[odds evens]) (values `(,x . ,odds) `(,y . ,evens))]))]) (split '(a b c d e f))) ⟹ (a c e) (b d f)

Sometimes it is useful to explicitly name the operator in a
catamorphism subpattern. Whereas ```
,[⟨variable⟩
…]
```

recurs to the top of the current
match, ```
,[⟨cata operator⟩ -> ⟨variable⟩
…]
```

recurs to (the result of
evaluating) ```
⟨cata
operator⟩
```

. `⟨Cata operator⟩`

must evaluate to a procedure that accepts one argument, the
matched value, and returns as many values as there are
identifiers following the `->`

.

Here is an example that illustrates the use of guards and how to achieve the effect of a catch-all clause.

(let ([simple-eval (lambda (x) (match x [,i (guard (integer? i)) i] [(+ ,[x*] ...) (apply + x*)] [(* ,[x*] ...) (apply * x*)] [(- ,[x] ,[y]) (- x y)] [(/ ,[x] ,[y]) (/ x y)] [,x (assertion-violation 'simple-eval "invalid expression" x))))]) (simple-eval '(+ (- 0 1) (+ 2 3)))) ⟹ 4

The `match`

form extends `quasiquote`

in
the `⟨bodies⟩`

to an ellipsis-aware version
that allows ellipses to be used in place
of `unquote-splicing`

, which often leads to more
readable code. Consider the following transformer
of `let`

forms in a Scheme-like language:

(define translate (lambda (x) (match x [(let ([,var* ,expr*] ...) ,body ,body* ...) `((lambda ,var* ,body ,@body*) ,@expr*)] [,x (assertion-violation 'translate "invalid expression" x)])))

This can be rewritten as follows:

(define translate (lambda (x) (match x [(let ((,var* ,expr*) ...) ,body ,body* ...) `((lambda ,var* ,body ,body* ...) ,expr* ...)] [,x (assertion-violation 'translate "invalid expression" x)])))

The better readability is, in particular, illustrated by a
procedure as the following with nested ellipses. It converts
unnamed `let`

expressions into direct lambda
applications, where the `let`

has been generalized to
allow an implicit `begin`

in each right-hand-side
expression.

(define (f x) (match x [(let ([,x ,e1 ...] ...) ,b1 ,b2 ...) `((lambda (,x ...) ,b1 ,b2 ...) (begin ,e1 ...) ...)]))

The basic usage of (the ellipsis-aware) `quasiquote`

is the same as in R^{6}RS:

`(list ,(+ 1 2) 4) ⟹ (list 3 4)

Lists can still be spliced into sequences.

`(a ,(+ 1 2) ,@(map abs '(4 -5 6)) b) ⟹ (a 3 4 5 6 b)

The extension to `quasiquote`

allows ellipses to be
used in place of `unquote-splicing`

(`,@`

)
to piece together the output form.

`(a ,(+ 1 2) ,(map abs '(4 -5 6)) ... b) ⟹ (a 3 4 5 6 b)

Within each subform followed by an ellipsis, each comma-prefixed item must be a list and all such items within the same subform must have the same length.

`((,'(1 2 3) . ,'(a b c)) ...) ⟹ ((1 . a) (2 . b) (3 . c))

A subform followed by an ellipsis may be contained within a larger subform that is also followed by an ellipsis. In this case, each comma-prefixed item must be a list of lists, each such item must have the same length, and the corresponding sublists of each such item must have the same lengths. This requirement generalizes to comma-prefixed items nested within more than two levels of ellipsis-followed subforms.

`(((a ,'((x 1) (x 2) (x 3))) ...) ...) ⟹ (((a x) (a 1)) ((a x) (a 2)) ((a x) (a 3)))

In the output, a subform may be followed directly by two or more ellipses; the requirements are the same as for nested ellipses, but the result is flattened rather than nested.

`((a ,'((x 1) (x 2) (x 3))) ... ...) ⟹ ((a x) (a 1) (a x) (a 2) (a x) (a 3))

Ellipses can also follow subforms containing items prefixed by `unquote-splicing`

(`,@`

).

`((a ,@'((x 1) (x 2) (x 3))) ...) ⟹ ((a x 1) (a x 2) (a x 3))

A subform of the form `(... ⟨form⟩)`

is
identical to `⟨form⟩`

, except that ellipses
in the subform have no special meaning.

`(... (,'(1 2 3) ...)) ⟹ ((1 2 3) ...)

Substitutions are made only for unquoted components appearing at the same nesting level as the outermost quasiquote.

`(a `(b ,(list 1 2) ... ,(foo ,(list 1 3) ... d) e) f) ⟹ (a `(b ,(list 1 2) ... ,(foo 1 3 d) e) f)

Both the (withdrawn) SRFI 204 and this SRFI describe a pattern matcher and suggest that the respective one becomes included in Scheme implementations and a future Scheme standard. Yet, they are not in competition with each other. Quite the contrary, it makes a lot of sense that a Scheme system supports both matchers as they apply to different areas.

While a number of basic tasks can be solved equally well with the matcher described in SRFI 204 and the matcher described here, some problems are better solved with one or the other. The matcher of SRFI 204 mirrors most of Scheme's binding constructs. For these binding constructs, the catamorphism feature in this SRFI does not make sense, so this SRFI's matcher lacks it. On the other hand, the matcher described here distinguished itself due to its catamorphism feature but, therefore, cannot support binding constructs.

SRFI 204 has a complex pattern language and allows one to write complex matchers with just a single pattern. If this is not needed, the pattern language of this SRFI excels due to its simplicity and exceptionally clear semantics.

For historical reasons, both the matcher described here and in
SRFI 204 are called `match`

. This specification does
not change the established name. Fortunately, when reading
code, there is virtually no danger of confusion. Pattern
variables in the matcher described here are all unquoted and
there is no quasiquote while in patterns of SRFI 204's match,
unquoted datums usually only appear in quasiquotations.

If more than one matcher is needed in a program, one can easily use the rename facility of the Scheme implementation's module system.

*Remark:* The author of this SRFI suggests that any
reviving attempt of SRFI 204 should make the resulting pattern
matcher extensible so that there are only few primitives with
clear semantics allowing the rest of the specification defined
in terms of them.

Even better than reviving SRFI 204 would be specifying a facility to define type-safe matchers similar to the Nanopass framework.

The match form uses a pattern language akin
to `syntax-rules`

and `syntax-case`

, which
is described here.

A `⟨pattern⟩`

has one of the following forms:

`(⟨pattern⟩ ⟨ellipsis⟩ . ⟨pattern⟩)`

`(⟨pattern⟩ . ⟨pattern⟩)`

`()`

`#(⟨pattern⟩ … ⟨pattern⟩ ⟨ellipsis⟩ ⟨pattern⟩ …)`

`,_`

`,⟨pattern variable⟩`

`⟨cata pattern⟩`

`⟨symbol⟩`

`⟨constant⟩`

where a `⟨cata pattern⟩`

has one of the following forms:

`,[⟨cata operator⟩ -> ⟨cata variable⟩ …]`

`,[⟨cata variable⟩ …]`

and where the `⟨ellipsis⟩`

is the
ellipsis `...`

(in the sense
of `free-identifier=?`

), `⟨cata operators⟩`

are expressions, and `⟨pattern variable⟩`

and `⟨cata variable⟩`

are identifiers.

In each `⟨pattern⟩`

,
the `⟨pattern variables⟩`

and `⟨cata variables⟩`

must be pairwise
disjoint (in the sense
of `bound-identifier=?`

). ```
⟨Pattern
variables⟩
```

must not
be `...`

, `_`

, or `unquote`

(in
the sense of `free-identifier=?)`

.

Patterns match Scheme values by the following rules:

A pattern of the form `(⟨pattern`

matches a
dotted list if _{1}⟩
⟨ellipsis⟩ . ⟨pattern_{2}⟩)`⟨pattern`

matches each proper list element
and _{1}⟩`⟨pattern`

matches the tail of the dotted list._{2}⟩

A pattern of the form `(⟨pattern`

matches a
pair if _{1}⟩
. ⟨pattern_{2}⟩)`⟨pattern`

matches the car
and _{1}⟩`⟨pattern`

matches the cdr of the pair._{2}⟩

A pattern of the form `()`

matches the empty list.

A pattern of the form `#(⟨pattern`

matches a vector of _{1}⟩ …
⟨pattern_{k}⟩ ⟨pattern_{e}⟩
⟨ellipsis⟩ ⟨pattern_{m+1}⟩ …
⟨pattern_{n}⟩)`n` or more elements whose first `k` elements match
`⟨pattern`

through _{1}⟩`⟨pattern`

,
whose next _{k}⟩`m`−`k` elements each match
`⟨pattern`

, and whose remaining _{e}⟩`n`−`m` elements match
`⟨pattern`

through _{m+1}⟩`⟨pattern`

.
_{n}⟩

A pattern of the form `,_`

matches an arbitrary
Scheme value.

A pattern of the form `,⟨pattern variable⟩`

matches an arbitrary Scheme value.

A `⟨cata pattern⟩`

matches an arbitrary Scheme value.

A pattern of the form `⟨symbol⟩`

matches a symbol
with the same name as `⟨symbol⟩`

.

A pattern of the form `⟨constant⟩`

match a value
that is equal (in the sense of `equal?`

) to the
literal datum `⟨constant⟩`

.

The following syntax together with the auxiliary
syntax `unquote`

, `unquote-splicing`

,
`...`

, `_`

, `guard`

, and `->`

is
exported by the libraries `(srfi :241 match)`

and `(srfi :241)`

. The auxiliary
syntax `unquote`

, `unquote-splicing`

,
`...`

, and `_`

is identical to the
auxiliary syntax exported by `(rnrs base)`

,
and `guard`

is identical to the syntax exported
by `(rnrs exceptions)`

.

`(match ⟨input expression⟩ ⟨clause`

_{1}⟩ …)

*Syntax:* `⟨input expression⟩`

can be any
expression. The `⟨clauses⟩`

take one of two forms, either

`(⟨pattern⟩ ⟨body⟩)`

or

`(⟨pattern⟩ (guard ⟨guard expression⟩ …) ⟨body⟩)`

where the ` ⟨guard expressions⟩`

can be
any expressions.

*Semantics:* A `match`

expression is evaluated
as follows: First, the `⟨input expression⟩`

is
evaluated to yield an input value. Then, the
first `⟨clause⟩`

for which the input value matches
its `⟨pattern⟩`

is selected. The environment in
which the `match`

expression is evaluated is
extended by binding the pattern variables of
the `⟨pattern⟩`

to fresh locations, and the values
that were matched against these pattern variables are stored in
those locations. If `⟨guard expressions⟩`

are present
in the selected clause, they are evaluated in left-to-right
order in the extended environment until one evaluates
to `#f`

. In this case, the rest of
the `⟨clause⟩`

is skipped and pattern matching
proceeds with the next `⟨clause⟩`

. If
no `⟨guard expression⟩`

evaluates
to `#f`

, or if there is no ```
⟨guard
expresssion⟩
```

present, in unspecified order,
the `⟨cata operators⟩`

of
the `⟨cata clauses⟩`

in
the `⟨pattern⟩`

of the
selected `⟨clause⟩`

are evaluated in no
specific order in the extended environment to yield cata
procedures, which are then invoked on the value that was matched
against the `⟨cata pattern⟩`

. The
environment is extended a second time by binding
the `⟨cata variables⟩`

of
these `⟨cata patterns⟩`

to fresh locations,
and the values resulting from invoking the cata procedures are
stored in left-to-right order in the corresponding locations.
Finally, the twice-extended environment is extended a third time
by binding `quasiquote`

to ellipsis-aware
quasiquotation syntax (see below),
the `⟨body⟩`

of the
selected `⟨clause⟩`

is evaluated in this
thrice-extended environment, and the resulting values from its
last expression are returned.

If a `⟨cata pattern⟩`

clause is of the
second form, the missing `⟨cata operator⟩`

defaults to an expression that evaluates to a procedure that, when
called with one argument, evaluates the `match`

expression with the input value replaced by the argument value in
tail position.

If no pattern of any clause is matched, an exception
of type `&assertion-violation`

is raised.

If the `match`

expression is in tail context,
the `⟨bodies⟩`

are ```
⟨tail
bodies⟩
```

.

*Note:* A pattern variable in the sense of this SRFI is an
ordinary variable, not a *pattern variable* in the sense of
the `syntax-case`

system of R^{6}RS.

*Remark:* The pattern language makes use
of `unquote`

without a
corresponding `quasiquote`

. It has been criticized
that this leads to an “unbalanced” quasiquotation
syntax (there is an `unquote`

without
a surrounding `quasiquote`

) and it has been suggested to add
the “implicit” `quasiquote`

explicitly so
that the first example

(define (fold-right kons knil lis) (match lis [(,x . ,[x*]) (kons x x*)] [() knil]))

would become:

(define (fold-right kons knil lis) (match lis [`(,x . ,[x*]) (kons x x*)] [`() knil]))

We don't subscribe to the view that there is an
“implicit” `quasiquote`

in
the `match`

syntax described in this SRFI. In
fact, `quasiquote`

is not auxiliary syntax
for `match`

, and if there were an
implicit `quasiquote`

, a nested one should cancel
an `unquote`

, which it doesn't. The same fallacy was
made in SRFI 200. Moreover, following the `unquote`

is not an expression, but either a ```
⟨pattern
variable⟩
```

or a ```
⟨cata
pattern⟩
```

.

The following syntax together with the auxiliary
syntax `unquote`

, `unquote-splicing`

,
and `...`

is exported by the library ```
(srfi
:241 match quasiquote)
```

. The auxiliary syntax is
identical to the auxiliary syntax exported by ```
(rnrs
base)
```

.

`(quasiquote ⟨ellipsis-aware qq template⟩)`

If no instances of the ellipsis `...`

appear within
the `⟨ellipsis-aware qq template⟩`

at the
same nesting level as the outermost quasiquote, the result of
evaluating the ellipsis-aware `quasiquote`

is the
same as if the `quasiquote`

syntax of R^{6}RS
were used. A subtemplate of the
form `(⟨ellipsis⟩ ⟨qq template⟩)`

is identical to `⟨qq template⟩`

, except
that ellipses within `⟨qq template⟩`

have
no special meaning. If a subtemplate is followed by one or more
instances of the `⟨ellipsis⟩`

, the
expressions following a comma have to evaluate into nested lists
of (at least) the same nesting depths. They are replaced in the
output by all of the elements they match in the input,
distributed as indicated. It is an error if the output cannot
be built up as specified. An ```
(⟨unquote-splicing
⟨expression⟩ …)
```

form is equivalent to
```
(⟨unquote⟩ ⟨expression⟩ …)
⟨ellipsis⟩
```

.

The auxiliary syntax `quasiquote`

within
an `⟨ellipsis-aware qq template⟩`

is the
ellipsis-aware `quasiquote`

.

A portable
implementation for R^{6}RS systems is
in this
SRFI's repository.

A portable implementation for R^{7}RS systems that
support `syntax-case`

is possible as well. For an
R^{7}RS implementation, the library names should be
derived from the given
(SRFI
97-conformant) R^{6}RS library names as usual.

The pattern matcher defined here has originally been described by R. Kent Dybvig. The original program was originally designed and implemented by Dan Friedman. It was later redesigned and reimplemented by Erik Hilsdale. The sample implementation of this SRFI shares no code with Dybvig's, Friedman's and Hilsdale's version.

The example section shamelessly reuses text from R. Kent Dybvig's original description.

© 2022 Marc Nieper-Wißkirchen.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Editor: Arthur A. Gleckler