by Panicz Maciej Godek
This SRFI is currently in withdrawn status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-200@nospamsrfi.schemers.org
. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.
editor's summary of reasons for withdrawal: The SRFI had reached 768 days since first draft, which is far past the ninety-day limit, and past even the more lax deadlines of the current editor. Panicz simply didn't have time to devote to the SRFI, so I suggested withdrawing it, and he agreed. No one had volunteered to take over, so it's time to withdraw the SRFI. Others are welcome to take up the content in a new SRFI.
This SRFI discusses some of the existing pattern-matching
libraries for the Scheme programming language — namely,
the pattern matcher presented by Andrew K. Wright and Robert
Cartwright in the paper "A Soft Type System for Scheme", the
pattern matcher developed by Dan Friedman, Erik Hilsdale and
Kent Dybvig, the racket/match
module
distributed with the Racket programming environment, as well
as the Bigloo and Gerbil pattern matchers distributed with
their respective implementations.
It then extracts a pattern syntax which is compatible with three of
those implementations and provides extrinsic rationale for that
syntax.
It also provides a simple implementation of a pattern matcher
which conforms to the specification of a pattern language provided
in this document.
Pattern matching is a technique for (conditionally) naming components of compound data structures. It is commonly known in functional programming circles, and it has been adapted to the Scheme programming language in the early 1990s.
It also partially overlaps in functionality with the technique called destructuring bind, known from Common Lisp.
For example, instead of writing
(cond ((and (list? x) (= (length x) 3) (eq? (car x) '+)) (+ (cadr x) (caddr x))) (else 'operation-not-supported))it is often more clear and concise to write
(match x (`(+ ,x ,y) (+ x y)) (_ 'operation-not-supported))
Today, there exist a few different implementations that offer many extensions to the basic pattern-matching facility, and that are mutually incompatible.
The purpose to this SRFI is to codify the basic common features that are present in all or most of those libraries in order to facilitate the development of portable programs that utilize pattern matching.
This document isn't meant to serve as comprehensive documentation for any of the discussed libraries, as they are already well documented in their appropriate sources.
It does, however, provide a minimalist reference implementation
expressed in terms of simple syntax-case
macros,
and an equivalent implementation expressed in terms of
syntax-rules
macros.
The main interface to using syntactic pattern matching is the
same for all libraries discussed in this document — namely,
the match
macro, which is used in the following way:
(match expression (<pattern-1> <actions-1*> ... <value-1>) ... (<pattern-N> <actions-N*> ... <value-N>))
where <actions-X*>
are optional sequences
of expressions.
The expression
is evaluated once, and its value is
tested against the subsequent <pattern-X>
patterns. When the first matching pattern
<pattern-K>
is encountered,
the sequence of <actions-K*>
is executed,
and the value of the expression <value-K>
becomes the value of the whole match
expression.
The pattern may contain some variables (which are typically expressed
using unquoted symbols). If it does, then those variables become the
names of some parts of the data being pattern-matched, and the named
parts can be referred to from the <actions-K*>
and <value-K>
expressions.
If the data does not match any of the patterns, the result
of the expression depends on the pattern matcher under consideration:
it may be either an unspecified value, some specified value or an error.
However, since writing non-exhaustive patterns in match
expressions is generally considered a bad practice, this aspect of
the match
form will not be further discussed in this
document.
It is mainly the form, capabilities and the interpretation of the
<pattern-X>
expressions
where the libraries differ.
Probably the most common pattern-matching library distributed
with some popular Scheme implementations is the
match.scm
module developed by Alex Shinn in portable syntax-rules
macros. It is based on a specification developed by Andrew K. Wright
and Robert Cartwright in their paper
A Practical Soft Type System for Scheme.
It is distributed along with many Scheme implementations, including
Chibi (the (chibi match)
module),
Guile (the (ice-9 match)
module),
Cyclone (the (cyclone match)
module),
LispKit (the match.sld
file),
Loko (the match.sls
file),
Mosh (the Pattern Match library
and Sagittarius (the (match)
module).
It is also available in the Chicken Scheme's repository as the
matchable
module.
The <pattern-X>
expressions are interpreted
in the following way: if a pattern is a literal, then it matches
the data that is equal to that literal. Otherwise, if it is a regular
unquoted symbol (henceforth called a pattern variable), then
the match succeeds, and the symbol gets bound to the value being pattern-matched.
If the pattern is the special symbol _
(underscore), then
the match succeeds, but the symbol is not bound.
Things become more interesting when the pattern is a pair, i.e.
when it has the form (head . tail)
.
Roughly speaking, data matches such patterns if it is also a pair,
the head of the data matches the head of the pattern, and
the tail of the data matches the tail of the pattern. So for example,
the list (1 2)
matches the pattern (a b)
(with a
being bound to 1
and b
being bound to 2
), but it also matches the pattern
(a . b)
(where a
is bound to 1
and b
is bound to (2)
).
The same symbol can be used in a pattern more than once. When that's
the case, the pattern matches only if all the occurrences of that
symbol would be bound to values that are equal?
: for
example, the pattern (a . a)
matches the value
(1 . 1)
or ((1) . (1))
, but it does
not match the value (1 . 2)
or the value (1 1)
.
This allows one to define a function that removes adjacent equal?
elements from a list in the following way:
(define (unique elements) (match elements ((x x . rest) (unique `(,x . ,rest))) ((x . rest) `(,x . ,(unique rest))) (() '())))
For example, (unique '(a a a b b b b a a c c))
evaluates
to (a b a c)
.
An equivalent definition that does not use pattern matching would look like this:
(define (unique elements) (cond ((null? elements) '()) ((and (pair? elements) (pair? (cdr elements)) (equal? (car elements) (cadr elements))) (unique (cdr elements))) (else (cons (car elements) (unique (cdr elements))))))
The previous section claimed that a value matches the
(head . tail)
pattern only if it is
a pair and the head and the tail of the data matches the head
and the tail of the pattern.
This is true in most circumstances. However, there are a few
exceptions. Notably, if the pattern is a pair and its head is the
quote
symbol, then the data matches the pattern only
if it is equal?
to the tail of the pattern. For example,
the symbol 'x
matches the pattern 'x
and the value of the expression (list 1 2 3)
matches
the pattern (quote (1 2 3))
, or equivalently
'(1 2 3)
.
More interestingly, if the head of the pattern is the symbol
quasiquote
, then the pattern behaves similarly
to quasi-quotation, attributing special meaning to the unquote
symbol. In particular, this allows one to express
the above definition of the function unique
as:
(define (unique elements) (match elements (`(,x ,x . ,rest) (unique `(,x . ,rest))) (`(,x . ,rest) `(,x . ,(unique rest))) ('() '())))
This feature exposes symmetry between pattern matching
and quasi-quotation. However, the unquote-splicing
operator is not implemented in the Wright-Cartwright-Shinn matcher.
The Wright-Cartwright-Shinn pattern matcher also offers a special syntax for
testing whether a given sub-pattern satisfies a certain predicate:
if a pattern is a list whose first element is the ?
(question mark) symbol, then the second element of that list must
be a unary predicate. The pattern matches if the predicate is
satisfied.
Optionally, there may be a third element in the list. It is an additional pattern that needs to be satisfied by the datum. It is typically used for naming the element that satisfies the predicate.
Even though the Wright-Cartwright-Shinn pattern matcher does not support
the unquote-splicing
operator, it provides
the ...
(ellipsis) operator which behaves similarly
to the one that can be found in the syntax-rules
pattern language.
The ...
operator must appear in a list, following
a pattern, but no more than one ellipsis is allowed to appear
at a single level of nesting. The behavior of the operator is
best illustrated with an example - the palindrome?
predicate, which evaluates to #true
iff its argument
is a list equal?
to its own reverse:
(define (palindrome? list) (match list (() #true) ((x) #true) ((a b ... a) (palindrome? b)) (_ #false)))
If list
is (1 2 3 2 1)
, then the
first invocation of palindrome?
will match the
(a b ... a)
pattern, binding a
to
1 and b
to the list (2 3 2)
.
The ellipsis operator can also be used with more complex patterns. For example, the expression
(match '((a 1) (b 2) (c 3)) (((s n) ...) `(,s ,n)))
evaluates to ((a b c)(1 2 3))
.
The ellipsis operator can be nested, so the pattern
((x ...) ...)
is also valid, even if difficult
to substantiate.
The Wright-Cartwright-Shinn matcher also allows one to combine patterns using
Boolean operators: (and pat-1 ... pat-n)
matches
if a given object matches all of the pat-1 ... pat-n
patterns, (or pat-1 ... pat-n)
matches if a given object
matches any of the pat-1 ... pat-n
patterns,
and (not pat-1 ... pat-n)
matches if an object does not match
any of the pat-1 ... pat-n
patterns.
The specification of the Wright-Cartwright-Shinn pattern matcher provides the following grammar:
patterns: matches: pat ::= identifier anything, and binds identifier | _ anything | () the empty list | #t #t | #f #f | string a string | number a number | character a character | 'sexp an s-expression | 'symbol a symbol (special case of s-expr) | (pat_1 ... pat_n) list of n elements | (pat_1 ... pat_n . pat_{n+1}) list of n or more | (pat_1 ... pat_n pat_n+1 ooo) list of n or more, each element of remainder must match pat_n+1 | #(pat_1 ... pat_n) vector of n elements | #(pat_1 ... pat_n pat_n+1 ooo) vector of n or more, each element of remainder must match pat_n+1 | #&pat box | ($ record-name pat_1 ... pat_n) a record | (= field pat) a ``field'' of an object | (and pat_1 ... pat_n) if all of pat_1 thru pat_n match | (or pat_1 ... pat_n) if any of pat_1 thru pat_n match | (not pat_1 ... pat_n) if all pat_1 thru pat_n don't match | (? predicate pat_1 ... pat_n) if predicate true and all of pat_1 thru pat_n match | (set! identifier) anything, and binds setter | (get! identifier) anything, and binds getter | `qp a quasi-pattern | (identifier *** pat) matches pat in a tree and binds identifier to the path leading to the object that matches pat ooo ::= ... zero or more | ___ zero or more | ..1 1 or more quasi-patterns: matches: qp ::= () the empty list | #t #t | #f #f | string a string | number a number | character a character | identifier a symbol | (qp_1 ... qp_n) list of n elements | (qp_1 ... qp_n . qp_{n+1}) list of n or more | (qp_1 ... qp_n qp_n+1 ooo) list of n or more, each element of remainder must match qp_n+1 | #(qp_1 ... qp_n) vector of n elements | #(qp_1 ... qp_n qp_n+1 ooo) vector of n or more, each element of remainder must match qp_n+1 | #&qp box | ,pat a pattern | ,@pat a pattern
(require racket/match)
in their Racket program. It can also be found in the
Racket's GitHub repository.
The rules for simple patterns in the Racket pattern matcher are identical
with those described above for the Wright-Cartwright-Shinn pattern matcher: literals
match literals, symbols match and bind to anything, and the special symbol
_
(underscore) matches anything but binds to nothing.
The rules for compound patterns in the Racket pattern matcher differ
significantly compared to the compound patterns found in the Wright-Cartwright-Shinn
matcher. In particular, compound patterns must be proper lists, and
their first element must always be a special symbol understandable
by the matcher. For example, if we want to match a pair (a . b)
,
we need to use the pattern (cons a b)
; if we want to
match a list (a b)
, we need to use the pattern
(list a b)
and so forth.
Therefore, the unique
function defined in the previous
section could be translated to the Racket's pattern language in the
following way:
(define (unique elements) (match elements ((list-rest x x rest) (unique (cons x rest))) ((cons x rest) (cons x (unique rest))) ('() '())))
Overall, the documentation of Racket provides a detailed list of symbols that are allowed in the head position of compound patterns.
pat ::= id match anything, bind identifier | (var id) match anything, bind identifier | _ match anything | literal match literal | (quote datum) match equal? value | (list lvp ...) match sequence of lvps | (list-rest lvp ... pat) match lvps consed onto a pat | (list-no-order pat ...) match pats in any order | (list-no-order pat ... lvp) match pats in any order | (vector lvp ...) match vector of pats | (hash-table (pat pat) ...) match hash table | (hash-table (pat pat) ...+ ooo) match hash table | (cons pat pat) match pair of pats | (mcons pat pat) match mutable pair of pats | (box pat) match boxed pat | (struct-id pat ...) match struct-id instance | (struct struct-id (pat ...)) match struct-id instance | (regexp rx-expr) match string | (regexp rx-expr pat) match string, result with pat | (pregexp px-expr) match string | (pregexp px-expr pat ) match string, result with pat | (and pat ...) match when all pats match | (or pat ...) match when any pat match | (not pat ...) match when no pat matches | (app expr pats ...) match (expr value) output values to pats | (? expr pat ...) match if (expr value) and pats | (quasiquote qp) match a quasipattern | derived-pattern match using extension literal ::= #t match true | #f match false | string match equal? string | bytes match equal? byte string | number match equal? number | char match equal? character | keyword match equal? keyword | regexp literal match equal? regexp literal | pregexp literal match equal? pregexp literal lvp ::= (code:line pat ooo) greedily match pat instances | pat match pat qp ::= literal match literal | id match symbol | (qp ...) match sequences of qps | (qp ... . qp) match qps ending qp | (qp ooo . qp) match qps beginning with repeated qp | #(qp ...) match vector of qps | #&qp match boxed qp | #s(prefab-key qp ...) match prefab struct with qp fields | ,pat match pat | ,(list lvp ...) match lvps, spliced | ,(list-rest lvp ... pat) match lvps plus pat, spliced | ,'qp match list-matching qp, spliced ooo ::= ... zero or more; ... is literal | ___ zero or more | ..k k or more | __k k or more
A closer look at the rules reveals that some of the symbols allowed
in patterns coincide with the ones in the Wright-Cartwright-Shinn matcher. In particular,
quote
, and
, or
, not
,
?
, and quasiquote
.
The Racket matcher also supports the feature known as pattern guards. A pattern guard is an additional condition which needs to be satisfied in order for a match to be considered successful.
In Racket, pattern guards are expressed using the #:when
keyword following a pattern, followed by the condition. For example:
(match '(1 2) (`(,a ,b) #:when (odd? (+ a b)) 'odd-sum) (_ 'unknown))
The distinctive feature of the Racket matcher is the ability to provide
derived patterns. In addition to the match
form,
Racket provides the define-match-expander
extension mechanism,
which allows one to define new types of patterns.
The Bigloo pattern matcher is distributed along with the Bigloo scheme implementation, but it is also provided by STKlos. According to the, it supports the following syntax:
<pattern> ⇒ Matches: <atom> the <atom>. | (kwote <atom>) any expression eq? to <atom>. | (and <pat1> ... <patn>) if all of <pati> match. | (or <pat1> ... ...<patn>) if any of <pat1> through <patn> matches. | (not <pat>) if <pat> doesn't match. | (? <predicate>) if <predicate> is true. | (<pat1> ... <patn>) a list of n elements. Here, ... is a meta-character denoting a finite repetition of patterns. | <pat> ... a (possibly empty) repetition of <pat> in a list. | #(<pat> ... <patn>) a vector of n elements. | #{<struct> <pat> ... } a structure. | ?<id> anything, and binds id as a variable. | ?- anything. | ??- any (possibly empty) repetition of anything in a list. | ???- any end of list.
The most apparent differences between Bigloo and the two matchers described above are the following:
?
(question mark) symbol.
All symbols that do not begin with the question mark (except kwote
,
and
, or
and not
that have a special
meaning) are treated as literal symbols;?-
is used (instead of _
) to
match anything without performing a binding;match-case
rather than match
;quasiquote
form is not supported.According to the documentation, the pattern matcher bundled with Gerbil Scheme supports the following syntax:
(match expr (pattern body ...) ... [(else body ...)]) <pattern>: (? test) ; predicate test with the `?` predicate constructor (? test pattern) ; test and match a pattern (? test => pattern) ; test and match a pattern on the value of the test (? test :: proc => pattern) ; test and match with a filter (and pattern ...) ; match all patterns (or pattern ...) ; match any pattern (not pattern) ; negated match (cons pattern1 pattern2) ; destructure a pair like cons (cons* pattern ... pattern-tail) ; destructure a list like cons* [pattern ...] ; (@list pattern ...) ; destructure a list like @list (box pattern) ; #&pattern ; destructure a box (values pattern ...) ; destructure a values tuple (vector pattern ...) ; #(pattern ...) ; destructure a vector (struct-id pattern ...) ; destructure a struct (class-id (slot pattern) ...) ; destructure a class (eq? val) ; match eq? to val (eqv? val) ; match eqv? to val (equal? val) ; match equal? to val (quote expr) ; match eq?/eqv?/equal? to a quoted value (quasiquote datum) ; destructure with quasiquote (apply getf pattern) ; applicative destructuring (match-macro arg ...) ; apply match macro expander _ ; match any and ignore id ; match any and bind to id datum ; match eq?/eqv?/equal? to a datum (match <> (match-pattern body ...) ...) => (lambda (obj) (match obj (match-pattern body ...) ...)) (match <...> (match-pattern body ...) ...) => (lambda args (match args (match-pattern body ...) ...))
The syntax is therefore largely compatible with
the Wright-Cartwright-Shinn matcher and the Racket matcher.
One noteworthy difference is the presence of the @list
pattern: it stems from the fact that the Gerbil reader treats
the square bracket lists as lists beginning with the
@list
symbol. (Some Scheme implementations,
such as Kawa or - with appropriately set read-options
- Guile - use the $bracket-list$
symbol for
the same purpose, although none of this is R6RS-compliant.)
Outside of the scope of the
match
syntax, @list
is defined as
a macro similar to quasiquote
, but such that
each of its arguments is implicitly unquote
d,
and the literal symbol ...
appearing in
@list
ed object's post-fix position plays a role
analogous to unquote-splicing
.
Therefore, in the context of the match
macro,
the @list
keyword is defined to mirror that behavior,
just like the presence of quasiquote
in the
Wright-Cartwright-Shinn and Racket matcher's pattern context mirrors
the behavior of quasiquote
in the regular expression
context.
A copy of the pattern matcher developed at Indiana University by Dan Friedman, Erik Hilsdale and Kent Dybvig can be found on GitHub.
The basic syntax of this pattern matcher is similar to the syntax of the Wright-Cartwright-Shinn matcher, the key difference being that in the case of Friedman-Hilsdale-Dybvig matcher, the patterns are implicitly quasi-quoted.
This means, that in order to adapt the basic patterns
of the Friedman-Hilsdale-Dybvig matcher to the Wright-Cartwright-Shinn
matcher, one needs to wrap each pattern in a quasiquote
.
In other
words, assuming that the Wright-Cartwright-Shinn matcher is available
under the match/WCS
binding, then the basic syntax of
the Friedman-Hilsdale-Dybvig matcher can be achieved via the following
transformation:
(define-syntax match/FHD (syntax-rules () ((match/FHD (pattern . body) ...) (match/WCS ((quasiquote pattern) . body) ...))))
Consequently, the definition of the unique
function
would take the following form under the Friedman-Hilsdale-Dybvig match
syntax (note the lack of back-ticks):
(define (unique elements) (match elements ((,x ,x . ,rest) (unique `(,x . ,rest))) ((,x . ,rest) `(,x . ,(unique rest))) (() '())))
Similarly, if the Friedman-Hilsdale-Dybvig matcher is available under
the match/WCS
binding, then we can obtain the subset of
Wright-Cartwright-Shinn patterns that begin with quasiquote
via the following transformation:
(define-syntax match/WCS (syntax-rules (quasiquote) ((match/WCS ((quasiquote pattern) . body) ...) (match/FHD (pattern . body) ...))))
Of course, this only works for basic patterns, and doesn't include extensions like predicates or Boolean combinations.
It is worth noting that the same basic quasiquotation patterns are also supported by the Racket pattern matcher.
Similarly to the Racket's matcher, the Friedman-Hilsdale-Dybvig matcher
supports pattern guards. However, it does not require the #:when
keyword. Instead, it assumes that there are no <actions-X*>
forms, and that a clause for the match
macro can either have
the form
(<pattern> <value>)or
(<pattern> <guard> <value>)
Pat ::= (Pat ... . Pat) | (Pat . Pat) | () | #(Pat* Pat ... Pat*) | #(Pat*) | ,Id | ,[Id*] | ,[Cata -> Id*] | Id
syntax-case
/syntax-rules
pattern languageApart from the libraries discussed in this document, the core Scheme language
already provides some pattern-matching capabilities, namely the
syntax-rules
/syntax-case
pattern language.
Unfortunately, the applicability of this language is limited to syntax objects, which are used mainly for expressing Scheme code transformations (such as macro definitions). This limitation is hurtful to the souls of those Scheme programmers who are in love with minimalism, as it requires the duplication of the same feature in different context. However, fixing that flaw is beyond the scope of this document.
This document focuses on libraries which extend the syntax of Scheme with pattern-matching capabilities. It is perhaps worth mentioning that the term "pattern matching" is also sometimes used in a slightly different sense: not as a syntactic extension to a language, but as a function that takes an s-expression representing a pattern and some object, and returns a list of bindings (for example, in the form of an association list) that could turn a pattern into that object.
This kind of a pattern-matching library has been described, for example, in the fifth chapter of Peter Norvig's book The Paradigms of Artificial Intelligence Programming.
Since pattern matching is a feature that has already been widely adapted by various Scheme communities, this document does not attempt to subvert this adaption. Instead, it aims at providing recommendations for writing future Scheme code that maximizes compatibility between the existing pattern matchers.
It has been noted that compound patterns that begin with quasiquote
are compatible between the Wright-Cartwright-Shinn matcher,
the Racket matcher and the Gerbil matcher, and that the
Friedman-Hilsdale-Dybvig matcher can trivially be adapted to make
quasiquotations explicit (although - as it was shown by Marc
Nieper-Wißkirchen on
this SRFI's mailing list,
it is impossible
to express certain patterns in a portable manner - in particular,
patterns that are meant to match the literal quasiquote
symbol need to be expressed differently).
This document recommends that the compound patterns are expressed using explicit quasiquotation.
The additional advantage of this practice is that it shows a symmetry between quasiquotation and pattern matching.
In particular, it recommends sticking to the following subset of the pattern grammar, which is compatible with Gerbil, Racket and Wright-Cartwright-Shinn matcher:
pat ::= identifier anything, and binds identifier | _ anything | () the empty list | #t #t | #f #f | string a string | number a number | character a character | 'sexp an s-expression | 'symbol a symbol (special case of s-expr) | `qp a quasi-pattern quasi-patterns: matches: qp ::= () the empty list | #t #t | #f #f | string a string | number a number | character a character | identifier a symbol | (qp_1 ... qp_n) list of n elements | (qp_1 ... qp_n . qp_{n+1}) list of n or more of remainder must match qp_n+1 | ,pat a pattern
This document discourages the use of certain pattern-matching features, in particular:
and
, or
,
not
)More specifically, the uses of those extensions are obviously allowed, but using them makes a program non-SRFI-200-compliant, and thus were not portable between different pattern matchers.
This document also doesn't recommend beginning the names of
identifier patterns with the ?
(question mark)
character, as the Bigloo matcher does, because it is impossible
to distinguish particular systematic forms of symbols in
syntax-rules
macros. The compatibility with
the Bigloo matcher and other matchers is unattainable anyway,
because the identifiers that appear in patterns need to be
stripped off of the initial question mark in order to refer
to the objects bound by those identifiers.
Three of the five pattern matchers presented here already support the pattern language recommended by this SRFI, and one comes close. Only the Bigloo matcher is almost completely incompatible with this specification.
The Wright-Cartwright-Shinn pattern matcher implemented in terms of syntax-rules
macros is available from Alex Shinn's website.
The Racket pattern matcher is distributed with Racket, and is available from Racket's GitHub repository.
The code for the Gerbil matcher can be found in the Gerbil repository,
Save for a few problematic patterns, the Friedman-Hilsdale-Dybvig
matcher can be adapted to the pattern language
by defining a macro matchWCS
(define-syntax matchWCS (syntax-rules (quasiquote) ((_ ((quasiquote pattern) . body) ...) (match (pattern . body) ...))))
and re-exporting it to shadow the match
binding
provided by the Friedman-Hilsdale-Dybvig matcher.
A copy of the pattern matcher developed at Indiana University by Dan Friedman, Erik Hilsdale and Kent Dybvig can be found on GitHub.
Lastly, one could use a simple reference implementation
provided by this SRFI. The main interface to the matcher is the
match
macro, which valuates the expression
to be pattern-matched on, and passes the work to the
match/evaluated
macro:
(define-syntax-rule (match expression (pattern actions* ... value) ...) (let ((evaluated expression)) (match/evaluated evaluated (pattern actions* ... value) ...)))
The match/evaluated
macro analyzes the first clause
and passes itself recursively on the remaining clauses to the failure
continuation of a match-clause
macro. If there are no
clauses left, it signals an error (which is an acceptable way
of handling this situation, although not required):
(define-syntax match/evaluated (syntax-rules () ((match/evaluated value) ;; This behavior is unspecified, and an "unspecified" ;; value would also be fine here. (error 'no-matching-pattern)) ((match/evaluated value (pattern actions ...) . clauses) (match-clause ((pattern value)) (and) () actions ... (match/evaluated value . clauses)))))
The match-clause
macro is responsible for converting the
pattern into a condition
(which is a conjunction of all conditions
stemming from the pattern's sub-patterns) and bindings
.
It maintains (and processes) a list of (pattern path)
pair,
where path
is an expression consisting of nested car
s
and cdr
s leading to the (evaluated) expression being pattern-matched.
It also carries around the set of actions
to be performed when
the match succeeds, as well as the form to be evaluated when it does not succeed.
When the match-clause
encounters a nested pattern, i.e. a pair
consisting of left
and right
sub-patterns, it extends
the condition
with the requirement that the current target
is a pair?
, and it expands the list with the left
and right
sub-patterns.
When the match-clause
encounters an identifier, then it extends
the set of bindings with the current path
.
Otherwise, when it encounters a literal value, it adds a check whether the
object at current path
is equal?
to that value.
When the list is processed, then technically the macro could expand
to the form (if condition (let bindings actions ...) alternative)
, but
since the matcher should handle repeated identifiers (with the requirement
that they all point to fragments of data that is equal?
),
it expands to the check/unique
macro, whose responsibility is
to add necessary equality checks, and remove bindings with duplicate identifiers.
(define-syntax match-clause (lambda (stx) (syntax-case stx (quote quasiquote unquote and _) ((match-clause () condition bindings actions ... alternative) #'(check/unique condition bindings #f () () actions ... alternative)) ((match-clause ((`,pattern root) . rest) condition bindings actions ... alternative) #'(match-clause ((pattern root) . rest) condition bindings actions ... alternative)) ((match-clause ((_ root) . rest) condition bindings actions ... alternative) #'(match-clause rest condition bindings actions ... alternative)) ((match-clause ((variable root) . rest) condition bindings actions ... alternative) (identifier? #'variable) #'(match-clause rest condition ((variable root) . bindings) actions ... alternative)) ((match-clause (('datum root) . rest) (and conditions ...) bindings actions ... alternative) #'(match-clause rest (and conditions ... (equal? root 'datum)) bindings actions ... alternative)) ((match-clause ((`(left . right) root) . rest) (and conditions ...) bindings actions ... alternative) #'(match-clause ((`left (car root)) (`right (cdr root)) . rest) (and conditions ... (pair? root)) bindings actions ... alternative)) ((match-clause ((literal root) . rest) (and conditions ...) bindings actions ...) #'(match-clause rest (and conditions ... (equal? literal root)) bindings actions ...)) )))
As mentioned earlier, the check/unique
macro is
responsible for extending condition
with equality
checks if there is more than one instance of a given identifier,
and to remove bindings
with duplicate identifiers.
In addition to bindings
, it maintains a currently
processed (variable path)
binding (or the #f
value in the edge cases), a list called bindings/checked
,
consisting of all bindings that didn't contain the identifier currently
being processed, and a list bindings/final
, consisting of
all bindings that were already tested against duplicates.
(define-syntax check/unique (lambda (stx) "add equality checks for repeated identifiers in patterns and remove them from bindings" (syntax-case stx (and) ((check/unique condition () #f () bindings actions ... alternative) #'(if condition (let bindings actions ...) alternative)) ((check/unique condition ((variable path) . bindings) #f bindings/checked bindings/final actions ... alternative) #'(check/unique condition bindings (variable path) bindings/checked bindings/final actions ... alternative)) ((check/unique (and conditions ...) ((variable path) . bindings) (variable+ path+) bindings/checked bindings/final actions ... alternative) (bound-identifier=? #'variable #'variable+) #'(check/unique (and conditions ... (equal? path path+)) bindings (variable+ path+) bindings/checked bindings/final actions ... alternative)) ((check/unique conditions ((variable path) . bindings) (variable+ path+) bindings/checked bindings/final actions ... alternative) #'(check/unique conditions bindings (variable+ path+) ((variable path) . bindings/checked) bindings/final actions ... alternative)) ((check/unique conditions () (variable path) bindings/checked bindings/final actions ... alternative) #'(check/unique conditions bindings/checked #f () ((variable path) . bindings/final) actions ... alternative)) )))
syntax-rules
The above definition of match-clause
only uses
syntax-case
macros because it needs to use the
identifier?
predicate in one pattern guard. Likewise,
the definition of the check/unique
needs to use
the bound-identifier=?
predicate in one pattern
guard.
The latter could be easily avoided if the syntax-rules
pattern language allowed to repeat identifiers in patterns. The case
((check/unique (and conditions ...) ((variable path) . bindings) (variable+ path+) bindings/checked bindings/final actions ... alternative) (bound-identifier=? #'variable #'variable+) #'(check/unique (and conditions ... (equal? path path+)) bindings (variable+ path+) bindings/checked bindings/final actions ... alternative))
could then be rewritten as
((check/unique (and conditions ...) ((variable path) . bindings) (variable path+) bindings/checked bindings/final actions ... alternative) #'(check/unique (and conditions ... (equal? path path+)) bindings (variable path+) bindings/checked bindings/final actions ... alternative))
and the whole macro could be easily transformed to
syntax-rules
Both singularities could also be resolved in an extended variant
of syntax-rules
that permits pattern guards.
However, it is also possible to avoid the need for pattern guards by applying a few tricks devised by Oleg Kiselyov.
The identifier?
predicate can be replaced with
the following identifier/literal
macro, which
expands to its second argument if its first argument is an
identifier, and otherwise expands to its third argument
(this style of writing macros is broadly called "CPS macros",
because it resembles the Continuation-Passing Style).
(define-syntax identifier/literal (syntax-rules () ((identifier/literal atom identifier literal) (let-syntax ((check-identifier (syntax-rules () ((_ atom symbol _) symbol) ((_ datum _ value) value)))) (check-identifier raw-symbol identifier literal)))))
The key trick is the definition of a local macro (using
the let-syntax
form) and then calling this locally
defined macro with some unbound identifier (in our case
we chose the name raw-symbol
).
As the local macro uses the atom
argument,
the first rule matches against the raw-symbol
only if atom
is also an identifier - then
the expression will be replaced by the successful branch. Otherwise
(e.g. if atom
is a list, or a number, or a boolean
etc.) the first rule will fail, and the second rule will be
tested which always matches, and the expression will be replaced
with the failure branch.
The second guard, consisting of the bound-identifier=?
predicate can be replaced using a very similar trick,
but this time the macro will take two arguments, and - as before
- two continuation arguments:
(define-syntax same-variable (syntax-rules () ((_ x y same differ) (let-syntax ((x= (syntax-rules (x) ((_ x identical _) identical) ((_ z _ different) different)))) (x= y same differ)))))
The code is almost exactly the same, with two important differences:
first, the identifier x
(first argument) appears on
the local macro's literals' list; second, instead of passing
an unbound symbol, we now pass the second argument of our macro,
namely - y
- to the inner macro.
The first pattern of the inner macro will match only if the second argument is the same identifier as the first one; otherwise the second rule succeeds which matches anything.
Those two macros allow us to construct the definitions for
match-clause
and check/unique
exclusively in terms of simple syntax-rules
transformations:
(define-syntax match-clause (syntax-rules (quote quasiquote unquote and _) ((match-clause () condition bindings actions ... alternative) (check/unique condition bindings #f () () actions ... alternative)) ((match-clause ((`,pattern root) . rest) condition bindings actions ... alternative) (match-clause ((pattern root) . rest) condition bindings actions ... alternative)) ((match-clause ((_ root) . rest) condition bindings actions ... alternative) (match-clause rest condition bindings actions ... alternative)) ((match-clause (('datum root) . rest) (and conditions ...) bindings actions ... alternative) (match-clause rest (and conditions ... (equal? root 'datum)) bindings actions ... alternative)) ((match-clause ((`(left . right) root) . rest) (and conditions ...) bindings actions ... alternative) (match-clause ((`left (car root)) (`right (cdr root)) . rest) (and conditions ... (pair? root)) bindings actions ... alternative)) ((match-clause ((atom root) . rest) (and conditions ...) bindings actions ... alternative) (identifier/literal atom (match-clause rest (and conditions ...) ((atom root) . bindings) actions ... alternative) (match-clause rest (and conditions ... (equal? atom root)) bindings actions ... alternative))) )) (define-syntax check/unique (syntax-rules (and) ((check/unique condition () #f () bindings actions ... alternative) (if condition (let bindings actions ...) alternative)) ((check/unique condition ((variable path) . bindings) #f bindings/checked bindings/final actions ... alternative) (check/unique condition bindings (variable path) bindings/checked bindings/final actions ... alternative)) ((check/unique (and conditions ...) ((variable path) . bindings) (variable+ path+) bindings/checked bindings/final actions ... alternative) (same-variable variable variable+ (check/unique (and conditions ... (equal? path path+)) bindings (variable+ path+) bindings/checked bindings/final actions ... alternative) (check/unique (and conditions ...) bindings (variable+ path+) ((variable path) . bindings/checked) bindings/final actions ... alternative))) ((check/unique conditions () (variable path) bindings/checked bindings/final actions ... alternative) (check/unique conditions bindings/checked #f () ((variable path) . bindings/final) actions ... alternative)) ))
This work has provoked the emergence of SRFI-204, which, due to its much greater scope and ambition, has never been finalized.
However, a lot of excellent work done by the author of SRFI-204, Felix Thibault, has been (indirectly) incorporated into this document. Felix' work prompted sections on Bigloo and Gerbil matchers, and most references were borrowed from his document.
The author's first exposure to pattern matching in Scheme occurred through the Shinn matcher that was shipped with Guile, along with excellent documentation. The author is therefore grateful to both Alex Shinn and the Guile team (at that time consisting of Mark H. Weaver, Ludovic Courtes and Andy Wingo).
Alex Shinn deserves additional credit for explaining to me the
tricks invented by Oleg Kiselyov that allow to express in
syntax-rules
things that otherwise would only be
expressible using pattern guards.
I am also grateful to everybody who contributed their time and effort on this SRFI's mailing list. In particular, I was constantly impressed by both the involvement and the depth of analysis shown by Marc Nieper-Wißkirchen and Shiro Kawai.
This work probably would never see the daylight if John Cowan didn't encourage me to participate in the SRFI process, and if Arthur Gleckler hadn't decided to carry on the torch of animating the SRFI process. He's probably the most patient SRFI editor the world could ever imagine.
Of course, it would not be possible to write a review of existing pattern matchers if they weren't existing, so this document owes its own existence to their respective creators.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.