by Felix Thibault
This SRFI is currently in withdrawn status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-204@nospamsrfi.schemers.org
. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.
Pattern matching decomposes a compound data structure into parts and assigns those parts to variables. This SRFI describes a pattern-matching library already in use by several scheme implementations which can match many common compound data structures.
Scheme has come with its own pattern-matching language since R4RS, as
part of syntax-rules
[Shinn, Cowan, Gleckler pp. 23-24]. Many implementations add a pattern
matcher that does not involve writing macros to make pattern matching
available to all users. Before R4RS, there was only the case
[Shinn, Cowan, Gleckler pp. 14-15]
statement, which does match, but does not bind its key. Bigloo1, STklos2
and s73 all use extended case
statements for pattern
matching.
Currently, the pattern-matching facilities that have been implemented by SRFIs are procedures of variable arity (SRFI 16) and S-expressions (SRFI 200). The focus of SRFI 200 is both consensus and clarity. That is, it aims to provide a pattern syntax that is both widely shared and easily read without requiring knowledge of any notation that is not ordinary Scheme. SRFI 200 also aims to provide a portable, readable way of using the pattern matchers already available in many implementations.
This SRFI aims to specify a matching syntax beyond S-expressions and provide a sample implementation, documentation, and tests. It is based on a pattern matcher originated by Andrew K. Wright and Robert Cartwright [Wright and Cartwright, pp. 116-120], using procedures and defmacros. Using a technique developed by Oleg Kiselyov [Kiselyov, 2003] Alex Shinn converted the pattern matcher to pure syntax-rules. This pattern-matching library (WCS) is already in use (at least in part) in several Scheme implementations. Aside from S-expressions, it also matches vectors and records. It has the ability to match based on predicates, to match "fields" (procedure applications), getters, and setters, and has syntax to match patterns zero or more, one or more, or exactly n times, or to walk down a tree with a pattern.
WCS is already part of Chibi4, Guile,5 Cyclone6, Chicken7, LispKit8, Loko9, Mosh10, and Sagittarius11. There are other pattern-matching libraries that are part of Armpit Scheme12, Gerbil13, Bigloo14, STklos15, the bindings library in Chicken16, Kawa17, Gambit18, Gauche19, Picrin20, Racket21, Scheme 9 from Empty Space22, and Scheje23. These matchers vary from very similar to the WCS matcher to S-expression matchers, type-oriented matchers, and expert systems more like ELIZA [Norvig, Chapter 5]. They also range from a wrapper around a library in the implementation language, part of the core in the implementation language, part of the core in Scheme, a Scheme library, and an example program. In length, they range from about twenty lines to thousands of lines across several files. Links to documentation and implementation (as available) are in the repository.
Starting with the WCS matcher, a portable syntax can be developed and a broad basis for implementing pattern matchers laid down. Some features of the current matcher are specified as optional and alternate means to achieve the same ends are specified. The current implementation, along with tests, documentation, and implementation-dependent code are in the repository.
Patterns are written to look like the printed representation of
the objects they match. The basic usage is
(match expr (pat body ...) ...)
where the result of expr
is matched against each pattern in
turn, and the corresponding body is evaluated for the first to
succeed. Thus, a list of three elements matches a list of three
elements.
(let ((ls (list 1 2 3)))
(match ls
((1 2 3) #t)))
=> #t
If no patterns match, an error is signalled. The forms match
, match-lambda
, and match-lambda*
all can match their expression(s) against multiple patterns, but match-let
, match-let*
, and match-letrec
match each expression against one pattern, so non-matching conditions have to be caught outside the match.
The following terms will be used in describing pattern matching:
match
form._, quote, $, struct, object, =, and, or, not, ?, set!, get!, quasiquote, ..., ___, **1, =.., *.. unquote, unquote-splicing
.$ struct object = and or not ? set! get! ... ___ **1 =.. *..
which are exported (if not already defined) via SRFI 206, if available.A pattern matcher has the following form:
(match <arg-expression> <clause> <clause> ... )
<clause> -> (<pattern> <body>) |
(<pattern> (=> <non-pattern identifier>) <body>)
Syntax: it is an error if <arg-expression> does not match a pattern in any of the clauses. It is an error for the <non-pattern identifier> to be a pattern variable.
A pattern can have one of the following forms:
<pattern> -> (<pattern> ...) |
#(<pattern> ...) |
<pattern identifier> |
<pattern s-expression> |
<tail pattern> |
<ellipsis pattern> |
<record pattern> |
<getter or setter pattern> |
<field pattern> |
<predicate pattern> |
<boolean pattern> |
<tree pattern> |
<quasi-quote pattern> |
_
Syntax: A <pattern identifier> is any identifier except and or not set! get! $ struct object _ ... ___
=..
***
*..
= ? quote quasi-quote unquote-splicing
and can be used as a pattern variable. A pattern variable will match any value, and can be repeated.
A <pattern s-expression> can be a boolean, number, character, vector, a (quote symbol) or (quote <pattern s-expression> ...). A <pattern s-expression> matches an <arg-expression> that is exactly like it. (A symbol in a quoted s-expression does not need a separate quote).
A tail pattern has the form:
<tail pattern> -> (<pattern> <pattern> ... . <pattern>) |
#(<pattern> <pattern> ... . <pattern>)
An ellipsis pattern has the form:
<ellipsis pattern> -> (<pattern> <pattern> ... <ellipsis form> <pattern> ...) |
#(<pattern> <pattern> ... <ellipsis form> <pattern> ...)
<ellipsis form> -> ... |
___ |
=.. <number> |
*.. <number 1> <number 2>
Syntax: It is an error for a list to contain more than one ellipsis form or tail pattern, but the patterns that list contains can contain additional ellipsis forms or tail patterns. It is an error for <number 2> to be less than <number 1>.
A record pattern has the form:
<record pattern> -> ($ <record type> <pattern> <pattern> ...) |
(struct <record type> <pattern> <pattern> ...) |
(object <record type> (<slot name> <pattern>) (<slot name> <pattern>) ...)
<record type> -> implementation dependent
<slot name> -> <identifier>
Syntax: It is an error for the number of patterns in the $ or struct forms to exceed the number of fields in the record. It is an error if the record does not have a field named <slot name>. If a record is exported from a library without its <record type>, this form cannot be used.
A getter or setter pattern has the form:
<getter or setter pattern> -> (get! <pattern identifier>) |
(set! <pattern identifier>)
A field pattern has the form:
<field pattern> -> (= <operator> <pattern>)
A predicate pattern has the form:
<predicate pattern> -> (? <predicate> <pattern> ...)
A boolean pattern has the form:
<boolean pattern> -> (and <pattern> ...) |
(or <pattern> ...) |
(not <pattern> <pattern> ...)
Syntax: It is an error for not patterns to be empty. It is an error to refer to the same variable inside and outside a not pattern.
A tree pattern has the form:
<tree pattern> -> (<pattern> *** <pattern>)
Syntax: It is an error to combine *** with ..., ___, ., *.., or =.. .
A quasi-quote pattern has the form:
<quasi-quote pattern> -> `<quasi-quote datum> |
(quasi-quote <quasi-quote datum>)
<quasi-quote datum> -> <pattern s-expression> |
<identifier> |
(<quasi-quote contents> ...)
<quasi-quote contents> -> <quasi-quote datum> |
,<pattern> |
(unquote <pattern>) |
,@<pattern> |
(unquote-splicing <pattern>)
Syntax: identifiers in unquoted patterns are subject to the same naming constraints as other pattern variables. ,@/unquote-splicing is a type of ellipsis pattern so it is a syntax error if a list contains this pattern in combination with tail, or other ellipsis patterns (=.. and *.. have to be unquoted to be interpreted as ellipsis patterns).
A _ pattern is a wildcard that will match anything but does not bind.
Helper forms:
(match-lambda <clause> <clause> ...)
(match-lambda* <clause> <clause> ...)
(match-let (<match-let clause> ...) <body>)
(match-let <identifier> (<match-let clause> ...) <body>)
(match-let\* (<match-let clause> ...) <body>)
(match-letrec (<match-let clause> ...) <body>)
<match-let clause> -> (<pattern> <arg-expression>) |
<binding spec>
Semantics: The use of match takes an expression and compares it to a series of patterns, beginning with the leftmost. When the expression matches a pattern, it evaluates the corresponding body, with any bindings formed in the pattern. A failure pattern assigns a zero-argument function to a procedure so that if it is determined in the body that the match actually failed, that procedure can be called to restart matching at the next argument.
A pattern identifier (which is any identifier except and or not set! get! $ struct object _ ... ___ =.. *** *.. = ? quote quasi-quote unquote-splicing
) will match and bind anything, and bind it to that pattern variable. The special identifier _ matches anything but does not bind. The forms (<pattern> ...) and #(<pattern> ...) match lists and vectors whose elements match the contained patterns (() and #() match the empty list and vector).
A pattern s-expression matches literal values.
A tail pattern matches the rest of a list or vector.
A pattern followed by ellipsis matches zero or more times. A pattern followed by =.. k
matches k times. A pattern followed by *.. k j
matches between k and j times (inclusive).
The $
and struct
syntax of record patterns match record fields positionally. The object
record pattern matches an identifier against the record field slot names. This pattern may not be available (because an implementation has not made the necessary record introspection forms available) or may not be available for a particular record (because the record type is not exported).
Getter and setter functions bind a zero- and one-argument function to their identifier. The getter function returns the value of its match when called, and the setter function will take a new value to set its match to.
A field pattern (= <operator> <pattern> ...)
applies operator (a one argument function) to what it is matching, and then matches the return value against its pattern. A predicate pattern (? <predicate> <pattern>)
is similar in that it applies its predicate to a value, but if the predicate is true, that value is then matched against all additional patterns.
A boolean pattern has one of the forms (and <pattern> ...), (or <pattern> ...), (not <pattern> <pattern> ...)
. All of the forms in an and pattern must match, and an empty and pattern is true. At least one of the forms in the or pattern must match, and an empty or is false. None of the forms in a not pattern should match and an empty not is a syntax error.
In a tree pattern (<left pattern> *** <right pattern>), the /<left pattern> represents a path to the subtree that matches the <right pattern>.
In a quasi-quote pattern, <pattern s-expressions> and <identifiers> match their literal values (as symbols in the case of <identifiers>). Unquoted <pattern>s match as usual, and unquote-spliced <pattern>s have ellipsis matching.
match-lambda
produces a one-argument lambda expression that matches its argument. match-lambda*
makes a lambda expression that accepts any number of arguments and matches against a list of those arguments.
match-let
match-let*
and match-letrec
both allow binding a name to a value through either standard let/letrec syntax or by matching a pattern against a value.
pat : patvar ;; anything, and binds patvar | _ ;; anything | () ;; the empty list | #t ;; #t | #f ;; #f | string ;; a string | number ;; a number | character ;; a character | 'sexp ;; an s-expression | 'symbol ;; a symbol (special case of s-expr) | (pat1 ... patN) ;; list of n elements | (pat1 ... patN . patN+1) ;; list of n or more | (pat1 ... patN patN+1 ooo) ;; list of n or more, each element ;; of remainder must match patN+1 | (pat1 ... patN patN+1 ooo patN+2 ... patM) ;; list of m or more, every element between the ;; nth and the (m or more - n)th must match ;; patN+1. | #(pat1 ... patN) ;; vector of n elements | #(pat1 ... patN patN+1 ooo) ;; vector of n or more, each element ;; of remainder must match patN+1 | #(pat1 ... patN patN+1 ooo patN+2 ... patM) ;; vector of m or more, every element between the ;; nth and the (m or more - n)th must match ;; patN+1. | ($ record-type pat1 ... patN) ;; a record (patK matches in slot ;; order) optional | (struct struct-type pat1 ... patN) ;; ditto (*) optional | (object struct-type (slot1 pat1) ...) ;; a record (using slot names) (*) ;; optional | (= proc pat) ;; apply proc, match the result to pat | (and pat ...) ;; if all of pats match | (or pat ...) ;; if any of pats match | (not pat ...) ;; if no pats match | (? predicate pat ...) ;; if predicate true and all pats match | (set! patvar) ;; anything, and binds setter | (get! patvar) ;; anything, and binds getter | (pat1 *** pat2) ;; tree subpattern (*) | `qp ;; a quasi-pattern (*) extended syntax not originally part of Wright-Cartwright pattern matcher. patvar : an identifier except _, quote, $, struct, @, object, =, and, or, not, ?, set!, get!, quasiquote, ..., ___, **1, =.., *.. unquote, unquote-splicing. ooo : ... ;; zero or more | ___ ;; zero or more | **1 ;; one or more | =.. k ;; exactly k where k is an integer. (*) ;; Example: =.. 1, =.. 2 ... | *.. k j ;; between k and j, where k and j are (*) ;; integers. Example: *.. 3 4, match 3 ;; or 4 of a pattern *.. 1 5 match from ;; 1 to 5 of a pattern (*) extended syntax not originally part of Wright-Cartwright pattern matcher. Note: a list can contain only one ellipsis pattern which includes the patterns above, tail patterns, and unquote-splicing patterns. qp : () ;; the empty list | #t ;; #t | #f ;; #f | string ;; a string | number ;; a number | character ;; a character | identifier ;; a symbol | (qp_1 ... qp_n) ;; list of n elements | (qp_1 ... qp_n . qp_{n+1}) ;; list of n or more | (qp_1 ... qp_n qp_n+1 ooo) ;; list of n or more, each element ;; of remainder must match qp_n+1 | #(qp_1 ... qp_n) ;; vector of n elements | #(qp_1 ... qp_n qp_n+1 ooo) ;; vector of n or more, each element ;; of remainder must match qp_n+1 | ,pat ;; a pattern | ,@pat ;; a pattern
If a pattern matcher is in tail context, expressions marked as tail-bodies below will be in tail context.
(match <expression> <match-clause>+)
(match-lambda <match-clause>+)
(match-lambda* <match-clause>+)
(match-let ((<pattern> <expression>)) <tail-body>+)
(match-letrec ((<pattern> <expression>)) <tail-body>+)
(match-let <variable> ((<pattern> <expression>)) <tail-body>+)
(match-let* ((<pattern> <expression>)) <tail-body>+)
where:
<match-clause> → (<pattern> <tail-body>+)
There are two types of patterns where the issue of side-effects is prominent:
The following is a list of examples of the erroneous conditions that cause an error to be signaled and the message given by the current implementation:
(match) "missing match expression"
(match (list 1 2 3)) "no match clauses"
(match (list 1 2 3) ((a b))) "no matching pattern"
(match (list 1 2 3) ((a *** . 3) a)) "invalid use of ***"
(match '(1 1 1 2 2 2) ((a ... b ...) b)) "multiple ellipsis patterns not allowed at same level"
(match '(1 1 1 2 2 2) ((a =.. 3 b ...) b)) ditto
(match '(1 1 1 2 2 2) (`(,@a ,b ...) b)) ditto
(match '(1 1 1 2 2 2) (`(,@a . b) a)) "dotted tail not allowed after ellipsis"
There are also three conditions that are errors and may return a value:
(match '(1 2 3) ((a b c)))
(match '((1 2 3) (1 2 3)) (((a ...) a) a))
(match '(1 2) ((a (not a)) a))
Other macros that use the pattern matcher will be subject to the same constraints, in particular with respect to names. For example:
(define-syntax make-chunker
(syntax-rules ()
((_ s ...)
(lambda (l)
(let lp ((l l))
(match l (() '())
((s ... . rest) (cons (list s ...) (lp rest)))
(end (list end))))))))
((make-chunker a b c d) (iota 20))
=> ((0 1 2 3) (4 5 6 7) (8 9 10 11) (12 13 14 15) (16 17 18 19))
((make-chunker a b c _) (iota 20))
=> ERROR on line 16: invalid use of syntax as value: _
((make-chunker a b c ___) (iota 20))
=> ERROR on line 715 of file ./srfi-204/srfi-204.scm:
no expansion for:
(match-syntax-error "dotted tail not allowed after ellipsis" rest)
A pattern is an expression made up of pattern operators, literals, and identifiers, which are interpreted as pattern variables. A quasi-pattern is a type of pattern that is quasi-quoted so it is made up of literals, unquoted identifiers, and unquoted operator expressions. Both types of patterns can be used in the sub-forms below, with some caveats for quasi-patterns.
Literal patterns are patterns like the example above, where the content of the pattern is only quoted or self-quoting data. In this case, quasi-patterns and patterns are very similar (except symbols don't need quoting in quasi-patterns):
(let ((ls (list 'a "b" #f 2 '() #\c '#(1))))
(list (match ls (('a "b" #f 2 () #\c #(1)) 'ok))
(match ls (`(a "b" #f 2 () #\c #(1)) 'ok))))
=> (ok ok)
The above expression will not work for R6RS, because the vector literal [#(1)
] in ls
is not quoted. Note also that the empty list does not need to be quoted in either type of pattern.
At the time of this writing, cyclic literals seem irregularly implemented, at least in the repl, but this is one area where quasi-patterns and other literal patterns differ, since cyclic literals are illegal in quasiquote (R7RS, Sect. 2.4) and will lead to some kind of error/crash when they occur in quasi-patterns.
Identifiers in patterns that are not one of the operators (_, quote, $, struct, @, object, =, and, or, not, ?, set!, get!, quasiquote, ..., ___, **1, =.., *.. unquote, unquote-splicing
) will match anything and their corresponding binding is available in the body. The special variable _ matches anything and does not bind the result. It can be used to get rid of unintersting values or in a block of match expressions, _ is often used like else
if the result is not needed. Pattern variables have to be unquoted in quasi-patterns.
(match (list 1 2 3) ((a b c) b)) => 2
(match (list 1 2 3) ((_ b _) b)) => 2
(match (list 1 2 3) (`(a ,b c) b) (_ 'fail)) => fail
(match (list 1 2 3) (`(1 ,b ,_) b) (_ 'fail)) => 2
A nonlinear pattern has repeated pattern variables:
(match (list 'A 'B 'A) ((a b a) a) (_ 'fail)) => A
(match (list 'A 'B 'A) (`(,a b ,a) a) (_ 'fail)) => fail
(match (list 'A 'B 'A) (`(,a B ,a) a) (_ 'fail)) => A
(match (list 'A 'B 'A) (`(,a ,b ,a) a) (_ 'fail)) => A
After the first match, subsequent instances must match a value equal?
to the first. Any nonlinear pattern clause that involves repetition can be converted into a linear pattern clause that uses failure to restart pattern matching at the next pattern:
(match (list 1 2 1) ((a b a) a) (_ 'fail))
converts to
(match (list 1 2 1) ((a b c) (=> fail)
(if (equal? a c) a (fail)))
(_ 'fail))
If an implementation chooses not to support repetition, only the second pattern is valid and the first (as well as those in the first part of this section) should cause a syntax error.
The first type of ellipsis pattern is just the ellipsis ...
, which is equivalent to the unquote-splicing in quasi-patterns. An ellipsis will match any number (zero or more) of a pattern (like a regex Kleene star):
(match (list 1 2) ((1 2 3 ...) #t)) => #t
(match (list 1 2) (`(1 2 ,@3) #t)) => #t
(match (list 1 2 3) ((1 2 3 ...) #t)) => #t
(match (list 1 2 3) (`(1 2 ,@3) #t)) => #t
(match (list 1 2 3 3 3) ((1 2 3 ...) #t)) => #t
(match (list 1 2 3 3 3) (`(1 2 ,@3) #t)) => #t
(match '((a time) (stitch saves) (in nine)) (((x y) ...) (list x y)))
=> ((a stitch in) (time saves nine))
(match '((a b) (c d) (e f)) (`(,@(x y)) (list x y))) => ((a c e) (b d f))
There can be only one ellipsis or tail pattern per list, but there can be multiple ellipsis or tail patterns in a pattern:
;;;[List-of [List-of Any]] -> [List-of [List-of Any]]
(define transpose
(match-lambda (((a b ...) ...) (cons a (transpose b))) (_ '())))
(transpose '((1 2 3) (4 5 6))) => ((1 4) (2 5) (3 6))
Ellipsis patterns also do not have to be the last pattern in a list:
(define (palindrome? str)
(let loop ((chars (filter char-alphabetic?
(string->list (string-foldcase str)))))
(match chars
(() #t)
((a) #t)
((a b ... a) (loop b))
(_ #f))))
(palindrome? "Able was I, ere I saw Elba.") => #t
(palindrome? "Napoleon") => #f
and they can be used with _ to throw away unimportant data:
(define first-column
(match-lambda (((a _ ...) ...) a)))
(first-column '((1 2 3) (4 5 6) (7 8 9))) => (1 4 7)
___
is provided for when it is
inconvenient to use the ellipsis (as in a syntax-rules template).
(match x (((a ...) a) a))
because the semantics are not clear.
The second kind of ellipsis is **1
, which matches one or more of a pattern (like regex +)
(match (list 1 2) ((a b c **1) c))
ERROR: match: "no matching pattern"
(match (list 1 2 3) ((a b c **1) c)) => (3)
This pattern (and the remaining ellipsis patterns) has no quasi-pattern equivalent, so if it occurs in a quasi-pattern that quasi-pattern must be unquoted:
(define first-column-of-some
(match-lambda (`(,@(a _ **1)) a)))
(first-column-of-some '((1) (2)))
ERROR: match: "no matching pattern"
(first-column-of-some '((1 2) (3 4))) => (1 3)
The =.. k
syntax matches exactly k of a pattern:
(match '((a b) (c d) (e f))
(((x y) =.. 3) (list x y))
(_ 'fail))
=> ((a c e) (b d f))
(match '((a b) (c d) (e f) (g h))
(((x y) =.. 3) (list x y))
(_ 'fail))
=> fail
The *.. k j
syntax matches between k and j of a pattern:
(match '((a b) (c d) (e f))
(((x y) *.. 2 4) (list x y))
(_ 'fail))
=> ((a c e) (b d f))
(match '((a b) (c d) (e f) (g h))
(((x y) *.. 2 4) (list x y))
(_ 'fail))
=> ((a c e g) (b d f h))
(match '((a b) (c d) (e f) (g h) (i j))
(((x y) *.. 2 4) (list x y))
(_ 'fail))
=> fail
Tail patterns are similar to ellipsis patterns in that they can take up several values, but they are different in a few respects:
Tail patterns will match dotted pairs, and ellipsis patterns won't:
(define keys
(match-lambda (((a _ ...) ...) a) (_ 'fail)))
(keys '((a 1) (b 2) (c 3))) => (a b c)
(keys '((a . 1) (b . 2) (c . 3))) => fail
(define keys
(match-lambda (((a . _) ...) a) (_ 'fail)))
(keys '((a 1) (b 2) (c 3))) => (a b c)
(keys '((a . 1) (b . 2) (c . 3))) => (a b c)
Tail patterns also don't require any special treatment in quasi-patterns
(define handle-arithmetic-sexpr
(match-lambda (`(+ . ,operands) (apply + (map eval-sexpr operands)))
(`(- . ,operands) (apply - (map eval-sexpr operands)))
(`(* . ,operands) (apply * (map eval-sexpr operands)))
(`(/ . ,operands) (apply / (map eval-sexpr operands)))))
Note: tree patterns are considered experimental at this stage and so are an
optional part of this SRFI
The new operator ***
can be used to search a tree for
subpatterns. A pattern of the form (x *** y)
represents
the subpattern y
located somewhere in a tree where the path
from the current object to y
can be seen as a list of the
form (x ...)
. y
can immediately match the current
object in which case the path is the empty list. In a sense it's
a 2-dimensional version of the ...
pattern.
As a common case the pattern (_ *** y)
can be used to
search for y
anywhere in a tree, regardless of the path
used.
(match '(+ (* (+ 7 2) (/ 5 4)) (sqrt (+ (sqr x) (sqr y))))
((a *** 7) a))
=> (+ * +)
(match '(+ (* (+ 7 2) (/ 5 4)) (sqrt (+ (sqr x) (sqr y))))
((_ *** `(sqrt . ,rest)) rest))
=> ((+ (sqr x) (sqr y)))
Other examples of tree patterns that handle mixed data types or retrieve all matches are given in the predicate and extension sections.
The boolean operators and
, or
and not
can be used to group and negate patterns analogously to their
Scheme counterparts.
The and
operator ensures that all subpatterns match.
This operator is often used with the idiom (and x pat)
to
bind x
to the entire value that matches pat
(c.f. "as-patterns" in ML or Haskell). Another common use is in
conjunction with not
patterns to match a general case
with certain exceptions.
(match 1 ((and) #t)) => #t
(match 1 ((and x) x)) => 1
(match 1 ((and x 1) x)) => 1
An and pattern is different from its Scheme counter part in that matching, not truth, is what determines whether it succeeds:
(match #f ((and) #t) (_ #f)) => #t
Failure is one way to catch when a value has matched false
(match #f ((and x) (=> fail) (if x #t (fail))) (_ #f)) => #f
The or
operator ensures that at least one subpattern
matches. If the same identifier occurs in different subpatterns,
it is matched independently. All identifiers from all subpatterns
are bound if the or
operator matches, but the binding is
only defined for identifiers from the subpattern which matched.
(match 1 ((or) #t) (else #f)) => #f
(match 1 ((or x) x)) => 1
(match 1 ((or x 2) x)) => 1
If failure patterns are used as replacements for or
repetition patterns, the or
should be moved from the pattern to the body:
(define last-matches-one-of-first-three
(match-lambda ((a a) #t)
((a b c ... (or a b)) #t)
((a b c d ... c) #t)
(_ #f)))
(define last-matches-one-of-first-three
(match-lambda ((a a) #t)
((a b c ... d)
(=> fail)
(if (or (equal? d a) (equal? d b))
#t
(fail)))
((a b c d ... e) (equal? c e))
(_ #f)))
Since all variables in an or
pattern are bound but only one is defined, using or
in combination with ellipsis patterns (as below) can lead to results with large numbers of undefined values:
(match (get-environment-variables)
(((or ("PATH" . path)
("HOMEPROFILE" . home)
("HOME" . home)
("USER" . user)
("USERNAME" . user)
(_ . _)) ...)
(list path home user)))
=> [three large lists of mostly undefined values]
One way to get the intended results is shown below:
(fold (match-lambda*
((("PATH" . path) (p h u)) (list path h u))
((("USERPROFILE" . home) (p h u)) (list p home u))
((("HOME" . home) (p h u)) (list p home u))
((("USER" . user) (p h u)) (list p h user))
((("USERNAME" . user) (p h u)) (list p h user))
(_ out) out)
(list #f #f #f) ; (p h u) init
(get-environment-variables))
=> [List of 3 strings]
In other cases, it may make sense to use ((or pats) ...)
patterns with clean-up:
(define (clean lst)
(let ((undef (when #f #f)))
(remove (lambda (item) (equal? item undef)) lst)))
(match (iota 7)
(((or 2 6 rest) ...) (clean rest)))
=> (0 1 3 4 5)
The not
operator succeeds if the given pattern doesn't
match. None of the identifiers used are available in the body. This pattern also gives us a way to eliminate values (like false above) in a more streamlined way:
(match 1 ((and x (not #f)) x) (_ 'fail)) => 1
(match #f ((and x (not #f)) x) (_ 'fail)) => fail
(match 1 ((not 2) #t)) => #t
It is an error for a pattern to refer to the same variable inside and outside of a not pattern as in (match x ((a (not a)) a)).
The operator ?
can be used to provide a
predicate. The usage is (? predicate pat ...)
where
predicate
is a Scheme expression evaluating to a predicate
called on the value to match, and any optional patterns after the
predicate are then matched as in an and
pattern.
(match 1 ((? odd? x) x)) => 1
Along with boolean patterns, the predicate pattern is enough to make the other half of the math expression evaluator started in the section on dotted pairs:
(define eval-sexpr
(match-lambda ((? number? n) n)
((and pair ((or '+ '- '* '/) . rest))
(handle-arithmetic-sexpr pair))
(_ (error "not implemented yet"))))
(eval-sexpr '(+ (* 3 4 5) (- 10 3))) => 67
Predicates also can be used to make a tree pattern that finds all matches:
(define extract-num-addends
(match-lambda
(((and (k *** `(+ . ,addends)) ('+ (? number? i) ...)) . rest)
(cons addends (extract-num-addends rest)))
(((and (k *** `(+ . ,addends)) inner) . rest)
(append (extract-num-addends inner)
(extract-num-addends rest)))
((this . rest) (extract-num-addends rest))
(() '())))
(extract-num-addends '((+ (* 1 (+ 2 3)) (+ 4 5)) (- (/ 6 (+ 7 8)) (+ 9 10))))
=> ((2 3) (4 5) (7 8) (9 10))
or do tree matching on heterogenous data. This example uses the following JSON, which is converted by several packages into the subsequent list:
{"menu": {
"id": "file",
"value": "File",
"popup": {
"menuitem": [
{"value": "New", "onclick": "CreateNewDoc()"},
{"value": "Open", "onclick": "OpenDoc()"},
{"value": "Close", "onclick": "CloseDoc()"}
]
}
}}
;; in scheme
(define example-json
'((menu (id . "file")
(value . "File")
(popup
(menuitem .
#(((value . "New") (onclick . " CreateNewDoc()"))
((value . "Open") (onclick . "OpenDoc()"))
((value . "Close") (onclick . "CloseDoc()"))))))))
;; vector-index is from (srfi 133)/(scheme red)
(define (get-close json)
(define get-close-inner
(match-lambda
((key *** ('(value . "Close") . rest)) key)
((? vector? v)
(let ((i (vector-index get-close-inner v)))
(if i
(cons i (get-close-inner (vector-ref v i)))
#f)))
((key *** (k . (? vector? v)))
(let ((r (get-close-inner v)))
(if r
(append key (cons k r))
#f)))
(_ #f)))
(get-close-inner (car json)))
(get-close example-json)
=> (menu popup menuitem 2)
A predicate may be called multiple times on matching and non-matching expressions and sub-expressions, in any order, so side effects should be avoided unless there is some overriding interest (debugging, logging, short-cutting, etc.). Predicates can also be written using references to preceding pattern variables, if repetition is supported:
(define fibby?
(match-lambda ((a b (? (lambda (x) (= (+ a b) x)) c) . rest)
(fibby? (cons b (cons c rest))))
((a b) #t)
((a) #t)
(() #t)
(_ #f)))
;;; doesn't refer to pattern variables in predicate
(define fibby?
(match-lambda ((a b c . rest)
(if (= (+ a b) c)
(fibby? (cons b (cons c rest)))
#f))
((a b) #t)
((a) #t)
(() #t)
(_ #f)))
(fibby? '(4 7 11 18 29 47)) => #t
The field operator =
is used to extract an arbitrary
field and match against it. It is useful for more complex or
conditional destructuring that can't be more directly expressed in
the pattern syntax. The usage is (= field pat)
, where
field
can be any expression, and should result in a
procedure of one argument, which is applied to the value to match
to generate a new value to match against pat
.
Thus the pattern (and (= car x) (= cdr y))
is equivalent
to (x . y)
except that it will result in an immediate error
if the value isn't a pair. Like predicates, fields should avoid side effects when there is no overriding interest.
In contrast to the predicate operator, the field operator succeeds
when its value is false. Compare:
(match 1 ((and n (? even?)) n) (_ 'fail)) => fail
(match 1 ((and n (= even? r)) (list n r)) (_ 'fail)) => (1 #f)
(match '(a b c d) ((or (= (lambda (x) (memq 'f x)) r)
(= (lambda (x) (memq 'g x)) r)
(= (lambda (x) (memq 'b x)) r))
r)
(_ 'fail))
=> #f
(match '(a b c d) ((or (= (lambda (x) (memq 'f x)) (and r (not #f)))
(= (lambda (x) (memq 'g x)) (and r (not #f)))
(= (lambda (x) (memq 'b x)) (and r (not #f))))
r)
(_ 'fail))
=> (b c d)
(match '(1 . 2) ((= car x) x)) => 1
(match 4 ((= square x) x)) => 16
The set!
and get!
operators are used to bind an
identifier to the setter and getter of a field, respectively. The
setter is a procedure of one argument, which mutates the field to
that argument. The getter is a procedure of no arguments which
returns the current value of the field.
(let ((x (cons 1 2)))
(match x
((1 . (set! s)) (s 3) x)))
=> (1 . 3)
(match '(1 . 2) ((1 . (get! g)) (g))) => 2
(define alist (list (cons 'a 1) (cons 'b 2) (cons 'c 3)))
(define get-c (match alist ((= (lambda (al) (assv 'c al))
(_ . (get! g)))
g)))
(get-c) => 3
(define set-c! (match alist ((= (lambda (al) (assv 'c al))
(_ . (set! s)))
s)))
(set-c! 7)
(get-c) => 7
alist => ((a . 1) (b . 2) (c . 7))
Note: record introspection is not currently portable, so record operators are an interface whose implementation will not be portable. This is an optional part of the SRFI. See the implementation section for more information.
The record operator $
is used as a concise way to match
records defined by SRFI 9 (or SRFI 99). The usage is
($ rtd field ...)
, where rtd
should be the record
type descriptor specified as the first argument to
define-record-type
, and each field
is a subpattern
matched against the fields of the record in order. Not all fields
need be present. The operator struct
can be used as a synonym for $
.
(let ()
(define-record-type employee
(make-employee name title)
employee?
(name get-name)
(title get-title))
(match (make-employee "Bob" "Doctor")
(($ employee n t) (list t n))))
=> ("Doctor" "Bob")
For records with more fields it can be helpful to match them by
name rather than position. For this you can use the object
operator, originally a Gauche extension:
(let ()
(define-record-type employee
(make-employee name title)
employee?
(name get-name)
(title get-title))
(match (make-employee "Bob" "Doctor")
((object employee (title t) (name n)) (list t n))))
=> ("Doctor" "Bob")
Matching on records when this feature is not implemented can be done as follows:
(match (make-employee "Bob" "Doctor")
((and (? employee?)
(= get-title t)
(= get-name n))
(list t n)))
=> ("Doctor" "Bob")
Emulating setter patterns like:
(let () (define-record-type <posn>
(make-posn x y)
posn?
(x posn-x set-posn-x!)
(y posn-y set-posn-y!))
(match (make-posn 3 4)
((and p ($ <posn> (set! set-x)))
(set-x 7)
(match p (($ <posn> x y) (list x y))))))
=> (7 4)
is described in the next section.
For data types or matching not covered by equal?
, SRFI 26 in combination with user-defined procedures, predicates, and fields can be useful.
For example, SRFI 111 (boxes) is opaque to record pattern matching, but can be handled as below:
(define (box-equal? a b)
(if (and (box? a) (box? b))
(box-equal? (unbox a) (unbox b))
(equal? a b)))
;r6rs
(match (list (box 1) (box 1))
((a a) 'ok) (_ 'fail))
=> fail
(match (list (box 1) (box 1))
((a (? (cut box-equal? a <>))) 'ok)
(_ 'fail))
=> ok
;case for no nonlinear predicates
(match (list (box 1) (box 1))
((a b) (if (box-equal? a b) 'ok 'fail)))
=> ok
(match (box 1) ((= unbox value) value))
=> 1
(define box-1 (box 1))
(define make-get-value
(match-lambda
((and (? box?)
(= (lambda (box) (cut unbox box)) get))
get)))
(define make-set-value
(match-lambda
((and (? box?)
(= (lambda (box) (cut set-box! box <>)) set))
set)))
(define get-value (make-get-value box-1))
(define set-value! (make-set-value box-1))
(get-value) => 1
(set-value! 18)
(get-value) => 18
Another approach is to convert the data type to a form that can be matched against, analagous to (match (proc data) (pat ...))
.
Below is an example using a one procedure to match imports, and another to read
objects from a file to be matched:
(define extract-imports
(match-lambda
(`(import . ,imports) imports)
(((and (key *** `(import . ,imports)) inner) . rest)
(append (if (null? key)
(list imports)
(extract-imports inner)) (extract-imports rest)))
((this . rest) (extract-imports rest))
(any '())))
(define (match-on-file filename matcher)
(call-with-input-file
filename
(lambda (port)
(let loop ((out '()))
(if (eof-object? (peek-char port))
out
(loop (append out (matcher (read port)))))))))
;; filename relative to top of SRFI 204 repo
(match-on-file "test/data/forum-topics.scm" extract-imports)
=> (((srfi srfi-9))
((scheme case-lambda))
((srfi 69))
((srfi 204) (scheme red) (chibi json))
((srfi 69)))
A future SRFI will specify a method similar to this one using the pattern operators ->
and =>
to set up views on data.
A Sample implementation is available on Github and in this .tgz file.
Any implementation of SRFI 204 should use SRFI 206 to export whatever of the auxiliary syntax needs to be defined (from $, ?, ***, ___, **1, =.., *.., get!, struct, object
) or the included auxiliary-syntax.scm file, if SRFI 206 is not available. SRFI 206 is used as follows:
(import (only (srfi 206 all) $ ? *** ___ **1 =.. etc.))
and auxiliary-syntax.scm is used as follows:
;; from Akku
(import (srfi private include))
(begin
(include/resolve ("srfi" "204") "auxiliary-syntax.scm")
(define-auxiliary-keywords $ ? *** ___ **1 =.. etc.)
...)
;; in Guile
(begin
(include-from-path "srfi/204/auxiliary-syntax.scm")
(define-auxiliary-keywords $ ? *** ___ **1 =.. etc.)
...)
To use the $,
struct
, and object
syntax, an implementation needs to provide the forms is-a?
, slot-ref
, and slot-set!
. These forms have the following syntax:
(is-a? rec rtd) => #t if type of rec matches rtd, #f otherwise
(slot-ref rtd rec slot) => If slot is an integer, value at position slot,
if slot is a symbol, value named slot
(slot-set! rtd rec slot new-value) If slot is an integer, set! the value at
position slot to new-value. If slot is a symbol,
set! the value named slot to new-value
Since these forms rely on the underlying structure of records in a particular implementation, their definition will vary from one Scheme to another. This is an example from Gauche, where records are implemented on top of native CLOS-like objects:
(import (only (gauche base) is-a? slot-definition-name class-slots)
(scheme base)
(rename (gauche base)
(slot-ref gb-slot-ref)
(slot-set! gb-slot-set!)))
(begin
(define-syntax slot-ref
(syntax-rules ()
((_ class inst n)
(if (integer? n)
(gb-slot-ref inst
(list-ref (map slot-definition-name
(class-slots class))
n))
(gb-slot-ref inst n)))))
(define-syntax slot-set!
(syntax-rules ()
((_ class inst n value)
(if (integer? n)
(gb-slot-set! inst
(list-ref (map slot-definition-name
(class-slots class))
n)
value)
(gb-slot-set! inst n value))))))
this is an example from Larceny, which has SRFI 99:
(import (scheme base)
(srfi 99 records))
(begin
(define-syntax is-a?
(syntax-rules ()
((_ rec rtd)
((rtd-predicate rtd) rec))))
(define-syntax slot-ref
(syntax-rules ()
((_ rtd rec n)
(if (integer? n)
((rtd-accessor rtd (vector-ref (rtd-all-field-names rtd) n)) rec)
((rtd-accessor rtd n) rec)))))
(define-syntax slot-set!
(syntax-rules ()
((_ rtd rec n value)
(if (integer? n)
((rtd-mutator rtd (vector-ref (rtd-all-field-names rtd) n)) rec value)
((rtd-mutator rtd n) rec value))))))
and this is an example from Loko and Chez, which use forms from
(rnrs records procedural)
and (rnrs records inspection)
:
(define-syntax is-a?
(syntax-rules ()
((_ rec rtd)
((record-predicate (record-type-descriptor rtd)) rec))))
(define-syntax slot-ref
(syntax-rules ()
((_ rtd rec n)
(let ((rtd (record-type-descriptor rtd)))
(if (integer? n)
((record-accessor rtd n) rec)
((record-accessor rtd (name->idx rtd n)) rec))))))
(define-syntax slot-set!
(syntax-rules ()
((_ rtd rec n value)
(let ((rtd (record-type-descriptor rtd)))
(if (integer? n)
((record-mutator rtd n) rec value)
((record-mutator rtd (name->idx rtd n)) rec value))))))
(define-syntax name->idx
(syntax-rules ()
((_ rtd n)
(let* ((names (record-type-field-names rtd))
(len (vector-length names)))
(let lp ((i 0))
(cond
((> i len) (error "name not in record" n))
((eq? n (vector-ref names i)) i)
(else (lp (+ i 1 )))))))))
match-letrec
uses the technique described in [Kiselyov, 2013]. This form gets used the least by users, so it may be a useful exercise for implementors to look for internal defines and other items that can be translated into match-letrec to build up a testing library.
Alex Shinn, John Cowan, Arthur A. Gleckler, et al. 2013 Revised7 Report on the Algorithmic Language Scheme. Retrieved from https://small.r7rs.org/attachment/r7rs.pdf.
Bill Schottstaedt. 2020 s7 case.scm. Retrieved from https://ccrma.stanford.edu/software/snd/snd/s7.html#case.
Andrew K. Wright & Robert Cartwright. ACM Transactions on Programming Languages and Systems. Vol. 19, No. 1. January 1997. Pages 87-152. Retrieved on July 15, 2020 from https://www.iro.umontreal.ca/~feeley/cours/ift6232/doc/pres2/practical-soft-type-system-for-scheme.pdf.
Kiselyov, Oleg 2003 "How to write symbol? with syntax rules" comp.lang.scheme Retrieved from https://okmij.org/ftp/Scheme/macro-symbol-p.txt
Kiselyov, Oleg 2013 "How to Write Seemingly Unhygenic and Referentially Opaque Macros with Syntax-rules*" Retrieved from https://okmij.org/ftp/Scheme/Dirty-Macros.pdf
Alex Shinn. 2020. (chibi match). Retrieved from http://synthcode.com/scheme/chibi/lib/chibi/match.html.
Justin Ethier. 2019. Match Library. Retrieved from http://justinethier.github.io/cyclone/docs/api/cyclone/match.
Ludovic Cortès, Bake Timmons, Arun Isaac, & Paul Morris. 2019. Guile Reference Manual.. Sect. 7.8 "Pattern Matching." Retrieved from https://www.gnu.org/software/guile/manual/html_node/Pattern-Matching.html.
Matthias Zenger. 2019. match.scm (source file). Retrieved from https://github.com/objecthub/swift-lispkit/blob/master/Sources/LispKit/Resources/Libraries/lispkit/match.sld.
Göran Weinholt. 2019. match.sls (source file). Retrived from https://gitlab.com/weinholt/loko/-/blob/master/lib/match.sls.
Higepon Taro Minowa, et. al. 2010 Pattern Match. Retrieved from http://mosh.monaos.org/files/lib/match-ss.html#Pattern_Match.
Takashi Kato. 2019. Sagittarius Users' Reference. Sect. 8.1 "(match) -- Pattern Matching" Retrieved from http://ktakashi.github.io/sagittarius-online-ref/section81.html.
Hubert Montas. 2018. A Scheme Interpreter for ARM Microcontrollers. Sect. "Program Examples for Version 080 - Expert System" Retrieved from http://armpit.sourceforge.net/v080/aps_080_common_examples.html#Expert.
Manuel Serrano & Jean-Marie Geffroy. 2013. Bigloo/manual. Chapter 6. "Pattern Matching" Retrieved from http://www-sop.inria.fr/mimosa/fp/Bigloo/manual-chapter6.html.
R. Kent Dybig. 2010. "Using Match." Retrieved from Indiana University Bloomington via archive.org: https://web.archive.org/web/20180718090106/https://www.cs.indiana.edu/chezscheme/match/.
Alex Shinn. 2020. The Chicken Scheme Wiki. "matchable." Retrieved from http://wiki.call-cc.org/eggref/4/matchable.
Juergen Lorenz. 2020. The Chicken Scheme Wiki. "bindings." Retrieved from http://wiki.call-cc.org/eggref/4/bindings.
Marc Feely & Frédéric Hamel. 2020. match-support.scm (source file). Retrieved from https://github.com/gambit/gambit/blob/master/lib/termite/match-support.scm.
Shiro Kawai. Gauche Users Reference. Sect. 12.68 "util.match - Pattern matching." Retrieved from http://practical-scheme.net/gauche/man/gauche-refe/Pattern-matching.html#Pattern-matching.
Dimitris Vyzovitis et. al. Gerbil Reference. Sect. "Core Prelude - Prelude Macros - Pattern Matching." https://cons.io/reference/core-prelude.html#pattern-matching.
Per Bothner. 2020. The Kawa Scheme Language - Reference Documentation. Sect. 8.3 "Variables and Patterns." Retrieved from https://www.gnu.org/software/kawa/Variables-and-Patterns.html.
Marc Nieper-Wißkirchen. 2016. snow-fort.org. "(rapid match)" http://snow-fort.org/s/rapid-scheme.org/marc/rapid/match/0.1.5/index.html.
Yuichi Nishiwaki. 2016 50.destructuring-bind (source directory). Retrieved from https://github.com/picrin-scheme/picrin/tree/master/contrib/50.destructuring-bind.
Matthew Flatt & PLT. The Racket Reference. Chap. 9 "Pattern Matching" Retrieved from https://docs.racket-lang.org/reference/match.html?q=match#%28form._%28%28lib._racket%2Fmatch..rkt%29._match%29%29.
Nils M. Holm. matcher.scm (source code). Retrieved from http://t3x.org/s9fes/matcher.scm.html.
Rafik Naccache unifier.cljc (source code). Retrieved from https://github.com/turbopape/scheje/blob/master/src/scheje/unifier.cljc.
Erick Gallesio. STklos Reference Manual. Chapter 6 "Pattern Matching." Retrieved from https://stklos.net/Doc/html/stklos-ref-6.html.
Peter Norvig. 1992 Paradigms of Artificial intelligence. (1st Ed.). ELIZA,Dialog with a Machine, Chap. 5. Morgan Kaufmann Publishers, San Francisco, CA. Retrieved on July 15 2020 from https://github.com/norvig/paip-lisp/blob/master/docs/chapter5.md.
Alex Shinn provided documentation of his implementation.
Shiro Kawai and Ludovic Cortès et. al. provided pattern grammar documentation.
The members of the SRFI 204 forum helped refine this specification, including John Cowan, Arthur A. Gleckler, Panicz Godek, Shiro Kawai, Lassi Kortela, Adam Nelson, Duy Nguyen, Marc Nieper-Wißkirchen, and Alex Shinn.
Copyright © Felix Thibault (2020).
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.