204: Wright-Cartwright-Shinn Pattern Matcher

by Felix Thibault

Status

This SRFI is currently in draft status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-204@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Table of Contents

Abstract

Pattern matching decomposes a compound data structure into parts and assigns those parts to variables. This SRFI describes a pattern-matching library already in use by several scheme implementations which can match many common compound data structures.

Issues

Rationale

Scheme has come with its own pattern-matching language since R4RS, as part of syntax-rules [Shinn, Cowan, Gleckler pp. 23-24]. Many implementations add a pattern matcher that does not involve writing macros to make pattern matching available to all users. Before R4RS, there was only the case [Shinn, Cowan, Gleckler pp. 14-15] statement, which does match, but does not bind its key. Bigloo1, STklos2 and s73 all use extended case statements for pattern matching.

Currently, the pattern-matching facilities that have been implemented by SRFIs are procedures of variable arity (SRFI 16) and S-expressions (SRFI 200). The focus of SRFI 200 is both consensus and clarity. That is, it aims to provide a pattern syntax that is both widely shared and easily read without requiring knowledge of any notation that is not ordinary Scheme. SRFI 200 also aims to provide a portable, readable way of using the pattern matchers already available in many implementations.

This SRFI aims to specify a matching syntax beyond S-expressions and provide a sample implementation, documentation, and tests. It is based on a pattern matcher originated by Andrew K. Wright and Robert Cartwright [Wright and Cartwright, pp. 116-120], using procedures and defmacros. Using a technique developed by Oleg Kiselyov [Kiselyov, 2003] Alex Shinn converted the pattern matcher to pure syntax-rules. This pattern-matching library (WCS) is already in use (at least in part) in several Scheme implementations. Aside from S-expressions, it also matches vectors and records. It has the ability to match based on predicates, to match "fields" (procedure applications), getters, and setters, and has syntax to match patterns zero or more, one or more, or exactly n times, or to walk down a tree with a pattern.

WCS is already part of Chibi4, Guile,5 Cyclone6, Chicken7, LispKit8, Loko9, Mosh10, and Sagittarius11. There are other pattern-matching libraries that are part of Armpit Scheme12, Gerbil13, Bigloo14, STklos15, Chicken16, Kawa17, Gambit18, Gauche19, Picrin20, Racket21, Scheme 9 from Empty Space22, and Scheje23. These matchers vary from very similar to the WCS matcher to S-expression matchers, type-oriented matchers, and expert systems more like ELIZA [Norvig, Chapter 5]. They also range from a wrapper around a library in the implementation language, part of the core in the implementation language, part of the core in Scheme, a Scheme library, and an example program. In length, they range from about twenty lines to thousands of lines across several files. Links to documentation and implementation (as available) are in the repository.

Starting with the WCS matcher, a portable syntax can be developed and a broad basis for implementing pattern matchers laid down. Some features of the current matcher are specified as optional and alternate means to achieve the same ends are specified. The current implementation, along with tests, documentation, and implementation-dependent code are in the repository.

Specification

Introduction

Patterns are written to look like the printed representation of the objects they match. The basic usage is (match expr (pat body ...) ...) where the result of expr is matched against each pattern in turn, and the corresponding body is evaluated for the first to succeed. Thus, a list of three elements matches a list of three elements.

(let ((ls (list 1 2 3)))
  (match ls
    ((1 2 3) #t)))
=> #t

If no patterns match, an error is signalled. The forms match, match-lambda, and match-lambda* all can match their expression(s) against multiple patterns, but match-let, match-let*, and match-letrec match each expression against one pattern, so non-matching conditions have to be caught outside the match. The following terms will be used in describing pattern matching:

pattern
an expression that is supposed to match against the first argument to the match form.
body
the set of expressions that is evaluated when a pattern matches.
failure
a thunk that can be called in the body to signal that a match was not successful, and restart matching at the next pattern.
next, subsequent, etc.
patterns are matched in the order they are listed. Matching stops at the first match unless a call is made to failure.
pattern clause
a pattern and its body.
non-linear
matching which involves repetition either through a variable that is used more than once in a pattern.
variable
an identifier in a pattern which may be used in the body. May not be _, quote, $, struct, object, =, and, or, not, ?, set!, get!, quasiquote, ..., ___, **1, =.., *.. unquote, unquote-splicing.
operator
the auxiliary syntax $ struct object = and or not ? set! get! ... ___ **1 =.. *.. which are exported (if not already defined) via SRFI 206, if available.

Patterns

A pattern is an expression made up of pattern operators, literals, and identifiers, which are interpreted as pattern variables. A quasi-pattern is a type of pattern that is quasi-quoted so it is made up of literals, unquoted identifiers, and unquoted operator expressions. Both types of patterns can be used in the sub-forms below, with some caveats for quasi-patterns.

Literal Patterns

Literal patterns are patterns like the example above, where the content of the pattern is equal? to the content of the expression being matched. In this case, quasi-patterns and patterns are very similar (except symbols don't need quoting in quasi-patterns):


(let ((ls (list 'a "b" #f 2 '() #\c #(1))))
  (list (match ls (('a "b" #f 2 () #\c #(1)) 'ok))
        (match ls (`(a "b" #f 2 () #\c #(1)) 'ok))))
=> (ok ok)

The above expression will not work for R6RS, because the vector literal [#(1)] in ls is not quoted. Note also that the empty list does not need to be quoted in either type of pattern. At the time of this writing, cyclic literals seem irregularly implemented, at least in the repl, but this is one area where quasi-patterns and other literal patterns differ, since even those repls which handle cyclic literals [eg '0#=(1 . #0#)] with no trouble have stack overflow, segfaults, etc. with quasi-quoted cyclic literals [eg `0#=(1 . #0#)] due to the difference between quotation and quasi-quotation. So cyclic literals should be avoided inside quasi-patterns.

Pattern Variables

Identifiers in patterns that are not one of the operators (_, quote, $, struct, @, object, =, and, or, not, ?, set!, get!, quasiquote, ..., ___, **1, =.., *.. unquote, unquote-splicing) will match anything and their corresponding binding is available in the body. The special variable _ matches anything and does not bind the result. It can be used to get rid of unintersting values or in a block of match expressions, _ is often used like else if the result is not needed. Pattern variables have to be unquoted in quasi-patterns.


(match (list 1 2 3) ((a b c) b)) => 2
(match (list 1 2 3) ((_ b _) b)) => 2
(match (list 1 2 3) (`(a ,b c) b) (_ 'fail)) => fail
(match (list 1 2 3) (`(1 ,b ,_) b) (_ 'fail)) => 2

Nonlinear Patterns: Repetition

A nonlinear pattern has repeated pattern variables:


(match (list 'A 'B 'A) ((a b a) a) (_ 'fail)) => A
(match (list 'A 'B 'A) (`(,a b ,a) a) (_ 'fail)) => fail
(match (list 'A 'B 'A) (`(,a B ,a) a) (_ 'fail)) => A
(match (list 'A 'B 'A) (`(,a ,b ,a) a) (_ 'fail)) => A

After the first match, subsequent instances must match a value equal? to the first. Any nonlinear pattern clause that involves repetition can be converted into a linear pattern clause that uses failure to restart pattern matching at the next pattern:


(match (list 1 2 1) ((a b a) a) (_ 'fail))

converts to


(match (list 1 2 1) ((a b c) (=> fail)
                        (if (equal? a c) a (fail)))
		       (_ 'fail))

If an implementation chooses not to support repetition, only the second pattern is valid and the first (as well as those in the first part of this section) should cause a syntax error.

Ellipsis and Tail Patterns

The first type of ellipsis pattern is just the ellipsis ..., which is equivalent to the unquote-splicing in quasi-patterns. An ellipsis will match any number (zero or more) of a pattern (like a regex Kleene star):


(match (list 1 2) ((1 2 3 ...) #t)) => #t
(match (list 1 2) (`(1 2 ,@3) #t)) => #t
(match (list 1 2 3) ((1 2 3 ...) #t)) => #t
(match (list 1 2 3) (`(1 2 ,@3) #t)) => #t
(match (list 1 2 3 3 3) ((1 2 3 ...) #t)) => #t
(match (list 1 2 3 3 3) (`(1 2 ,@3) #t)) => #t

(match '((a time) (stitch saves) (in nine)) (((x y) ...) (list x y)))
=> ((a stitch in) (time saves nine))

(match '((a b) (c d) (e f)) (`(,@(x y)) (list x y))) => ((a c e) (b d f))

There can be only one ellipsis or tail pattern per list, but there can be multiple ellipsis or tail patterns in a pattern:


;;;[List-of [List-of Any]] -> [List-of [List-of Any]]
(define transpose
  (match-lambda (((a b ...) ...) (cons a (transpose b))) (_ '())))

(transpose '((1 2 3) (4 5 6))) => ((1 4) (2 5) (3 6))

Ellipsis patterns also do not have to be the last pattern in a list:


(define (palindrome? str)
  (let loop ((chars (filter char-alphabetic?
                            (string->list (string-foldcase str)))))
    (match chars
      (() #t)
      ((a) #t)
      ((a b ... a) (loop b))
      (_ #f))))

(palindrome? "Able was I, ere I saw Elba.") => #t
(palindrome? "Napoleon") => #f

and they can be used with _ to throw away unimportant data:


(define first-column
  (match-lambda (((a _ ...) ...) a)))

(first-column '((1 2 3) (4 5 6) (7 8 9))) => (1 4 7)

___ is provided as an alias for ... when it is inconvenient to use the ellipsis (as in a syntax-rules template).

The second kind of ellipsis is **1, which matches one or more of a pattern (like regex +)


(match (list 1 2) ((a b c **1) c))
ERROR: match: "no matching pattern"

(match (list 1 2 3) ((a b c **1) c)) => (3)

This pattern (and the remaining ellipsis patterns) has no quasi-pattern equivalent, so if it occurs in a quasi-pattern that quasi-pattern must be unquoted:


(define first-column-of-some
  (match-lambda (`(,@(a _ **1)) a)))

(first-column-of-some '((1) (2)))
ERROR: match: "no matching pattern"

(first-column-of-some '((1 2) (3 4))) => (1 3)

The =.. k syntax matches exactly k of a pattern:


(match '((a b) (c d) (e f))
       (((x y) =.. 3) (list x y))
       (_ 'fail))
=> ((a c e) (b d f))

(match '((a b) (c d) (e f) (g h))
       (((x y) =.. 3) (list x y))
       (_ 'fail))
=> fail

The *.. k j syntax matches between k and j of a pattern:


(match '((a b) (c d) (e f))
        (((x y) *.. 2 4) (list x y))
	(_ 'fail))
=> ((a c e) (b d f))

(match '((a b) (c d) (e f) (g h))
       (((x y) *.. 2 4) (list x y))
       (_ 'fail))
=> ((a c e g) (b d f h))

(match '((a b) (c d) (e f) (g h) (i j))
       (((x y) *.. 2 4) (list x y))
       (_ 'fail))
=> fail

Tail patterns are similar to ellipsis patterns in that they can take up several values, but they are different in a few respects:
Tail patterns will match dotted pairs, and ellipsis patterns won't:


(define keys
  (match-lambda (((a _ ...) ...) a) (_ 'fail)))

(keys '((a 1) (b 2) (c 3))) => (a b c)
(keys '((a . 1) (b . 2) (c . 3))) => fail

(define keys
  (match-lambda (((a . _) ...) a) (_ 'fail)))

(keys '((a 1) (b 2) (c 3))) => (a b c)
(keys '((a . 1) (b . 2) (c . 3))) => (a b c)

Tail patterns also don't require any special treatment in quasi-patterns


(define handle-arithmetic-sexpr
 (match-lambda (`(+ . ,operands) (apply + (map eval-sexpr operands)))
               (`(- . ,operands) (apply - (map eval-sexpr operands)))
               (`(* . ,operands) (apply * (map eval-sexpr operands)))
               (`(/ . ,operands) (apply / (map eval-sexpr operands)))))

Tree Patterns

The new operator *** can be used to search a tree for subpatterns. A pattern of the form (x *** y) represents the subpattern y located somewhere in a tree where the path from the current object to y can be seen as a list of the form (x ...). y can immediately match the current object in which case the path is the empty list. In a sense it's a 2-dimensional version of the ... pattern. As a common case the pattern (_ *** y) can be used to search for y anywhere in a tree, regardless of the path used.


(match '(+ (* (+ 7 2) (/ 5 4)) (sqrt (+ (sqr x) (sqr y))))
	  ((a *** 7) a))
=> (+ * +)

(match '(+ (* (+ 7 2) (/ 5 4)) (sqrt (+ (sqr x) (sqr y))))
	  ((_ *** `(sqrt . ,rest)) rest))
=> ((+ (sqr x) (sqr y)))

(define (extract-imports file-name)
  (define extract
    (match-lambda
      (((_ *** `(import . ,imports)) . rest) (cons imports (extract rest)))
      ((this . rest) (extract rest))
      (() '())))
  (call-with-input-file file-name (lambda (port) (extract (read port)))))

Boolean Patterns

The boolean operators and, or and not can be used to group and negate patterns analogously to their Scheme counterparts. The and operator ensures that all subpatterns match. This operator is often used with the idiom (and x pat) to bind x to the entire value that matches pat (c.f. "as-patterns" in ML or Haskell). Another common use is in conjunction with not patterns to match a general case with certain exceptions.

(match 1 ((and) #t)) => #t
(match 1 ((and x) x)) => 1
(match 1 ((and x 1) x)) => 1

An and pattern is different from its Scheme counter part in that matching, not truth, is what determines whether it succeeds:


(match #f ((and) #t) (_ #f)) => #t

Failure is one way to catch when a value has matched false


(match #f ((and x) (=> fail) (if x #t (fail))) (_ #f)) => #f

The or operator ensures that at least one subpattern matches. If the same identifier occurs in different subpatterns, it is matched independently. All identifiers from all subpatterns are bound if the or operator matches, but the binding is only defined for identifiers from the subpattern which matched.

(match 1 ((or) #t) (else #f)) => #f
(match 1 ((or x) x)) => 1
(match 1 ((or x 2) x)) => 1

If failure patterns are used as replacements for or repetition patterns, the or should be moved from the pattern to the body:


(define last-matches-one-of-first-three
  (match-lambda ((a a) #t)
                ((a b c ... (or a b)) #t)
                ((a b c d ... c) #t)
                (_ #f)))

(define last-matches-one-of-first-three
  (match-lambda ((a a) #t)
                ((a b c ... d)
                 (=> fail)
                 (if (or (equal? d a) (equal? d b))
                     #t
                     (fail)))
                ((a b c d ... e) (equal? c e))
                (_ #f)))

Since all variables in an or pattern are bound but only one is defined, using or in combination with ellipsis patterns (as below) can lead to results with large numbers of undefined values:


(match (get-environment-variables)
  (((or ("PATH" . path)
        ("HOMEPROFILE" . home)
	("HOME" . home)
	("USER" . user)
	("USERNAME" . user)
	(_ . _)) ...)
   (list path home user)))
=> [three large lists of mostly undefined values]
One way to get the intended results is shown below:


(fold (match-lambda*
	((("PATH" . path) (p h u)) (list path h u))
	((("USERPROFILE" . home)  (p h u)) (list p home u))
	((("HOME" . home)  (p h u)) (list p home u))
	((("USER" . user)  (p h u))  (list p h user))
	((("USERNAME" . user)  (p h u))  (list p h user))
	(_ out) out)
      (list #f #f #f) ; (p h u) init
      (get-environment-variables))
=> [List of 3 strings]

In other cases, it may make sense to use ((or pats) ...) patterns with clean-up:


(define (clean lst)
  (let ((undef (when #f #f)))
    (remove (lambda (item) (equal? item undef)) lst)))

(match (iota 7)
   (((or 2 6 rest) ...) (clean rest)))
=> (0 1 3 4 5)

The not operator succeeds if the given pattern doesn't match. None of the identifiers used are available in the body. This pattern also gives us a way to eliminate values (like false above) in a more streamlined way:


(match 1 ((and x (not #f)) x) (_ 'fail)) => 1

(match #f ((and x (not #f)) x) (_ 'fail)) => fail

(match 1 ((not 2) #t)) => #t

Predicates and Fields

The operator ? can be used to provide a predicate. The usage is (? predicate pat ...) where predicate is a Scheme expression evaluating to a predicate called on the value to match, and any optional patterns after the predicate are then matched as in an and pattern.

(match 1 ((? odd? x) x)) => 1

Along with boolean patterns, the predicate pattern is enough to make the other half of the math expression evaluator started in the section on dotted pairs:


(define eval-sexpr
  (match-lambda ((? number? n) n)
                ((and pair ((or '+ '- '* '/) . rest))
		 (handle-arithmetic-sexpr pair))
		(_ (error "not implemented yet"))))

(eval-sexpr '(+ (* 3 4 5) (- 10 3))) => 67

A predicate may be called multiple times on matching and non-matching expressions and sub-expressions, in any order, so side effects should be avoided unless there is some overriding interest (debugging, logging, short-cutting, etc.). Predicates can also be written using references to pattern variables, if repetition is supported:


(define fibby?
  (match-lambda ((a b (? (lambda (x) (= (+ a b) x)) c) . rest)
		  (fibby? (cons b (cons c rest))))
                 ((a b) #t)
                 ((a) #t)
                 (() #t)
		 (_ #f)))

;;; doesn't refer to pattern variables in predicate
(define fibby?
  (match-lambda ((a b c . rest)
		  (if (= (+ a b) c)
		      (fibby? (cons b (cons c rest)))
		      #f))
                 ((a b) #t)
                 ((a) #t)
                 (() #t)
		 (_ #f)))

(fibby? '(4 7 11 18 29 47)) => #t

The field operator = is used to extract an arbitrary field and match against it. It is useful for more complex or conditional destructuring that can't be more directly expressed in the pattern syntax. The usage is (= field pat), where field can be any expression, and should result in a procedure of one argument, which is applied to the value to match to generate a new value to match against pat. Thus the pattern (and (= car x) (= cdr y)) is equivalent to (x . y) except that it will result in an immediate error if the value isn't a pair. Like predicates, fields should avoid side effects when there is no overriding interest. In contrast to the predicate operator, the field operator succeeds when its value is false. Compare:



(match 1 ((and n (? even?)) n) (_ 'fail)) => fail

(match 1 ((and n (= even? r)) (list n r)) (_ 'fail)) => (1 #f)

(match '(a b c d) ((or (= (lambda (x) (memq 'f x)) r)
                       (= (lambda (x) (memq 'g x)) r)
                       (= (lambda (x) (memq 'b x)) r))
		   r)
		   (_ 'fail))
=> #f

(match '(a b c d) ((or (= (lambda (x) (memq 'f x)) (and r (not #f)))
                       (= (lambda (x) (memq 'g x)) (and r (not #f)))
                       (= (lambda (x) (memq 'b x)) (and r (not #f))))
		   r)
		   (_ 'fail))
=> (b c d)
(match '(1 . 2) ((= car x) x)) => 1
(match 4 ((= square x) x)) => 16

Getters and Setters

The set! and get! operators are used to bind an identifier to the setter and getter of a field, respectively. The setter is a procedure of one argument, which mutates the field to that argument. The getter is a procedure of no arguments which returns the current value of the field.

(let ((x (cons 1 2)))
  (match x
    ((1 . (set! s)) (s 3) x)))
=> (1 . 3)
(match '(1 . 2) ((1 . (get! g)) (g))) => 2

(define alist (list (cons 'a 1) (cons 'b 2) (cons 'c 3)))

(define get-c (match alist ((= (lambda (al) (assv 'c al))
                            (_ . (get! g)))
			 g)))
(get-c) => 3

(define set-c! (match alist ((= (lambda (al) (assv 'c al))
			    (_ . (set! s)))
			 s)))
(set-c! 7)

(get-c) => 7

alist => ((a . 1) (b . 2) (c . 7))

Record Patterns

Note: record introspection is not currently portable, so record operators are an interface whose implementation will not be portable. This is an optional part of the SRFI. See the implementation section for more information.

The record operator $ is used as a concise way to match records defined by SRFI 9 (or SRFI 99). The usage is ($ rtd field ...), where rtd should be the record type descriptor specified as the first argument to define-record-type, and each field is a subpattern matched against the fields of the record in order. Not all fields need be present. The operator struct can be used as a synonym for $.

(let ()
  (define-record-type employee
    (make-employee name title)
    employee?
    (name get-name)
    (title get-title))
  (match (make-employee "Bob" "Doctor")
    (($ employee n t) (list t n))))
=> ("Doctor" "Bob")

For records with more fields it can be helpful to match them by name rather than position. For this you can use the object operator, originally a Gauche extension:

(let ()
  (define-record-type employee
    (make-employee name title)
    employee?
    (name get-name)
    (title get-title))
  (match (make-employee "Bob" "Doctor")
    ((object employee (title t) (name n)) (list t n))))
=> ("Doctor" "Bob")

Matching on records when this feature is not implemented can be done as follows:


(match (make-employee "Bob" "Doctor")
       ((and (? employee?)
             (= get-title t)
	     (= get-name n))
	(list t n)))
=> ("Doctor" "Bob")

Emulating setter patterns like:


(let () (define-record-type <posn>
           (make-posn x y)
	   posn?
	   (x posn-x set-posn-x!)
           (y posn-y set-posn-y!))
	 (match (make-posn 3 4)
		((and p ($ <posn> (set! set-x)))
		(set-x 7)
		(match p (($ <posn> x y) (list x y))))))
=> (7 4)
is described in the next section.

Extension

For data types or matching not covered by equal?, SRFI 26 in combination with user-defined procedures, predicates, and fields can be useful.

For example, SRFI 111 (boxes) is opaque to record pattern matching, but can be handled as below:


(define (box-equal? a b)
  (if (and (box? a) (box? b))
      (box-equal? (unbox a) (unbox b))
      (equal? a b)))

;r6rs
(match (list (box 1) (box 1))
       ((a a) 'ok) (_ 'fail))
=> fail

(match (list (box 1) (box 1))
       ((a (? (cut box-equal? a <>))) 'ok)
       (_ 'fail))
=> ok

;case for no nonlinear predicates
(match (list (box 1) (box 1))
       ((a b) (if (box-equal? a b) 'ok 'fail)))
=> ok

(match (box 1) ((= unbox value) value))
=> 1

(define-values (get-value set-value!)
  (match (box 1)
         ((and (= (lambda (box) (cut unbox box)) get)
	       (= (lambda (box) (cut set-box! box <>)) set))
	  (values get set))))

(get-value) => 1
(set-value! 18)
(get-value) => 18

Pattern Grammar

pat : patvar                       ;; anything, and binds pattern var
    | _                            ;; anything
    | ()                           ;; the empty list
    | #t                           ;; #t
    | #f                           ;; #f
    | string                       ;; a string
    | number                       ;; a number
    | character                    ;; a character
    | 'sexp                        ;; an s-expression
    | 'symbol                      ;; a symbol (special case of s-expr)
    | (pat1 ... patN)              ;; list of n elements
    | (pat1 ... patN . patN+1)     ;; list of n or more
    | (pat1 ... patN patN+1 ooo)   ;; list of n or more, each element
                                   ;;   of remainder must match patN+1
    | (pat1 ... patN patN+1 ooo patN+2 ... patM)
                                   ;; list of m or more, every element between the
				   ;; nth and the (m or more - n)th must match
				   ;; patN+1.
    | #(pat1 ... patN)             ;; vector of n elements
    | #(pat1 ... patN patN+1 ooo)  ;; vector of n or more, each element
                                   ;;   of remainder must match patN+1
    | #(pat1 ... patN patN+1 ooo patN+2 ... patM)
                                   ;; vector of m or more, every element between the
				   ;; nth and the (m or more - n)th must match
				   ;; patN+1.
    | ($ record-type pat1 ... patN)      ;; a record (patK matches in slot
                                         ;;    order) optional
    | (struct struct-type pat1 ... patN) ;; ditto (*) optional
    | (object struct-type (slot1 pat1) ...) ;; a record (using slot names) (*)
                                         ;;    optional
    | (= proc pat)                 ;; apply proc, match the result to pat
    | (and pat ...)                ;; if all of pats match
    | (or pat ...)                 ;; if any of pats match
    | (not pat ...)                ;; if no pats match
    | (? predicate pat ...)        ;; if predicate true and all pats match
    | (set! patvar)                ;; anything, and binds setter
    | (get! patvar)                ;; anything, and binds getter
    | (pat1 *** pat2)              ;; tree subpattern (*)
    | `qp                          ;; a quasi-pattern

    (*) extended syntax not originally part of
        Wright-Cartwright pattern matcher.

patvar : an identifier except _, quote, $, struct, @, object, =, and, or,
         not, ?, set!, get!, quasiquote, ..., ___, **1, =.., *..
	 unquote, unquote-splicing.

ooo : ...                          ;; zero or more
    | ___                          ;; zero or more
    | **1                          ;; one or more
    | =.. k                        ;; exactly k where k is an integer. (*)
                                   ;;   Example: =.. 1, =.. 2 ...
    | *.. k j                      ;; between k and j, where k and j are (*)
                                   ;;   integers. Example: *.. 3 4, match 3
                                   ;;   or 4 of a pattern *.. 1 5 match from
				   ;;   1 to 5 of a pattern

    (*) extended syntax not originally part of
        Wright-Cartwright pattern matcher.
    Note:  a list can contain only one ellipsis pattern which includes the
             patterns above, tail patterns, and unquote-splicing patterns.

qp  : ()                           ;; the empty list
    | #t                           ;; #t
    | #f                           ;; #f
    | string                       ;; a string
    | number                       ;; a number
    | character                    ;; a character
    | identifier                   ;; a symbol
    | (qp_1 ... qp_n)              ;; list of n elements
    | (qp_1 ... qp_n . qp_{n+1})   ;; list of n or more
    | (qp_1 ... qp_n qp_n+1 ooo)   ;; list of n or more, each element
                                   ;;   of remainder must match qp_n+1
    | #(qp_1 ... qp_n)             ;; vector of n elements
    | #(qp_1 ... qp_n qp_n+1 ooo)  ;; vector of n or more, each element
                                   ;;   of remainder must match qp_n+1
    | ,pat                         ;; a pattern
    | ,@pat                        ;; a pattern

Syntax

(match expr (pattern . body) ...)
(match expr (pattern (=> failure) . body) ...)

The result of expr is matched against each pattern in turn, according to the pattern rules described in the previous section, until the the first pattern matches. When a match is found, the corresponding bodys are evaluated in order, and the result of the last expression is returned as the result of the entire match. If a failure is provided, then it is bound to a procedure of no arguments which continues processing at the next pattern. If no pattern matches, an error is signalled. If SRFI 201 is used, this is the only procedure needed.


;;; a simple match pattern:

(match '(1 1 1) ((a =.. 3) 'ok) (_ 'fail)) => ok

;;; the same pattern with failure

(match '(1 1 1)
        ((a =.. 3) (=> fail)
	 (if (= (car a) 1) (fail) 'ok))
	(_ 'fail))
=> fail

;;; a passing pattern

(match '(2 1 1)
        ((a =.. 3) (=> fail)
	 (if (= (car a) 1) (fail) 'ok))
	(_ 'fail))
=> ok

(match-lambda (pattern . body) ...)

Shortcut for lambda + match. Creates a procedure of one argument, and matches that argument against each clause.


(define-record-type <checkable>
  (make-checkable pred value)
  checkable?
  (pred checkable-pred)
  (value checkable-value))

(define check
  (match-lambda (($ <checkable> pred (? pred ok)) ok)
                (($ <checkable>) 'bad-data)))

(check (make-checkable odd? 1)) => 1
(check (make-checkable odd? 2)) => bad-data

(match-lambda* (pattern . body) ...)

Similar to match-lambda.

Creates a procedure of any number of arguments, and matches the argument list against each clause.


(define zero-to-three-cycle
  (match-lambda ((and c (= car c)) 0)
		((and c (= cdr c)) 1)
		((and c (= cddr c)) 2)
		((and c (= cdddr c)) 3)
		(_ 'fail)))

(define l3 (list 1 2 3))
(set-cdr! (cddr l3) l3)
(zero-to-three-cycle l3) => 3

(define multiples-of-seven?
  (match-lambda* (((? (lambda (x) (zero? (modulo x 7)))) . rest)
                  (apply multiples-of-seven? rest))
		 (() #t)
		 (_ #f)))

(multiples-of-seven? 7 14 49 28 56 77) => #t

(match-let ((pattern value) ...) . body)
(match-let loop ((pattern init) ...) . body)

Matches each pattern to the corresponding expression, and evaluates the body with all match variables in scope. Raises an error if any of the expressions fail to match. Syntax analogous to named let can also be used for recursive procedures which match on their arguments as in match-lambda*.


(define (fact n)
  (if (zero? n)
      1
      (match-let loop (((a . rest) (cdr (iota (+ n 1))))
                       (out 1))
		      (if (null? rest) (* a out) (loop rest (* a out))))))

(fact 11) => 39916800
(fact 0) => 1

Note: as this example shows, since each binding takes one pattern, if an init value may not match its pattern (as in (cdr (iota (+ 0 1))) => ()) that has to be handled some other way.

(match-letrec ((pattern value) ...) . body)

Similar to match-let, but analogously to letrec matches and binds the patterns with all match patterns in scope.

(match-let* ((pattern value) ...) body ...)

Similar to match-let, but analogously to let* matches and binds the patterns in sequence, with preceding match patterns in scope.

Tail Contexts

If a pattern matcher is in tail context, expressions marked as tail-bodies below will be in tail context.


(match <expression> <match-clause>+)

(match-lambda <match-clause>+)

(match-lambda* <match-clause>+)

(match-let ((<pattern> <expression>)) <tail-body>+)

(match-letrec ((<pattern> <expression>)) <tail-body>+)

(match-let <variable> ((<pattern> <expression>)) <tail-body>+)

(match-let* ((<pattern> <expression>)) <tail-body>+)

where:

<match-clause> → (<pattern> <tail-body>+)

Side Effects

There are two types of patterns where the issue of side-effects is prominent:

Errors

The following is a list of examples of the erroneous conditions, and the message given by the current implementation:

A successful match with an empty body [ like (match (list 1 2 3) ((a b c))) ] does not have a reliable return value in the sample implementation. This form should be avoided and should become a syntax error.

Using in Other Macros

Other macros that use the pattern matcher will be subject to the same constraints, in particular with respect to names. For example:


(define-syntax make-chunker
  (syntax-rules ()
    ((_ s ...)
     (lambda (l)
       (let lp ((l l))
	 (match l (() '())
		  ((s ... . rest) (cons (list s ...) (lp rest)))
		  (end (list end))))))))

((make-chunker a b c d) (iota 20))
=> ((0 1 2 3) (4 5 6 7) (8 9 10 11) (12 13 14 15) (16 17 18 19))

((make-chunker a b c _) (iota 20))
=> ERROR on line 16: invalid use of syntax as value: _

((make-chunker a b c ___) (iota 20))
=> ERROR on line 715 of file ./srfi-204/srfi-204.scm:
     no expansion for:
       (match-syntax-error "dotted tail not allowed after ellipsis" rest)

Implementation

A Sample implementation is available on Github and in this .tgz file.

Any implementation of SRFI 204 should use SRFI 206 to export whatever of the auxiliary syntax needs to be defined (from $, ?, ***, ___, **1, =.., *.., get!, struct, object ) or the included auxiliary-syntax.scm file, if SRFI 206 is not available. SRFI 206 is used as follows:


(import (only (srfi 206 all) $ ? *** ___ **1 =.. etc.))
and auxiliary-syntax.scm is used as follows:


;; from Akku
(import (srfi private include))
(begin
  (include/resolve ("srfi" "204") "auxiliary-syntax.scm")
  (define-auxiliary-keywords $ ? *** ___ **1 =.. etc.)
  ...)

;; in Guile
(begin
  (include-from-path "srfi/204/auxiliary-syntax.scm")
  (define-auxiliary-keywords $ ? *** ___ **1 =.. etc.)
  ...)

To use the $, struct, and object syntax, an implementation needs to provide the forms is-a?, slot-ref, and slot-set!. These forms have the following syntax:


(is-a? rec rtd) => #t if type of rec matches rtd, #f otherwise

(slot-ref rtd rec slot) => If slot is an integer, value at position slot,
                    if slot is a symbol, value named slot

(slot-set! rtd rec slot new-value) If slot is an integer, set! the value at
                            position slot to new-value. If slot is a symbol,
                            set! the value named slot to new-value

Since these forms rely on the underlying structure of records in a particular implementation, their definition will vary from one Scheme to another. This is an example from Gauche, where records are implemented on top of native CLOS-like objects:


(import (only (gauche base) is-a? slot-definition-name class-slots)
  (scheme base)
  (rename (gauche base)
	  (slot-ref gb-slot-ref)
	  (slot-set! gb-slot-set!)))
  (begin
    (define-syntax slot-ref
      (syntax-rules ()
        ((_ class inst n)
         (if (integer? n)
           (gb-slot-ref inst
                        (list-ref (map slot-definition-name
			               (class-slots class))
				  n))
           (gb-slot-ref inst n)))))
    (define-syntax slot-set!
      (syntax-rules ()
        ((_ class inst n value)
         (if (integer? n)
             (gb-slot-set! inst
    	              (list-ref (map slot-definition-name
		                     (class-slots class))
				n)
    	              value)
             (gb-slot-set! inst n value))))))

this is an example from Larceny, which has SRFI 99:


(import (scheme base)
	(srfi 99 records))
(begin
  (define-syntax is-a?
    (syntax-rules ()
      ((_ rec rtd)
       ((rtd-predicate rtd) rec))))
  (define-syntax slot-ref
    (syntax-rules ()
      ((_ rtd rec n)
       (if (integer? n)
           ((rtd-accessor rtd (vector-ref (rtd-all-field-names rtd) n)) rec)
           ((rtd-accessor rtd n) rec)))))
  (define-syntax slot-set!
    (syntax-rules ()
      ((_ rtd rec n value)
       (if (integer? n)
           ((rtd-mutator rtd (vector-ref (rtd-all-field-names rtd) n)) rec value)
           ((rtd-mutator rtd n) rec value))))))

and this is an example from Loko and Chez, which use forms from (rnrs records procedural) and (rnrs records inspection):


(define-syntax is-a?
  (syntax-rules ()
    ((_ rec rtd)
     ((record-predicate (record-type-descriptor rtd)) rec))))
(define-syntax slot-ref
  (syntax-rules ()
    ((_ rtd rec n)
     (let ((rtd (record-type-descriptor rtd)))
       (if (integer? n)
	   ((record-accessor rtd n) rec)
	   ((record-accessor rtd (name->idx rtd n)) rec))))))
(define-syntax slot-set!
  (syntax-rules ()
    ((_ rtd rec n value)
     (let ((rtd (record-type-descriptor rtd)))
       (if (integer? n)
	   ((record-mutator rtd n) rec value)
	   ((record-mutator rtd (name->idx rtd n)) rec value))))))
(define-syntax name->idx
  (syntax-rules ()
	((_ rtd n)
	 (let* ((names (record-type-field-names rtd))
		(len (vector-length names)))
	   (let lp ((i 0))
	     (cond
	       ((> i len) (error "name not in record" n))
	       ((eq? n (vector-ref names i)) i)
	       (else (lp (+ i 1 )))))))))
match-letrec uses the technique described in [Kiselyov, 2013]. This form gets used the least by users, so it may be a useful exercise for implementors to look for internal defines and other items that can be translated into match-letrec to build up a testing library.

References

Alex Shinn, John Cowan, Arthur A. Gleckler, et al. 2013 Revised7 Report on the Algorithmic Language Scheme. Retrieved from https://small.r7rs.org/attachment/r7rs.pdf.

Bill Schottstaedt. 2020 s7 case.scm. Retrieved from https://ccrma.stanford.edu/software/snd/snd/s7.html#case.

Andrew K. Wright & Robert Cartwright. ACM Transactions on Programming Languages and Systems. Vol. 19, No. 1. January 1997. Pages 87-152. Retrieved on July 15, 2020 from https://www.iro.umontreal.ca/~feeley/cours/ift6232/doc/pres2/practical-soft-type-system-for-scheme.pdf.

Kiselyov, Oleg 2003 "How to write symbol? with syntax rules" comp.lang.scheme Retrieved from http://okmij.org/ftp/Scheme/macro-symbol-p.txt

Kiselyov, Oleg 2013 "How to Write Seemingly Unhygenic and Referentially Opaque Macros with Syntax-rules*" Retrieved from http://okmij.org/ftp/Scheme/Dirty-Macros.pdf

Alex Shinn. 2020. (chibi match). Retrieved from http://synthcode.com/scheme/chibi/lib/chibi/match.html.

Justin Ethier. 2019. Match Library. Retrieved from http://justinethier.github.io/cyclone/docs/api/cyclone/match.

Ludovic Cortès, Bake Timmons, Arun Isaac, & Paul Morris. 2019. Guile Reference Manual.. Sect. 7.8 "Pattern Matching." Retrieved from https://www.gnu.org/software/guile/manual/html_node/Pattern-Matching.html.

Matthias Zenger. 2019. match.scm (source file). Retrieved from https://github.com/objecthub/swift-lispkit/blob/master/Sources/LispKit/Resources/Libraries/lispkit/match.sld.

Göran Weinholt. 2019. match.sls (source file). Retrived from https://gitlab.com/weinholt/loko/-/blob/master/lib/match.sls.

Higepon Taro Minowa, et. al. 2010 Pattern Match. Retrieved from http://mosh.monaos.org/files/lib/match-ss.html#Pattern_Match.

Takashi Kato. 2019. Sagittarius Users' Reference. Sect. 8.1 "(match) -- Pattern Matching" Retrieved from http://ktakashi.github.io/sagittarius-online-ref/section81.html.

Hubert Montas. 2018. A Scheme Interpreter for ARM Microcontrollers. Sect. "Program Examples for Version 080 - Expert System" Retrieved from http://armpit.sourceforge.net/v080/aps_080_common_examples.html#Expert.

Manuel Serrano & Jean-Marie Geffroy. 2013. Bigloo/manual. Chapter 6. "Pattern Matching" Retrieved from http://www-sop.inria.fr/mimosa/fp/Bigloo/manual-chapter6.html.

R. Kent Dybig. 2010. "Using Match." Retrieved from Indiana University Bloomington via archive.org: https://web.archive.org/web/20180718090106/https://www.cs.indiana.edu/chezscheme/match/.

Alex Shinn. 2020. The Chicken Scheme Wiki. "matchable." Retrieved from http://wiki.call-cc.org/eggref/4/matchable.

Juergen Lorenz. 2020. The Chicken Scheme Wiki. "bindings." Retrieved from http://wiki.call-cc.org/eggref/4/bindings.

Marc Feely & Frédéric Hamel. 2020. match-support.scm (source file). Retrieved from https://github.com/gambit/gambit/blob/master/lib/termite/match-support.scm.

Shiro Kawai. Gauche Users Reference. Sect. 12.68 "util.match - Pattern matching." Retrieved from http://practical-scheme.net/gauche/man/gauche-refe/Pattern-matching.html#Pattern-matching.

Dimitris Vyzovitis et. al. Gerbil Reference. Sect. "Core Prelude - Prelude Macros - Pattern Matching." https://cons.io/reference/core-prelude.html#pattern-matching.

Per Bothner. 2020. The Kawa Scheme Language - Reference Documentation. Sect. 8.3 "Variables and Patterns." Retrieved from https://www.gnu.org/software/kawa/Variables-and-Patterns.html.

Marc Nieper-Wißkirchen. 2016. snow-fort.org. "(rapid match)" http://snow-fort.org/s/rapid-scheme.org/marc/rapid/match/0.1.5/index.html.

Yuichi Nishiwaki. 2016 50.destructuring-bind (source directory). Retrieved from https://github.com/picrin-scheme/picrin/tree/master/contrib/50.destructuring-bind.

Matthew Flatt & PLT. The Racket Reference. Chap. 9 "Pattern Matching" Retrieved from https://docs.racket-lang.org/reference/match.html?q=match#%28form._%28%28lib._racket%2Fmatch..rkt%29._match%29%29.

Nils M. Holm. matcher.scm (source code). Retrieved from http://t3x.org/s9fes/matcher.scm.html.

Rafik Naccache unifier.cljc (source code). Retrieved from https://github.com/turbopape/scheje/blob/master/src/scheje/unifier.cljc.

Erick Gallesio. STklos Reference Manual. Chapter 6 "Pattern Matching." Retrieved from https://stklos.net/Doc/html/stklos-ref-6.html.

Peter Norvig. 1992 Paradigms of Artificial intelligence. (1st Ed.). ELIZA,Dialog with a Machine, Chap. 5. Morgan Kaufmann Publishers, San Francisco, CA. Retrieved on July 15 2020 from https://github.com/norvig/paip-lisp/blob/master/docs/chapter5.md.

Acknowledgements

Andrew K. Wright, Robert Cartwright, Alex Shinn, and Panicz Maciej Godek.

Alex Shinn provided documentation of his implementation.

Shiro Kawai and Ludovic Cortès et. al. provided pattern grammar documentation.

The members of the SRFI 204 forum helped refine this specification, including John Cowan, Arthur A. Gleckler, Panicz Godek, Shiro Kawai, Lassi Kortela, Adam Nelson, Duy Nguyen, Marc Nieper-Wißkirchen, and Alex Shinn.

Copyright © Felix Thibault (2020).

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Editor: Arthur A. Gleckler