200: Pattern Matching

by Panicz Maciej Godek

Status

This SRFI is currently in draft status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-200@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Abstract

This SRFI discusses some of the existing pattern-matching libraries for the Scheme programming language — namely, the pattern matcher presented by Andrew K. Wright and Robert Cartwright in the paper "A Soft Type System for Scheme", the pattern matcher developed by Dan Friedman, Erik Hilsdale and Kent Dybvig, the racket/match module distributed with the Racket programming environment, as well as the Bigloo and Gerbil pattern matchers distributed with their respective implementations. It then extracts a pattern syntax which is compatible with three of those implementations and provides extrinsic rationale for that syntax. It also provides a simple implementation of a pattern matcher which conforms to the specification of a pattern language provided in this document.

Issues

None at present.

Rationale

Pattern matching is a technique for (conditionally) naming components of compound data structures. It is commonly known in functional programming circles, and it has been adapted to the Scheme programming language in the early 1990s.

It also partially overlaps in functionality with the technique called destructuring bind, known from Common Lisp.

For example, instead of writing

  (cond ((and (list? x) (= (length x) 3) (eq? (car x) '+))
         (+ (cadr x) (caddr x)))
        (else
         'operation-not-supported))
it is often more clear and concise to write
  (match x
    (`(+ ,x ,y) (+ x y))
    (_ 'operation-not-supported))

Today, there exist a few different implementations that offer many extensions to the basic pattern-matching facility, and that are mutually incompatible.

The purpose to this SRFI is to codify the basic common features that are present in all or most of those libraries in order to facilitate the development of portable programs that utilize pattern matching.

This document isn't meant to serve as comprehensive documentation for any of the discussed libraries, as they are already well documented in their appropriate sources.

It does, however, provide a minimalist reference implementation expressed in terms of simple syntax-case macros, and an equivalent implementation expressed in terms of syntax-rules macros.

Specification

The main interface to using syntactic pattern matching is the same for all libraries discussed in this document — namely, the match macro, which is used in the following way:

  (match expression
    (<pattern-1> <actions-1*> ... <value-1>)
     ...
    (<pattern-N> <actions-N*> ... <value-N>))

where <actions-X*> are optional sequences of expressions.

The expression is evaluated once, and its value is tested against the subsequent <pattern-X> patterns. When the first matching pattern <pattern-K> is encountered, the sequence of <actions-K*> is executed, and the value of the expression <value-K> becomes the value of the whole match expression.

The pattern may contain some variables (which are typically expressed using unquoted symbols). If it does, then those variables become the names of some parts of the data being pattern-matched, and the named parts can be referred to from the <actions-K*> and <value-K> expressions.

If the data does not match any of the patterns, the result of the expression depends on the pattern matcher under consideration: it may be either an unspecified value, some specified value or an error. However, since writing non-exhaustive patterns in match expressions is generally considered a bad practice, this aspect of the match form will not be further discussed in this document.

It is mainly the form, capabilities and the interpretation of the <pattern-X> expressions where the libraries differ.

The Wright-Cartwright-Shinn pattern matcher

Probably the most common pattern-matching library distributed with some popular Scheme implementations is the match.scm module developed by Alex Shinn in portable syntax-rules macros. It is based on a specification developed by Andrew K. Wright and Robert Cartwright in their paper A Practical Soft Type System for Scheme.

It is distributed along with many Scheme implementations, including Chibi (the (chibi match) module), Guile (the (ice-9 match) module), Cyclone (the (cyclone match) module), LispKit (the match.sld file), Loko (the match.sls file), Mosh (the Pattern Match library and Sagittarius (the (match) module). It is also available in the Chicken Scheme's repository as the matchable module.

Simple patterns

The <pattern-X> expressions are interpreted in the following way: if a pattern is a literal, then it matches the data that is equal to that literal. Otherwise, if it is a regular unquoted symbol (henceforth called a pattern variable), then the match succeeds, and the symbol gets bound to the value being pattern-matched. If the pattern is the special symbol _ (underscore), then the match succeeds, but the symbol is not bound.

Compound patterns

Things become more interesting when the pattern is a pair, i.e. when it has the form (head . tail). Roughly speaking, data matches such patterns if it is also a pair, the head of the data matches the head of the pattern, and the tail of the data matches the tail of the pattern. So for example, the list (1 2) matches the pattern (a b) (with a being bound to 1 and b being bound to 2), but it also matches the pattern (a . b) (where a is bound to 1 and b is bound to (2)).

The same symbol can be used in a pattern more than once. When that's the case, the pattern matches only if all the occurrences of that symbol would be bound to values that are equal?: for example, the pattern (a . a) matches the value (1 . 1) or ((1) . (1)), but it does not match the value (1 . 2) or the value (1 1).

This allows one to define a function that removes adjacent equal? elements from a list in the following way:

  (define (unique elements)
    (match elements
      ((x x . rest)
       (unique `(,x . ,rest)))
      ((x . rest)
       `(,x . ,(unique rest)))
      (()
       '())))

For example, (unique '(a a a b b b b a a c c)) evaluates to (a b a c).

An equivalent definition that does not use pattern matching would look like this:

  (define (unique elements)
    (cond ((null? elements)
           '())
          ((and (pair? elements)
                (pair? (cdr elements))
                (equal? (car elements)
                        (cadr elements)))
           (unique (cdr elements)))
          (else
           (cons (car elements)
                 (unique (cdr elements))))))

Compound literals and quasi-literals

The previous section claimed that a value matches the (head . tail) pattern only if it is a pair and the head and the tail of the data matches the head and the tail of the pattern.

This is true in most circumstances. However, there are a few exceptions. Notably, if the pattern is a pair and its head is the quote symbol, then the data matches the pattern only if it is equal? to the tail of the pattern. For example, the symbol 'x matches the pattern 'x and the value of the expression (list 1 2 3) matches the pattern (quote (1 2 3)), or equivalently '(1 2 3).

More interestingly, if the head of the pattern is the symbol quasiquote, then the pattern behaves similarly to quasi-quotation, attributing special meaning to the unquote symbol. In particular, this allows one to express the above definition of the function unique as:

  (define (unique elements)
    (match elements
      (`(,x ,x . ,rest)
       (unique `(,x . ,rest)))
      (`(,x . ,rest)
       `(,x . ,(unique rest)))
      ('()
       '())))

This feature exposes symmetry between pattern matching and quasi-quotation. However, the unquote-splicing operator is not implemented in the Wright-Cartwright-Shinn matcher.

Predicates

The Wright-Cartwright-Shinn pattern matcher also offers a special syntax for testing whether a given sub-pattern satisfies a certain predicate: if a pattern is a list whose first element is the ? (question mark) symbol, then the second element of that list must be a unary predicate. The pattern matches if the predicate is satisfied.

Optionally, there may be a third element in the list. It is an additional pattern that needs to be satisfied by the datum. It is typically used for naming the element that satisfies the predicate.

Ellipsis

Even though the Wright-Cartwright-Shinn pattern matcher does not support the unquote-splicing operator, it provides the ... (ellipsis) operator which behaves similarly to the one that can be found in the syntax-rules pattern language.

The ... operator must appear in a list, following a pattern, but no more than one ellipsis is allowed to appear at a single level of nesting. The behavior of the operator is best illustrated with an example - the palindrome? predicate, which evaluates to #true iff its argument is a list equal? to its own reverse:

  (define (palindrome? list)
    (match list
      (()          #true)
      ((x)         #true)
      ((a b ... a) (palindrome? b))
      (_           #false)))

If list is (1 2 3 2 1), then the first invocation of palindrome? will match the (a b ... a) pattern, binding a to 1 and b to the list (2 3 2).

The ellipsis operator can also be used with more complex patterns. For example, the expression

  (match '((a 1) (b 2) (c 3))
    (((s n) ...) `(,s ,n)))

evaluates to ((a b c)(1 2 3)).

The ellipsis operator can be nested, so the pattern ((x ...) ...) is also valid, even if difficult to substantiate.

Boolean combinations

The Wright-Cartwright-Shinn matcher also allows one to combine patterns using Boolean operators: (and pat-1 ... pat-n) matches if a given object matches all of the pat-1 ... pat-n patterns, (or pat-1 ... pat-n) matches if a given object matches any of the pat-1 ... pat-n patterns, and (not pat-1 ... pat-n) matches if an object does not match any of the pat-1 ... pat-n patterns.

The complete grammar for the Wright-Cartwright-Shinn matcher

The specification of the Wright-Cartwright-Shinn pattern matcher provides the following grammar:

        patterns:                       matches:

  pat ::= identifier                      anything, and binds identifier
        | _                               anything
        | ()                              the empty list
        | #t                              #t
        | #f                              #f
        | string                          a string
        | number                          a number
        | character                       a character
        | 'sexp                           an s-expression
        | 'symbol                         a symbol (special case of s-expr)
        | (pat_1 ... pat_n)               list of n elements
        | (pat_1 ... pat_n . pat_{n+1})   list of n or more
        | (pat_1 ... pat_n pat_n+1 ooo)   list of n or more, each element
                                          of remainder must match pat_n+1
        | #(pat_1 ... pat_n)              vector of n elements
        | #(pat_1 ... pat_n pat_n+1 ooo)  vector of n or more, each element
                                          of remainder must match pat_n+1
        | #&pat                           box
        | ($ record-name pat_1 ... pat_n) a record
        | (= field pat)                   a ``field'' of an object
        | (and pat_1 ... pat_n)           if all of pat_1 thru pat_n match
        | (or pat_1 ... pat_n)            if any of pat_1 thru pat_n match
        | (not pat_1 ... pat_n)           if all pat_1 thru pat_n don't match
        | (? predicate pat_1 ... pat_n)   if predicate true and all of
                                          pat_1 thru pat_n match
        | (set! identifier)               anything, and binds setter
        | (get! identifier)               anything, and binds getter
        | `qp                             a quasi-pattern
        | (identifier *** pat)            matches pat in a tree and binds
        identifier to the path leading
        to the object that matches pat

  ooo ::= ...                             zero or more
        | ___                             zero or more
        | ..1                             1 or more

  quasi-patterns:                 matches:

  qp  ::= ()                              the empty list
        | #t                              #t
        | #f                              #f
        | string                          a string
        | number                          a number
        | character                       a character
        | identifier                      a symbol
        | (qp_1 ... qp_n)                 list of n elements
        | (qp_1 ... qp_n . qp_{n+1})      list of n or more
        | (qp_1 ... qp_n qp_n+1 ooo)      list of n or more, each element
                                          of remainder must match qp_n+1
        | #(qp_1 ... qp_n)                vector of n elements
        | #(qp_1 ... qp_n qp_n+1 ooo)     vector of n or more, each element
                                          of remainder must match qp_n+1
        | #&qp                            box
        | ,pat                            a pattern
        | ,@pat                           a pattern

The Racket pattern matcher

The Racket pattern matcher is typically shipped with the distribution of Racket. In order to use it, one needs to (require racket/match) in their Racket program. It can also be found in the Racket's GitHub repository.

Simple patterns

The rules for simple patterns in the Racket pattern matcher are identical with those described above for the Wright-Cartwright-Shinn pattern matcher: literals match literals, symbols match and bind to anything, and the special symbol _ (underscore) matches anything but binds to nothing.

Compound patterns

The rules for compound patterns in the Racket pattern matcher differ significantly compared to the compound patterns found in the Wright-Cartwright-Shinn matcher. In particular, compound patterns must be proper lists, and their first element must always be a special symbol understandable by the matcher. For example, if we want to match a pair (a . b), we need to use the pattern (cons a b); if we want to match a list (a b), we need to use the pattern (list a b) and so forth.

Therefore, the unique function defined in the previous section could be translated to the Racket's pattern language in the following way:

  (define (unique elements)
    (match elements
      ((list-rest x x rest)
       (unique (cons x rest)))
      ((cons x rest)
       (cons x (unique rest)))
      ('()
       '())))

Overall, the documentation of Racket provides a detailed list of symbols that are allowed in the head position of compound patterns.

The complete grammar for the Racket pattern matcher

  pat     ::= id                                match anything, bind identifier
           |  (var id)                          match anything, bind identifier
           |  _                                 match anything
           |  literal                           match literal
           |  (quote datum)                     match equal? value
           |  (list lvp ...)                    match sequence of lvps
           |  (list-rest lvp ... pat)           match lvps consed onto a pat
           |  (list-no-order pat ...)           match pats in any order
           |  (list-no-order pat ... lvp)       match pats in any order
           |  (vector lvp ...)                  match vector of pats
           |  (hash-table (pat pat) ...)        match hash table
           |  (hash-table (pat pat) ...+ ooo)   match hash table
           |  (cons pat pat)                    match pair of pats
           |  (mcons pat pat)                   match mutable pair of pats
           |  (box pat)                         match boxed pat
           |  (struct-id pat ...)               match struct-id instance
           |  (struct struct-id (pat ...))      match struct-id instance
           |  (regexp rx-expr)                  match string
           |  (regexp rx-expr pat)              match string, result with pat
           |  (pregexp px-expr)                 match string
           |  (pregexp px-expr pat )            match string, result with pat
           |  (and pat ...)                     match when all pats match
           |  (or pat ...)                      match when any pat match
           |  (not pat ...)                     match when no pat matches
           |  (app expr pats ...)               match (expr value) output values to pats
           |  (? expr pat ...)                  match if (expr value) and pats
           |  (quasiquote qp)                   match a quasipattern
           |  derived-pattern                   match using extension

  literal ::= #t                                match true
           |  #f                                match false
           |  string                            match equal? string
           |  bytes                             match equal? byte string
           |  number                            match equal? number
           |  char                              match equal? character
           |  keyword                           match equal? keyword
           |  regexp literal                    match equal? regexp literal
           |  pregexp literal                   match equal? pregexp literal

  lvp     ::= (code:line pat ooo)               greedily match pat instances
           |  pat                               match pat

  qp      ::= literal                           match literal
           |  id                                match symbol
           |  (qp ...)                          match sequences of qps
           |  (qp ... . qp)                     match qps ending qp
           |  (qp ooo . qp)                     match qps beginning with repeated qp
           |  #(qp ...)                         match vector of qps
           |  #&qp                              match boxed qp
           |  #s(prefab-key qp ...)             match prefab struct with qp fields
           |  ,pat                              match pat
           |  ,(list lvp ...)                   match lvps, spliced
           |  ,(list-rest lvp ... pat)          match lvps plus pat, spliced
           |  ,'qp                              match list-matching qp, spliced

  ooo     ::= ...                               zero or more; ... is literal
           |  ___                               zero or more
           |  ..k                               k or more
           |  __k                               k or more

A closer look at the rules reveals that some of the symbols allowed in patterns coincide with the ones in the Wright-Cartwright-Shinn matcher. In particular, quote, and, or, not, ?, and quasiquote.

Pattern guards

The Racket matcher also supports the feature known as pattern guards. A pattern guard is an additional condition which needs to be satisfied in order for a match to be considered successful.

In Racket, pattern guards are expressed using the #:when keyword following a pattern, followed by the condition. For example:

  (match '(1 2)
    (`(,a ,b) #:when (odd? (+ a b)) 'odd-sum)
    (_ 'unknown))

Derived patterns

The distinctive feature of the Racket matcher is the ability to provide derived patterns. In addition to the match form, Racket provides the define-match-expander extension mechanism, which allows one to define new types of patterns.

The Bigloo pattern matcher

The Bigloo pattern matcher is distributed along with the Bigloo scheme implementation, but it is also provided by STKlos. According to the, it supports the following syntax:

<pattern> ⇒                Matches:

  <atom>                    the <atom>.
| (kwote <atom>)            any expression eq? to <atom>.
| (and <pat1> ... <patn>)   if all of <pati> match.
| (or <pat1> ... ...<patn>) if any of <pat1> through <patn> matches.
| (not <pat>)               if <pat> doesn't match.
| (? <predicate>)           if <predicate> is true.
| (<pat1> ... <patn>)       a list of n elements. Here, ... is a
                            meta-character denoting a finite repetition
                            of patterns.
| <pat> ...                 a (possibly empty) repetition
                            of <pat> in a list.
| #(<pat> ... <patn>)       a vector of n elements.
| #{<struct> <pat> ... }    a structure.
| ?<id>                     anything, and binds id as a variable.
| ?-                        anything.
| ??-                       any (possibly empty) repetition of anything
                            in a list.
| ???-                      any end of list.

 

The most apparent differences between Bigloo and the two matchers described above are the following:

The Gerbil pattern matcher

According to the documentation, the pattern matcher bundled with Gerbil Scheme supports the following syntax:

(match expr
  (pattern body ...) ...
  [(else body ...)])

<pattern>:
 (? test)                          ; predicate test with the `?` predicate constructor
 (? test pattern)                  ; test and match a pattern
 (? test => pattern)               ; test and match a pattern on the value of the test
 (? test :: proc => pattern)       ; test and match with a filter
 (and pattern ...)                 ; match all patterns
 (or pattern ...)                  ; match any pattern
 (not pattern)                     ; negated match
 (cons pattern1 pattern2)          ; destructure a pair like cons
 (cons* pattern ... pattern-tail)  ; destructure a list like cons*
 [pattern ...]                     ;
 (@list pattern ...)               ; destructure a list like @list
 (box pattern)                     ;
 #&pattern                         ; destructure a box
 (values pattern ...)              ; destructure a values tuple
 (vector pattern ...)              ;
 #(pattern ...)                    ; destructure a vector
 (struct-id pattern ...)           ; destructure a struct
 (class-id (slot pattern) ...)     ; destructure a class
 (eq? val)                         ; match eq? to val
 (eqv? val)                        ; match eqv? to val
 (equal? val)                      ; match equal? to val
 (quote expr)                      ; match eq?/eqv?/equal? to a quoted value
 (quasiquote datum)                ; destructure with quasiquote
 (apply getf pattern)              ; applicative destructuring
 (match-macro arg ...)             ; apply match macro expander
 _                                 ; match any and ignore
 id                                ; match any and bind to id
 datum                             ; match eq?/eqv?/equal? to a datum

(match <> (match-pattern body ...) ...)
=> (lambda (obj) (match obj (match-pattern body ...) ...))

(match <...> (match-pattern body ...) ...)
=> (lambda args (match args (match-pattern body ...) ...))

The syntax is therefore largely compatible with the Wright-Cartwright-Shinn matcher and the Racket matcher. One noteworthy difference is the presence of the @list pattern: it stems from the fact that the Gerbil reader treats the square bracket lists as lists beginning with the @list symbol. (Some Scheme implementations, such as Kawa or - with appropriately set read-options - Guile - use the $bracket-list$ symbol for the same purpose, although none of this is R6RS-compliant.)

Outside of the scope of the match syntax, @list is defined as a macro similar to quasiquote, but such that each of its arguments is implicitly unquoted, and the literal symbol ... appearing in @listed object's post-fix position plays a role analogous to unquote-splicing.

Therefore, in the context of the match macro, the @list keyword is defined to mirror that behavior, just like the presence of quasiquote in the Wright-Cartwright-Shinn and Racket matcher's pattern context mirrors the behavior of quasiquote in the regular expression context.

The Friedman-Hilsdale-Dybvig pattern matcher

A copy of the pattern matcher developed at Indiana University by Dan Friedman, Erik Hilsdale and Kent Dybvig can be found on GitHub.

The basic syntax of this pattern matcher is similar to the syntax of the Wright-Cartwright-Shinn matcher, the key difference being that in the case of Friedman-Hilsdale-Dybvig matcher, the patterns are implicitly quasi-quoted.

This means, that in order to adapt the basic patterns of the Friedman-Hilsdale-Dybvig matcher to the Wright-Cartwright-Shinn matcher, one needs to wrap each pattern in a quasiquote. In other words, assuming that the Wright-Cartwright-Shinn matcher is available under the match/WCS binding, then the basic syntax of the Friedman-Hilsdale-Dybvig matcher can be achieved via the following transformation:

  (define-syntax match/FHD
    (syntax-rules ()
      ((match/FHD (pattern . body) ...)
       (match/WCS ((quasiquote pattern) . body) ...))))

Consequently, the definition of the unique function would take the following form under the Friedman-Hilsdale-Dybvig match syntax (note the lack of back-ticks):

  (define (unique elements)
    (match elements
      ((,x ,x . ,rest)
       (unique `(,x . ,rest)))
      ((,x . ,rest)
       `(,x . ,(unique rest)))
      (()
       '())))

Similarly, if the Friedman-Hilsdale-Dybvig matcher is available under the match/WCS binding, then we can obtain the subset of Wright-Cartwright-Shinn patterns that begin with quasiquote via the following transformation:

  (define-syntax match/WCS
    (syntax-rules (quasiquote)
      ((match/WCS ((quasiquote pattern) . body) ...)
       (match/FHD (pattern . body) ...))))

Of course, this only works for basic patterns, and doesn't include extensions like predicates or Boolean combinations.

It is worth noting that the same basic quasiquotation patterns are also supported by the Racket pattern matcher.

Pattern guards

Similarly to the Racket's matcher, the Friedman-Hilsdale-Dybvig matcher supports pattern guards. However, it does not require the #:when keyword. Instead, it assumes that there are no <actions-X*> forms, and that a clause for the match macro can either have the form

  (<pattern> <value>)
or
  (<pattern> <guard> <value>)

Grammar rules for the Friedman-Hilsdale-Dybvig matcher

The documentation provides the following syntax for patterns
  Pat    ::= (Pat ... . Pat)
           | (Pat . Pat)
           | ()
           | #(Pat* Pat ... Pat*)
           | #(Pat*)
           | ,Id
           | ,[Id*]
           | ,[Cata -> Id*]
           | Id

Other pattern-matching libraries

The syntax-case/syntax-rules pattern language

Apart from the libraries discussed in this document, the core Scheme language already provides some pattern-matching capabilities, namely the syntax-rules/syntax-case pattern language.

Unfortunately, the applicability of this language is limited to syntax objects, which are used mainly for expressing Scheme code transformations (such as macro definitions). This limitation is hurtful to the souls of those Scheme programmers who are in love with minimalism, as it requires the duplication of the same feature in different context. However, fixing that flaw is beyond the scope of this document.

Structural pattern matching

This document focuses on libraries which extend the syntax of Scheme with pattern-matching capabilities. It is perhaps worth mentioning that the term "pattern matching" is also sometimes used in a slightly different sense: not as a syntactic extension to a language, but as a function that takes an s-expression representing a pattern and some object, and returns a list of bindings (for example, in the form of an association list) that could turn a pattern into that object.

This kind of a pattern-matching library has been described, for example, in the fifth chapter of Peter Norvig's book The Paradigms of Artificial Intelligence Programming.

Recommendations

Since pattern matching is a feature that has already been widely adapted by various Scheme communities, this document does not attempt to subvert this adaption. Instead, it aims at providing recommendations for writing future Scheme code that maximizes compatibility between the existing pattern matchers.

Quasi-quotation is the preferred way of expressing patterns

It has been noted that compound patterns that begin with quasiquote are compatible between the Wright-Cartwright-Shinn matcher, the Racket matcher and the Gerbil matcher, and that the Friedman-Hilsdale-Dybvig matcher can trivially be adapted to make quasiquotations explicit (although - as it was shown by Marc Nieper-Wißkirchen on this SRFI's mailing list, it is impossible to express certain patterns in a portable manner - in particular, patterns that are meant to match the literal quasiquote symbol need to be expressed differently).

This document recommends that the compound patterns are expressed using explicit quasiquotation.

The additional advantage of this practice is that it shows a symmetry between quasiquotation and pattern matching.

In particular, it recommends sticking to the following subset of the pattern grammar, which is compatible with Gerbil, Racket and Wright-Cartwright-Shinn matcher:

  pat ::= identifier                      anything, and binds identifier
        | _                               anything
        | ()                              the empty list
        | #t                              #t
        | #f                              #f
        | string                          a string
        | number                          a number
        | character                       a character
        | 'sexp                           an s-expression
        | 'symbol                         a symbol (special case of s-expr)
        | `qp                             a quasi-pattern

  quasi-patterns:                 matches:

  qp  ::= ()                              the empty list
        | #t                              #t
        | #f                              #f
        | string                          a string
        | number                          a number
        | character                       a character
        | identifier                      a symbol
        | (qp_1 ... qp_n)                 list of n elements
        | (qp_1 ... qp_n . qp_{n+1})      list of n or more
                                          of remainder must match qp_n+1
        | ,pat                            a pattern

A note concerning incompatible features

This document discourages the use of certain pattern-matching features, in particular:

More specifically, the uses of those extensions are obviously allowed, but using them makes a program non-SRFI-200-compliant, and thus unportable between different pattern matchers.

This document also doesn't recommend beginning the names of identifier patterns with the ? (question mark) character, as the Bigloo matcher does, because it is impossible to distinguish particular systematic forms of symbols in syntax-rules macros. The compatibility with the Bigloo matcher and other matchers is unattainable anyway, because the identifiers that appear in patterns need to be stripped off of the initial question mark in order to refer to the objects bound by those identifiers.

Implementations

Three of the five pattern matchers presented here already support the pattern language recommended by this SRFI, and one comes close. Only the Bigloo matcher is almost completely incompatible with this specification.

The Wright-Cartwright-Shinn pattern matcher implemented in terms of syntax-rules macros is available from Alex Shinn's website.

The Racket pattern matcher is distributed with Racket, and is available from Racket's GitHub repository.

The code for the Gerbil matcher can be found in the Gerbil repository,

Save for a few problematic patterns, the Friedman-Hilsdale-Dybvig matcher can be adapted to the pattern language by defining a macro matchWCS

  (define-syntax matchWCS
    (syntax-rules (quasiquote)
      ((_ ((quasiquote pattern) . body) ...)
       (match (pattern . body) ...))))

and re-exporting it to shadow the match binding provided by the Friedman-Hilsdale-Dybvig matcher.

A copy of the pattern matcher developed at Indiana University by Dan Friedman, Erik Hilsdale and Kent Dybvig can be found on GitHub.

Lastly, one could use a simple reference implementation provided by this SRFI. The main interface to the matcher is the match macro, which valuates the expression to be pattern-matched on, and passes the work to the match/evaluated macro:

(define-syntax-rule (match expression (pattern actions* ... value) ...)
  (let ((evaluated expression))
    (match/evaluated evaluated (pattern actions* ... value) ...)))

The match/evaluated macro analyzes the first clause and passes itself recursively on the remaining clauses to the failure continuation of a match-clause macro. If there are no clauses left, it signals an error (which is an acceptable way of handling this situation, although not required):

(define-syntax match/evaluated
  (syntax-rules ()
    ((match/evaluated value)
     ;; This behavior is unspecified, and an "unspecified"
     ;; value would also be fine here.
     (error 'no-matching-pattern))

    ((match/evaluated value (pattern actions ...) . clauses)
     (match-clause ((pattern value))
                   (and)
                   ()
                   actions ...
                   (match/evaluated value . clauses)))))

The match-clause macro is responsible for converting the pattern into a condition (which is a conjunction of all conditions stemming from the pattern's sub-patterns) and bindings.

It maintains (and processes) a list of (pattern path) pair, where path is an expression consisting of nested cars and cdrs leading to the (evaluated) expression being pattern-matched.

It also carries around the set of actions to be performed when the match succeeds, as well as the form to be evaluated when it does not succeed.

When the match-clause encounters a nested pattern, i.e. a pair consisting of left and right sub-patterns, it extends the condition with the requirement that the current target is a pair?, and it expands the list with the left and right sub-patterns.

When the match-clause encounters an identifier, then it extends the set of bindings with the current path.

Otherwise, when it encounters a literal value, it adds a check whether the object at current path is equal? to that value.

When the list is processed, then technically the macro could expand to the form (if condition (let bindings actions ...) alternative), but since the matcher should handle repeated identifiers (with the requirement that they all point to fragments of data that is equal?), it expands to the check/unique macro, whose responsibility is to add necessary equality checks, and remove bindings with duplicate identifiers.

(define-syntax match-clause
  (lambda (stx)
    (syntax-case stx (quote quasiquote unquote and _)
      ((match-clause () condition bindings actions ... alternative)
       #'(check/unique condition bindings #f () () actions ... alternative))

      ((match-clause ((`,pattern root) . rest)
                     condition
                     bindings
                     actions ... alternative)
       #'(match-clause ((pattern root) . rest)
                       condition
                       bindings
                       actions ... alternative))

      ((match-clause ((_ root) . rest)
                     condition
                     bindings
                     actions ... alternative)
       #'(match-clause rest
                       condition
                       bindings
                       actions ... alternative))

      ((match-clause ((variable root) . rest)
                     condition
                     bindings
                     actions ... alternative)
       (identifier? #'variable)
       #'(match-clause rest
                       condition
                       ((variable root) . bindings)
                       actions ... alternative))

      ((match-clause (('datum root) . rest)
                     (and conditions ...)
                     bindings
                     actions ... alternative)
       #'(match-clause rest
                       (and conditions ... (equal? root 'datum))
                       bindings
                       actions ... alternative))

      ((match-clause ((`(left . right) root) . rest)
                     (and conditions ...)
                     bindings
                     actions ... alternative)
       #'(match-clause ((`left (car root)) (`right (cdr root)) . rest)
                       (and conditions ... (pair? root))
                       bindings
                       actions ... alternative))

      ((match-clause ((literal root) . rest)
                     (and conditions ...)
                     bindings
                     actions ...)
       #'(match-clause rest
                       (and conditions ... (equal? literal root))
                       bindings
                       actions ...))
      )))

As mentioned earlier, the check/unique macro is responsible for extending condition with equality checks if there is more than one instance of a given identifier, and to remove bindings with duplicate identifiers.

In addition to bindings, it maintains a currently processed (variable path) binding (or the #f value in the edge cases), a list called bindings/checked, consisting of all bindings that didn't contain the identifier currently being processed, and a list bindings/final, consisting of all bindings that were already tested against duplicates.

(define-syntax check/unique
  (lambda (stx)
    "add equality checks for repeated identifiers in patterns and remove them from bindings"
    (syntax-case stx (and)
      ((check/unique condition () #f () bindings actions ... alternative)
       #'(if condition
             (let bindings actions ...)
             alternative))

      ((check/unique condition
                     ((variable path) . bindings)
                     #f
                     bindings/checked
                     bindings/final
                     actions ... alternative)
       #'(check/unique condition
                       bindings
                       (variable path)
                       bindings/checked
                       bindings/final
                       actions ... alternative))

      ((check/unique (and conditions ...)
                     ((variable path) . bindings)
                     (variable+ path+)
                     bindings/checked
                     bindings/final
                     actions ... alternative)
       (bound-identifier=? #'variable #'variable+)
       #'(check/unique (and conditions ... (equal? path path+))
                       bindings
                       (variable+ path+)
                       bindings/checked
                       bindings/final
                       actions ... alternative))

      ((check/unique conditions
                     ((variable path) . bindings)
                     (variable+ path+)
                     bindings/checked
                     bindings/final
                     actions ... alternative)
       #'(check/unique conditions
                       bindings
                       (variable+ path+)
                       ((variable path) . bindings/checked)
                       bindings/final
                       actions ... alternative))

      ((check/unique conditions
                     ()
                     (variable path)
                     bindings/checked
                     bindings/final
                     actions ... alternative)
       #'(check/unique conditions
                       bindings/checked
                       #f
                       ()
                       ((variable path) . bindings/final)
                       actions ... alternative))
      )))

Implementation in terms of syntax-rules

The above definition of match-clause only uses syntax-case macros because it needs to use the identifier? predicate in one pattern guard. Likewise, the definition of the check/unique needs to use the bound-identifier=? predicate in one pattern guard.

The latter could be easily avoided if the syntax-rules pattern language allowed to repeat identifiers in patterns. The case

      ((check/unique (and conditions ...)
                     ((variable path) . bindings)
                     (variable+ path+)
                     bindings/checked
                     bindings/final
                     actions ... alternative)
       (bound-identifier=? #'variable #'variable+)
       #'(check/unique (and conditions ... (equal? path path+))
                       bindings
                       (variable+ path+)
                       bindings/checked
                       bindings/final
                       actions ... alternative))

could then be rewritten as

      ((check/unique (and conditions ...)
                     ((variable path) . bindings)
                     (variable path+)
                     bindings/checked
                     bindings/final
                     actions ... alternative)
       #'(check/unique (and conditions ... (equal? path path+))
                       bindings
                       (variable path+)
                       bindings/checked
                       bindings/final
                       actions ... alternative))

and the whole macro could be easily transformed to syntax-rules

Both singularities could also be resolved in an extended variant of syntax-rules that permits pattern guards.

However, it is also possible to avoid the need for pattern guards by applying a few tricks devised by Oleg Kiselyov.

The identifier? predicate can be replaced with the following identifier/literal macro, which expands to its second argument if its first argument is an identifier, and otherwise expands to its third argument (this style of writing macros is broadly called "CPS macros", because it resembles the Continuation-Passing Style).

(define-syntax identifier/literal
  (syntax-rules ()
    ((identifier/literal atom identifier literal)
     (let-syntax ((check-identifier (syntax-rules ()
                                      ((_ atom symbol _) symbol)
                                      ((_ datum _ value) value))))
       (check-identifier raw-symbol identifier literal)))))

The key trick is the definition of a local macro (using the let-syntax form) and then calling this locally defined macro with some unbound identifier (in our case we chose the name raw-symbol).

As the local macro uses the atom argument, the first rule matches against the raw-symbol only if atom is also an identifier - then the expression will be replaced by the successful branch. Otherwise (e.g. if atom is a list, or a number, or a boolean etc.) the first rule will fail, and the second rule will be tested which always matches, and the expression will be replaced with the failure branch.

The second guard, consisting of the bound-identifier=? predicate can be replaced using a very similar trick, but this time the macro will take two arguments, and - as before - two continuation arguments:

(define-syntax same-variable
  (syntax-rules ()
    ((_ x y same differ)
     (let-syntax ((x= (syntax-rules (x)
                        ((_ x identical _) identical)
                        ((_ z _ different) different))))
       (x= y same differ)))))

The code is almost exactly the same, with two important differences: first, the identifier x (first argument) appears on the local macro's literals' list; second, instead of passing an unbound symbol, we now pass the second argument of our macro, namely - y - to the inner macro.

The first pattern of the inner macro will match only if the second argument is the same identifier as the first one; otherwise the second rule succeeds which matches anything.

Those two macros allow us to construct the definitions for match-clause and check/unique exclusively in terms of simple syntax-rules transformations:

(define-syntax match-clause
  (syntax-rules (quote quasiquote unquote and _)

    ((match-clause () condition bindings actions ... alternative)
     (check/unique condition bindings #f () () actions ... alternative))

    ((match-clause ((`,pattern root) . rest)
                   condition
                   bindings
                   actions ... alternative)
     (match-clause ((pattern root) . rest)
                   condition
                   bindings
                   actions ... alternative))

    ((match-clause ((_ root) . rest)
                   condition
                   bindings
                   actions ... alternative)
     (match-clause rest
                   condition
                   bindings
                   actions ... alternative))

    ((match-clause (('datum root) . rest)
                   (and conditions ...)
                   bindings
                   actions ... alternative)
     (match-clause rest
                   (and conditions ... (equal? root 'datum))
                   bindings
                   actions ... alternative))

    ((match-clause ((`(left . right) root) . rest)
                   (and conditions ...)
                   bindings
                   actions ... alternative)
     (match-clause ((`left (car root)) (`right (cdr root)) . rest)
                   (and conditions ... (pair? root))
                   bindings
                   actions ... alternative))

    ((match-clause ((atom root) . rest)
                   (and conditions ...)
                   bindings
                   actions ... alternative)
     (identifier/literal
      atom
      (match-clause rest
                    (and conditions ...)
                    ((atom root) . bindings)
                    actions ... alternative)
      (match-clause rest
                    (and conditions ... (equal? atom root))
                    bindings
                    actions ... alternative)))
    ))

(define-syntax check/unique
  (syntax-rules (and)
    ((check/unique condition () #f () bindings actions ... alternative)
     (if condition
         (let bindings actions ...)
         alternative))

    ((check/unique condition
                   ((variable path) . bindings)
                   #f
                   bindings/checked
                   bindings/final
                   actions ... alternative)
     (check/unique condition
                   bindings
                   (variable path)
                   bindings/checked
                   bindings/final
                   actions ... alternative))

    ((check/unique (and conditions ...)
                   ((variable path) . bindings)
                   (variable+ path+)
                   bindings/checked
                   bindings/final
                   actions ... alternative)
     (same-variable variable variable+
                    (check/unique (and conditions ... (equal? path path+))
                                  bindings
                                  (variable+ path+)
                                  bindings/checked
                                  bindings/final
                                  actions ... alternative)
                    (check/unique (and conditions ...)
                                  bindings
                                  (variable+ path+)
                                  ((variable path) . bindings/checked)
                                  bindings/final
                                  actions ... alternative)))
      ((check/unique conditions
                     ()
                     (variable path)
                     bindings/checked
                     bindings/final
                     actions ... alternative)
       (check/unique conditions
                     bindings/checked
                     #f
                     ()
                     ((variable path) . bindings/final)
                     actions ... alternative))
      ))

Acknowledgments

This work has provoked the emergence of SRFI-204, which, due to its much greater scope and ambition, has never been finalized.

However, a lot of excellent work done by the author of SRFI-204, Felix Thibault, has been (indirectly) incorporated into this document. Felix' work prompted sections on Bigloo and Gerbil matchers, and most references werw borrowed from his document.

The author's first exposure to pattern matching in Scheme occurred through the Shinn matcher that was shipped with Guile, along with excellent documentation. The author is therefore grateful to both Alex Shinn and the Guile team (at that time consisting of Mark H. Weaver, Ludovic Courtes and Andy Wingo).

Alex Shinn deserves additional credit for explaining to me the tricks invented by Oleg Kiselyov that allow to express in syntax-rules things that otherwise would only be expressible using pattern guards.

I am also grateful to everybody who contributed their time and effort on this SRFI's mailing list. In particular, I was constantly impressed by both the involvement and the depth of analysis shown by Marc Nieper-Wißkirchen and Shiro Kawai.

This work probably would never see the daylight if John Cowan didn't encourage me to participate in the SRFI process, and if Arthur Gleckler hadn't decided to carry on the torch of animating the SRFI process. He's probably the most patient SRFI editor the world could ever imagine.

Of course, it would not be possible to write a review of existing pattern matchers if they weren't existing, so this document owes its own existence to their respective creators.

Copyright

Copyright © Panicz Maciej Godek (2020-2022).

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Editor: Arthur A. Gleckler