262: Extensible pattern matcher

by Daphne Preston-Kendal

Status

This SRFI is currently in draft status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-262@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Abstract

A pattern matching form which can operate on arbitrary Scheme values is defined. It conforms to the following design principles.

The syntax of patterns is declarative. The syntax is extensible in the same way that Scheme’s procedural syntax is extensible by macros.

For most use cases, the use of the pattern matcher should produce code which is essentially as efficient at run time as equivalent procedural Scheme code would be, assuming a moderately optimizing Scheme implementation. This applies only when the equivalent code is also equivalently correct in terms of handling of error cases (i.e. including all type checks done automatically by the pattern matcher). However, using extension syntax should not cause this principle to be violated (provided the extension syntax is appropriately implemented).

Table of Contents

  1. Rationale
    1. Early history
    2. Development of the ‘Wright’ pattern matcher
    3. Macros as the means of extension
    4. Comparison to Wright-style pattern matchers
  2. Issues
  3. Specification
    1. Scheme pattern matching syntax
    2. Built-in primitive pattern syntax
    3. Built-in derived pattern syntax
    4. Defining new pattern syntax
    5. Condition type
  4. Further examples
    1. General record matching syntax
    2. Lazy data structures
    3. Views
    4. Tree patterns
  5. Design notes
    1. Constructor-based syntax and the means of extension
    2. Non-linear patterns
    3. Application patterns and multiple values
    4. Disjointed variables
    5. Coverage checking
  6. Implementation
    1. Hints for implementers
  7. Acknowledgements

Synopsis

This section is non-normative.

The following table lists the syntax, procedures, and condition type made available in Scheme code by the pattern matching library. The names of the syntactic tokens shown here are mnemonic only – consult the entry for each form to see the full grammar.

Scheme syntax Scheme procedures Condition type
(match expr clause )
(match-values expr mv clause )
(match-let let-like body)
(match-let* let-like body)
(match-let-values mv let-like body)
(match-let*-values mv let-like body)
(match-letrec let-like body)
(match-letrec* let-like body)
(if-match let-like expr1 expr2)
(match-define pat expr)
(match-define-values (pat ) expr)
(match-lambda mv clause )

(define-pattern-syntax id tx expr)
(match-ellipsis? obj)

(make-match-violation)
(match-violation? obj)
&match

The following grammar summarizes the syntax of the pattern-matching sublanguage, including basic and derived pattern syntax which is included with the pattern-matching library itself.

Pattern syntax
a pattern ... ... matches:
pat ::= _ ; anything
| id ; anything, and binds subject to id
| self-quoting datum ; a datum which is equal?
| 'datum ; a datum which is equal?
| (quote datum)
| (cons patcar patcdr) ; a pair
| (list seq pat ) ; a list
| (cons* seq pat  pat) ; a possibly improper list
| (lset pat ) ; a list with items in any order
| (lset pat  pat dots)
| (vector seq pat ) ; a vector
| (? expr pat ) ; if (expr subj) and every pat matches subj
| (apply expr pat ) ; if values of (expr subj) match the pats
| (and pat ) ; if all of the pats match
| (or pat ) ; if any of the pats match
| (not pat) ; if pat doesn’t match
| (seq iter spec seq pat ) ; a sequence
| (seq* iter spec seq pat  pat) ; a possibly improper sequence
| (seq/unordered iter spec pat ) ; a sequence with items in any order
| (seq/unordered iter spec pat  pat dots)
| `quasipat ; according to quasipattern
| (quasiquote quasipat)
| (keyword datum ) ; anything you want :-)
| (keyword datum datum)
seq pat ::= pat ; a single item from a sequence
| pat dotty ; multiple items from a sequence
dotty ::= dots ; zero or more, greedily
| (dots n) ; exactly that many
| (dots n m) ; between n and m, greedily
| (dots n #t) ; at least n, greedily
dots ::= the identifier ...
quasipat ::= self-quoting datum ; a datum which is equal?
| id ; a symbol with the same name
| () ; the empty list
| (seq quasipat ) ; a proper list
| (seq quasipat quasipat) ; an improper list
| #(seq quasipat ) ; a vector
| ,pat ; pattern
| (unquote pat)
seq quasipat ::= quasipat ; as quasipattern
| ,@pat ; any number of pattern
| (unquote-splicing pat)
| quasipat dotty ; multiple items matching quasipattern

Rationale

Scheme’s lack of standard facilities for pattern matching on general Scheme values has been criticized by functional programmers of other languages for at least 35 years (Wadler 1987a). Generally, pattern matching enables a more declarative style of programming closer to mathematical notation, which is easier to reason about and which can handle error cases without having to write explicit error-checking code.

Early history

The very early history of structural pattern matching is tied to its use in early computer algebra systems to find mathematical expressions and equations having particular forms. (Moses 1967) Pattern matching as a programming language feature in its recognizable, modern form – as a means of deconstructing records using syntax mirroring that of their constructors – was first proposed by Burstall (1969), building on work by McCarthy and Painter (1967) in the use of structural induction to prove the correctness of a compilation technique. Burstall proposed pattern matching as an extension to Landin (1966)’s proposed programming language ISWIM, which was never actually implemented. The first implemented generalized pattern matcher for a programming language was by McBride (1970), who modified the Lisp 1.5 metacircular evaluator in the process of developing his own computer algebra system, adding the capability to recognize structures of any kind built with cons cells.

It was another decade before pattern matching in the style proposed by Burstall became a part of the experimental language Hope (Burstall, MacQueen, and Sannella 1980), and subsequently a standard feature of the descendant ML (1973; see MacQueen, Harper, and Reppy 2020) and Miranda/Haskell (1985/1990 respectively; see Hudak et al. 2007) families of programming languages. In Lisp, declarative destructuring was added to MacLisp with the introduction of the now-familiar LET form in 1978 (Steele et al. 1974–82), but in the initial version of Common Lisp this feature survived only in the DEFMACRO argument list; LET was simplified to allow basic variable binding only, as we know it today. (Steele 1984) Early implementations of Common Lisp reintroduced a destructuring binding construct under the name DESTRUCTURING-BIND, which was adopted into the ANSI standard. (Pitman and Moon 1989) In Scheme, a destructure macro providing similar functionality was included in the T implementation from around 1983.1 However, these special forms did not allow the distinctive ML-style use of pattern matching to test an input value against a series of successive patterns and take action according to which pattern the value conforms to; Lisp programmers have generally preferred the use of a series of explicit conditionals in imperative style for this purpose. Moreover, no RnRS report has ever standardized a destructure-like syntactic form.

Development of the ‘Wright’ pattern matcher

However, prompted by Wadler (1987a)’s remarks, early discussions of what became the R4RS briefly included proposals for pattern matching. The majority of mailing list participants rejected pattern matching as a core language feature, and the proposal was relegated to a planned section of the report entitled the ‘Yellow Pages’ (containing portable libraries). The idea was then summarily dismissed at the June 1987 meeting of the RnRS report authors, and the Yellow Pages were never produced.

Schemers have nonetheless developed a variety of pattern matching macros using pre-hygienic and hygienic macro systems. The most popular pattern-matching syntax was originally developed by Bruce F. Duba in the mid-1980s for Scheme 84.2 It became more widely known after Andrew K. Wright adapted, documented, and distributed it for several other Scheme implementations from around 1993 (Wright 1994, 1996). At least three independent re-implementations of the same syntax have followed: Bruce Hauman wrote a version using PLT Scheme’s syntax-case which was integrated into the main distribution in 2003; Alex Shinn implemented it in portable syntax-rules (Shinn 2006) and later made further extensions; and Adam Nelson has implemented it in syntax-rules again as part of his Schemepunk project. This pattern matcher has been included in very many Scheme implementations, both in the original defmacro-based implementation released by Wright, and in Shinn’s later hygienic version which has now almost universally replaced it. Under the name of the Wright–Cartwright–Shinn (WCS) matcher,3 an attempt at giving it a specification was made in SRFI 204.

Other pattern matchers in Scheme include Oleg Kiselyov’s minimal pmatch in syntax-rules and the Chez Scheme match macro implemented using hygienic syntax-case by Dan Friedman, Erik Hilsdale, and Kent Dybvig with contributions from others, which was described in SRFI 241. Another implementation worth mentioning is Bryan O’Sullivan’s simple pattern-case macro, probably the first ever pattern matcher written in portable syntax-rules, which was still an optional appendix to the R4RS at the time. None of these are nearly as widely applicable as Wright-style pattern matchers, however: many patterns which can be expressed in Wright’s syntax would require falling back on writing explicit procedural destructuring code in these systems. All of these examples are limited because their basic pattern forms support only the built-in Scheme datum types. Even in Wright’s pattern matcher, the use of the lexical syntax of datum types to specify deconstruction naturally limits their applicability to Scheme object types which lack lexical syntax. Since the widespread adoption of standard record type systems in Scheme code, and especially their use to implement collections types beyond the standard Scheme lists and vectors, this has become an increasingly pertinent problem.

Macros as the means of extension

An early example which shows the potential of a less intrinsically limited pattern matcher is Christian Queinnec’s ‘Pattern-match compiler for non-linear second-order patterns on S-expressions’ (Queinnec 1990a, 1990b). In implementations in both Scheme and Common Lisp, Queinnec demonstrated the use of macros to extend the syntax of patterns, similarly to how the syntax of normal Lisp code can be extended; the core pattern language is small and not necessarily very tractable by humans, but can be expanded into from more easily comprehensible higher-level pattern forms. The use of macros to define new deconstructors is an adaptation to the Scheme context of the idea of extensible patterns, which is conventionally credited to Wadler (1987b)’s introduction of ‘views’ to the Miranda language, although McBride’s original Lisp 1.5 pattern matcher already featured a version of extensible patterns.

Hauman’s PLT Scheme implementation of Wright’s pattern matching interface also introduced an alternative interface, which cleanly separated the lexical syntax of Scheme from the syntax of deconstructors. In 2005, Sam Tobin-Hochstadt took advantage of this to allow users to define new deconstructors using hygienic macros, an independent reinvention of Queinnec’s concept. This was followed by a clean re-implementation in 2008 using PLT Scheme’s advanced hygienic macro system.4 The resulting pattern matcher – now part of PLT Scheme’s successor, Racket – is described by its author in Tobin-Hochstadt (2011).

Extensibility, combined with other primitives provided by Wright’s original pattern matcher (in particular, pattern types for calling arbitrary Scheme predicate and accessor procedures), allows data types defined by programmers to receive their own first-class support by an existing pattern matcher. Racket’s match inspired the design of a similar extensible pattern matcher in Emacs Lisp called pcase (Monnier and Sperber 2020), as well as the Common Lisp libraries Optima and Trivia.

The pattern matcher proposed here is likewise in the lineage of, and very closely inspired by, the Racket pattern matcher. Its main adaptation is in the use of identifier properties (SRFI 213) as the means of extensibility. The clean extensibility of the pattern matcher avoids the ‘kitchen sink of features’ impression which the featureful but non-extensible syntax of Wright-style matchers makes: the core of the matcher is kept very small with only a handful of basic primitives, but the matcher as a whole is comparably powerful due to the ability to combine these primitives into new matching abstractions.

Comparison to Wright-style pattern matchers

Compared to Wright-style pattern matchers, the pattern matcher proposed here is essentially equivalent in basic computational power, omitting only the following (mis)features:

In terms of expressive power, however, this pattern matcher is significantly better than Wright-style pattern matchers because any Scheme programmer can define new types of pattern rules. These pattern rules can, for example, provide matching functionality for new data types, or improve or tailor the semantics of the matchers included for Scheme’s built-in data types. They are written much like regular macro transformers: in particular, they can usually be entirely specified with the familiar syntax-rules system, but any other means of creating macro transformers may be used. They compose transparently with, and appear (in both syntax and generally in semantics) identical to the pattern rules included with the pattern matcher itself, in the same way normal Scheme macros compose with procedural Scheme code.

Issues

Specification

In the below specification text, the phrase ‘matching failure is signalled’ means that an exception with condition types &match (see below) and &irritants is raised. The field of the &irritants conditions is set to a list of the values which were supposed to match, as specified in each section.

Scheme pattern matching syntax

syntax (match input expression
  clause )

Syntax: Input expression must be an expression.

Each clause has the form (pattern body).

A pattern has the following grammar:

pattern ::= _
| identifier
| pattern syntax
| pattern datum

Pattern syntax has the form (keyword datum ) or (keyword datum . datum). Pattern datum is any value which is a self-evaluating datum in normal Scheme code.

Semantics: The input expression is evaluated to produce a value, and clauses are checked in turn from left to right, until one is found whose patterns matches the value.

Note: In the literature on pattern matching, a distinction is sometimes made between ‘top-to-bottom’ and ‘left-to-right’ pattern matching. The former refers to the relative priority of patterns with respect to one another, and is one of a number of possible strategies for disambiguating between multiple patterns which match the same input value. Under the top-to-bottom rule, which is applied here (and here called ‘left-to-right’ in deference to the usual terminology for Scheme and Lisp forms), disambiguation depends entirely on the order in which patterns appear in the code. (See Hudak et al. 2007 section 5.2 for references to other possible pattern disambiguation strategies.) ‘Left-to-right’, in the terminology of those who make the distinction, refers to the order in which parts of an individual pattern are tested against parts of the input term. This is a practical necessity in lazily evaluated languages, where a non-matching earlier part of a term might prevent the pattern matcher from having to examine a later part whose evaluation might not terminate; but in strict languages such as Scheme and ML there is little practical benefit to making a guarantee on the order in which parts of the input value are tested. Indeed, in order to enable more optimized compilation, this pattern matcher does not make a general ‘left-to-right’ guarantee on the order in which parts of a pattern are tested against input: only certain types of patterns guarantee the order of testing.

When a clause with a matching pattern is found, the body of that clause is evaluated and its result returned. The body is evaluated in tail position if the match expression as a whole is in tail position.

The testing of whether patterns match against input values is formally defined as follows.

The value against which a pattern is tested is called the subject of that pattern. Whether a pattern matches against a particular subject is a boolean property. For example, a pattern which is an underscore (the auxiliary syntax keyword _) matches any subject, whereas a pattern datum will match if its subject is the same (in the sense of equal?) as the pattern datum itself, and will not match otherwise.

In addition, the matching of a pattern against a subject can cause an identifier to be bound as a variable within the evaluation of the body. A pattern which is an identifier other than _ will, like _, match any subject, but will also cause that identifier to be bound as a variable to the subject of the pattern. Variables bound in this way are also called pattern variables, although they are normal Scheme variable bindings and not pattern variables in the sense of syntax-case.

Note: The auxiliary syntax keyword else has no special function in match expressions. A pattern consisting only of the _ identifier can be used as a catch-all to handle the case when no previous pattern matches. Attempting to use the else keyword for this purpose will result in a variable being bound within the body of the corresponding clause, shadowing the else keyword and potentially causing unexpected results when attempting to use that keyword in another cond expression within the clause’s body.

A pattern which is an instance of pattern syntax can contain subpatterns (recursively nested instances of pattern) which have different subjects to the pattern syntax itself; for example, the pattern syntax might match a record instance as its subject, while a subpattern of it might match one of the field values of that record. The keyword of an instance of pattern syntax uniquely determines a particular type of pattern; each such type of pattern can have its own syntax and semantics. An instance of pattern syntax usually matches its subject if some combination of subpatterns contained within it matches a corresponding combination of values derived from the subject, and may bind variables for some or all of these derived values. It is a syntax violation if the keyword of an instance of pattern syntax is not lexically bound to an identifier for a valid type of pattern syntax (see define-pattern-syntax).

It is a syntax violation if any pattern which is an identifier other than _ appears more than once in any pattern, including occurrences within different subpatterns. The exception is identifiers within subpatterns of or patterns, which must appear in each such subpattern in order to be bound as variables within the evaluation of the body (see the description under the entry for that pattern syntax).

If none of the patterns match the result of evaluating the input expression, matching failure is signalled with a one-item list containing the result of the input expression evaluation.

syntax (match-values input expression
  clause )

Syntax: Input expression must be an expression.

Each clause has the form ((pattern ) body).

Semantics: Match-values is similar to match, except that the input expression can evaluate to multiple values, and each value is tested against the respective pattern in an unspecified order. Each clause can have a different number of patterns; only those clauses with the correct number of patterns for the number of values returned from evaluating the input expression will be matched against.

If none of the clauses’ patterns match all of the corresponding values, matching failure is signalled with a list of the values produced by the input expression.

syntax (match-let ((pattern expression) )
  body)

Match-let matches the results of evaluating all the expressions against the corresponding patterns. The patterns must all match the results of the respective values obtained by this evaluation, otherwise matching failure is signalled with a list of the values of the expressions. The evaluation of the expressions takes place in an unspecified order and out of the scope of any of the pattern variables from the patterns; the order in which the resulting values are tested against the corresponding patterns is also unspecified. The body is then evaluated with the pattern variables resulting from the matches being bound to their respective subjects. The body is evaluated in tail position if the match-let expression as a whole is in tail position.

Match-let could be implemented by:

(define-syntax match-let
  (syntax-rules ()
    ((_ ((pat init) ...) body_0 body_1 ...)
     (match-values (values init ...)
       ((pat ...)
        body_0 body_1 ...)))))

syntax (match-let* ((pattern expression) )
  body)

Match-let* is like match-let, except that the expressions are evaluated from left to right, each with the pattern variables bound by the patterns corresponding to the expressions to the left of it in scope.

If any of the patterns do not match the result of evaluating the corresponding expression, matching failure is signalled with a one-item list containing the value which failed to match the corresponding pattern.

Match-let* could be implemented by:

(define-syntax match-let*
  (syntax-rules ()
    ((_ () body_0 body_1 ...)
     (let ()
       body_0 body_1 ...))
    ((_ ((pat init) more ...) body_0 body_1 ...)
     (match-let ((pat init))
       (match-let* (more ...)
         body_0 body_1 ...)))))

syntax (match-let-values (((pattern ) expression) )
  body)

Match-let-values is like match-let, except that the expressions can evaluate to multiple values and must match against the corresponding number of patterns. The order in which the values from the expressions are tested against the patterns is unspecified.

If any of the groups of patterns do not match the result of evaluating the corresponding expression, matching failure is signalled with a concatenated list of all of the values of all of the expressions.

Match-let-values could be implemented by:

(define-syntax match-let-values
  (lambda (stx)
    (syntax-case stx ()
      ((_ (((pat ...) init) ...) body_0 body_1 ...)
       (with-syntax
           ((((temp ...) ...)
             (map generate-temporaries #'((pat ...) ...))))
         #'(let-values
               (((temp ...) init) ...)
             ((match-lambda
                ((pat ... ...) body_0 body_1 ...))
              temp ... ...)))))))

syntax(match-let*-values (((pattern ) expression) )
  body)

Match-let*-values is like match-let-values, except that the expressions are evaluated from left to right, each with the pattern variables bound by the patterns corresponding to the expressions to the left of it in scope. The order in which the values from each expression is tested against the corresponding patterns is unspecified.

If any of the groups of patterns do not match the result of evaluating the corresponding expression, matching failure is signalled with a list containing the values which failed to match the corresponding patterns.

Match-let*-values could be implemented by:

(define-syntax match-let*-values
  (syntax-rules ()
    ((_ () body_0 body_1 ...)
     (let ()
       body_0 body_1 ...))
    ((_ (((pat ...) init) more ...) body_0 body_1 ...)
     (match-let-values (((pat ...) init))
       (match-let*-values (more ...)
         body_0 body_1 ...)))))

syntax (match-letrec ((pattern expression) )
  body)

Match-letrec is like match-let, except that the expressions can recursively refer to the variables bound by the patterns, in the manner of standard Scheme letrec. It is an error if one of the expressions tries to access or assign the value of any of the variables during its evaluation.

Note: The test suite tries to detect whether the Scheme implementation it is running on enforces the letrec restriction by signalling an error, and will automatically skip the test which ensures it is enforced for match-letrec if not. This check assumes that calling eval on a form that violates the letrec restriction will raise some kind of error, which is guaranteed by R6RS but not by R7RS small. If letrec restriction violations do not cause an error to be signalled but rather some other kind of (undefined) behaviour, it may be better to comment out the test when porting an implementation of this SRFI to a new Scheme implementation, rather than relying on this check.

syntax (match-letrec* ((pattern expression) )
  body)

Match-letrec* is like match-let*, except that the expressions can recursively refer to the variables bound by the patterns, and can access and assign the values of pattern variables bound in patterns to the left of their own pattern, in the manner of standard Scheme letrec*.

Match-letrec* could be implemented by:

(define-syntax match-letrec*
  (syntax-rules ()
    ((_ ((pat init) ...) body_0 body_1 ...)
     (let ()
       (match-define pat init) ...
       (let ()
         body_0 body_1 ...)))))

syntax (match-define pattern expression)

Match-define defines the pattern variables specified within the pattern to the corresponding values obtained from matching the pattern against the result of evaluating the expression. If the pattern does not match the result of evaluating the expression, matching failure is signalled with a one-item list containing the value of the expression.

syntax (match-define-values (pattern ) expression)

Match-define-values is like match-define, except that the expression can return multiple values, which are matched against the corresponding patterns. The order in which the values are matched against the corresponding patterns is unspecified. If the patterns do not match the values produced by evaluating the expression, matching failure is signalled with a list of the values.

syntax (if-match ((pattern expression) )
          consequent
          alternate)

Syntax: Consequent and alternate must be expressions.

Semantics: If-match is like match-let, except that if all the patterns match the values obtained by evaluating their corresponding expressions, the consequent expression is evaluated with all the pattern variables in scope; otherwise, the alternate expression is evaluated with none of the pattern variables in scope. The consequent or alternate is evaluated in tail position if the if-match expression as a whole is in tail position.

If-match could be implemented by:

(define-syntax if-match
  (lambda (stx)
    (syntax-case stx ()
      ((_ ((pat init) ...) conseq alter)
       (with-syntax ((else (make-list (length #'(pat ...)) #'_)))
         #'(match-values (values init ...)
             ((pat ...) conseq)
             (else alter)))))))

syntax (match-lambda
  clause )

Syntax: Each clause has the form ((pattern ) body).

Semantics: Match-lambda is the fundamental pattern matching form. It evaluates to a procedure which takes arguments according to the number of patterns in each of the clauses. When the procedure is called, it matches its arguments against each of the clauses which have the same number of patterns from left to right, testing each argument against the corresponding pattern. The order in which the arguments are matched against the patterns of each individual clause is unspecified. The body of the leftmost matching clause is then evaluated with the pattern variables from the pattern in scope. If no clauses have the right number of patterns, or if none of the patterns match all of the respective arguments, matching failure is signalled with a list of the arguments to the procedure call. The body is evaluated in tail position if the call to the returned procedure as a whole is in tail position.

Built-in primitive pattern syntax

The following instances of pattern syntax are exported by the pattern matching library. Where the keywords are existing bindings within the base Scheme library, the pattern syntax transformers are attached to those bindings; otherwise, they are exported as auxiliary syntax keywords.

pattern syntax (quote datum)

A quote pattern matches if its subject is the same as the datum in the sense of equal?, and does not match otherwise. No variable is bound.

Note: For most possible values of datum apart from identifiers (symbols), list-structured forms (including the empty list), and (on strict implementations of the R6RS) vectors, a quote pattern is exactly equivalent to the unquoted datum used as a pattern.

Example:

(define (null-or-something-else obj)
  (match obj
    ('() 'null)
    (_ 'something-else)))

(null-or-something-else '())  ;=> null
(null-or-something-else 'nil) ;=> something-else

pattern syntax (? procedure expression pattern )

The procedure expression is evaluated to produce a procedure of one argument and one return value. A ? pattern tests whether applying this procedure to its subject returns a true value. It is an error if application of the procedure returns zero values or more than one value. If the result of the test is false, the ? pattern does not match.

Otherwise, if there are no patterns, the ? pattern matches its subject and no variable is bound. If there are any patterns, those patterns are matched in an unspecified order against the subject and the ? pattern matches if all of them match the subject of the ? pattern. All of the variables bound by the subpatterns are bound by the match.

Note: It is unspecified when and how many times the procedure expression is evaluated, and when and how many times the resulting procedure is called. In particular, it may be called multiple times on the same subject. The behaviour is undefined if the evaluation of the procedure expression or the subsequent call to the procedure assigns to the binding of any variable used in an expression within any pattern.

If some of the subpatterns need to be checked before others, use an and pattern to enforce the ordering.

Example:

(define (integer-or-symbol val)
  (match val
    ((? integer?) 'integer)
    ((? symbol?) 'symbol)))

(integer-or-symbol 24)  ;=> integer
(integer-or-symbol 'x)  ;=> symbol
(integer-or-symbol "x") ; (exception raised)

pattern syntax (apply procedure expression pattern )

The procedure expression is evaluated to produce a procedure of one argument and as many return values as there are patterns. An apply pattern matches if and only if the values resulting from the application of this procedure to its subject match each of the corresponding patterns: each value becomes the subject of the respective subpattern. The order in which the values are tested against the patterns is unspecified. The behaviour is undefined if application of the procedure to the subject returns more or fewer values than the number of patterns. All of the variables bound by all of the subpatterns are bound by the match.

Note: It is unspecified when and how many times the procedure expression is evaluated, and when and how many times the resulting procedure is called. In particular, it may be called multiple times on the same subject. The behaviour is undefined if the evaluation of the procedure expression or the subsequent call to the procedure assigns to the binding of any variable used in an expression within any pattern.

Example:

(define (fizz? n)
  (match n
    ((apply (lambda (x) (floor/ x 3)) _) #t)
    (_ #f)))

(fizz? 1)  ;=> #f
(fizz? 3)  ;=> #t
(fizz? 5)  ;=> #f
(fizz? 21) ;=> #t

pattern syntax (and pattern )

An and pattern matches if and only if its subject matches all of its subpatterns. All of the variables bound by all of the subpatterns are bound by the match.

Unlike the subpatterns of ? and apply patterns, the subpatterns of an and pattern are matched from left to right and short-circuit. It is therefore safe for later subpatterns to depend on the matching of previous subpatterns for type-checking purposes.

Example:

(define (fizzbuzz n)
  (match n
    ((and (? fizz?) (? buzz?)) 'fizzbuzz)
    ((? fizz?) 'fizz)
    ((? buzz?) 'buzz)
    (_ n)))

(map fizzbuzz (iota 16))
;=> (fizzbuzz 1 2 fizz 4 buzz fizz 7 8 fizz buzz 11 fizz 13 14 fizzbuzz)

pattern syntax (or pattern )

An or pattern matches if its subject matches any of its subpatterns. The variables that are bound to variables correspond to the values from the leftmost subpattern which matches. It is a syntax violation if the code associated with any pattern refers to a variable bound by one or more, but not all, of the subpatterns of an instance of the or pattern syntax.

The subpatterns are matched from left to right and short-circuit. It is therefore safe for later subpatterns to depend on the matching of previous subpatterns for type-checking purposes.

Example:

(define (arithmetic-operation x)
  (match x
    ((list (and operator (or '+ '- '* '/))
           (and operands (? number?)) ...)
     (list operator operands))))

(arithmetic-operation '(+ 2 2))  ;=> (+ (2 2))
(arithmetic-operation '(/ 42 7)) ;=> (/ (42 7))

pattern syntax (not pattern)

A not pattern matches if and only if its subject does not match the given subpattern. No variable is bound.

Example:

(define nonzero?
  (match-lambda
    (((? zero?)) #t)
    ((_) #f)))

pattern syntax (seq name ((var init step) )
    termination condition
    reference expression
  sequence pattern )

Syntax: Name and all the vars must be identifiers. All the inits, steps, and the termination condition and reference expression must be expressions.

Sequence pattern has the following grammar:

sequence pattern ::= pattern
| pattern extended ellipsis
extended ellipsis ::= ellipsis
| (ellipsis n)
| (ellipsis min max)
n ::= an exact nonnegative integer literal
min ::= an exact nonnegative integer literal
max ::= an exact nonnegative integer literal greater than or equal to min
| #t

Ellipsis refers to the ... identifier from the base Scheme library.

Semantics: A seq pattern matches a series of values from an iterable sequence.

A sequence pattern which is a single pattern with no ellipsis following matches one item within the sequence if the corresponding pattern matches that item. Pattern variables within the pattern are bound.

When a pattern is followed by an extended ellipsis it matches multiple items within the sequence, and pattern variables within the pattern list are bound to lists of the values which matched within the input sequence. A pattern followed by single ellipsis can match any number of input forms, including zero. A pattern followed by (ellipsis n) must match exactly n items in the input sequence. A pattern followed by (ellipsis min max) must match at least min items within the input sequence and up to max items, or an unbounded number of items greater than or equal to min items if max is #t.

Ellipsized patterns match against input values greedily – that is, they do not stop matching at the first value which matches the pattern immediately following them; rather, each sequence pattern attempts to consume as much input that matches itself as possible. When multiple ellipsized patterns are included within the list of sequence patterns, the standard leftmost priority rule applies: patterns which are further left in the series of sequence patterns match the largest number of items in the list they can while still satisfying the seq pattern as a whole.

In order to access items from the sequence, the pattern matcher establishes state variables called var by first binding them to the results of evaluating the respective init expressions in an unspecified order. At each stage of the iteration, the pattern matcher first checks to see if the termination condition evaluates to a true value: if so, the sequence is deemed to have completed and the seq pattern matches only if the complete input sequence matches the series of sequence patterns. Otherwise, the pattern matcher evaluates the reference expression and checks the result of the evaluation against one or more of the sequence patterns as appropriate to its own current state. Then it continues the iteration by evaluating the step expressions, binding the results of these evaluations to the respective vars, and beginning a new stage of the iteration.

Within the evaluation of the inits, steps, the termination condition, and the reference expression, the identifier named by name is bound as a variable to the subject of the seq pattern. Within the evaluation of the steps, the termination condition, and the reference expression, the vars are bound to their respective current values. Neither the name nor the vars are bound as pattern variables if the match succeeds, nor are they visible as variables or pattern variables within the sequence patterns.

Note that the pattern matcher may choose to iterate over the input sequence in multiple instances in parallel, to backtrack and consider the same item in the input sequence multiple times, or to terminate the iteration(s) early after concluding that the pattern will never match. The values of the vars must encapsulate the entirety of the state required to access items from the iterable sequence: evaluation of the init, step, reference expression, and termination condition expressions must not mutate the internal state of the seq pattern’s subject such that a parallel evaluation of the same expressions on the sequence would fail, and must not mutate the values of the vars such that restoring previous values in order to backtrack and re-examine an already-examined part of the sequence would not work. As with the expressions embedded in ? and apply patterns, it is undefined behaviour if evaluation of any of these expressions assigns to a variable used in any expression within any pattern.

Note: The structure of an iteration with seq is fundamentally similar the patterns familiar to Scheme programmers from both ‘named let’ and do. A seq iteration can be thought of as expanding to

(let loop ((var init) )
  (if termination condition
      (return-match-or-fail)
      (begin
        (test-current-subpattern-for-input reference expression)
        (loop step ))))

or to

(do ((var init step) )
    (termination condition (return-match-or-fail))
  (test-current-subpattern-for-input reference expression))

Example: Given a type called a ‘vektor’ which is otherwise similar to Scheme’s built-in vector type, the following pattern would match items from this type.

(seq vek ((idx 0 (+ idx 1)))
     (>= idx (vektor-length vek))
     (vektor-ref vek idx)
  #;sequence-subpatterns...)

See also the sample implementation for the vector pattern syntax and the examples under that entry.

pattern syntax (seq* name ((var init step) )
    termination condition
    reference expression
  sequence pattern pattern)

Seq* is similar to seq, with a difference: after the termination condition is satisfied, the reference expression (which should usually simply be one of the vars) is evaluated one more time and its result tested against the final pattern; and if the final pattern has been reached and matched by some previous iteration over the sequence, and that final test fails, the result of that previous match will be the result of matching the whole seq* pattern.

This is the primitive for matching sequences comparable to Scheme’s built-in lists built of pairs which do not guarantee ‘properness’, as well as for matching structures such as series of pairs terminated by a pattern which may itself match a series of pairs. Note, however, that due to the greediness of the ellipsis in sequence patterns, neither seq nor seq* is well-suited to matching sequences from streams or other potentially infinite sequences.

Example: Given the ‘pare’ definition from section 5.5 of the R7RS small report, the following example shows how to match proper ‘lysts’ of ‘pares’, terminated by Scheme’s standard empty list value.

(seq* lyst ((curr list (kdr curr)))
      (not (pare? curr))
      curr
  (apply kar #;subpattern) #;...
  '())

See also the sample implementation for the cons* and list pattern syntax and the examples under those entries.

pattern syntax (seq/unordered name ((var init step) )
    termination condition
    reference expression
  pattern )

pattern syntax (seq/unordered name ((var init step) )
    termination condition
    reference expression
  pattern rest pattern ellipsis)

Syntax: Rest pattern, if given, is a pattern.

Semantics: Seq/unordered is similar to seq, but the values which match the subpatterns may appear in the input sequence in any order. Also, unlike a seq pattern, there can only be one subpattern which matches more than one item from the input sequence: it must be the last subpattern in the seq/unordered pattern, and it must be unbounded.

Seq/unordered patterns are useful for matching set-like or association list-like structures which are not, or which are only partially, inherently ordered. Because of their intended usefulness for association lists, each pattern matches the earliest value in the input sequence which satisfies it while still allowing all of the patterns in the sequence to match if possible. This maintains the historical property of association lists, whereby the leftmost value for a particular key is considered the current value for that key, and further values with the same key are older values which have been overwritten by subsequent consing onto the beginning of the association list.

If the rest pattern is given, it must match any remaining values in the list which do not match any of the given patterns.

All the pattern variables bound by all of the patterns are bound by the match. All of the variables bound by the rest pattern are bound to lists of the corresponding values from the input sequence, in the order they appeared in the original input sequence.

Note: While it is possible to write an implementation of seq/unordered which has good asymptotic performance for the majority of real-world use cases, in general the number of possible orderings of subpatterns and of input values is inherently factorial, and this will be reflected in the time and/or memory space required for some uses. For good performance, as many of the patterns as possible should be unambiguous with respect to one another (i.e. so that no value in the sequence could ever match more than one of the patterns, even if multiple values in the sequence could match the same pattern). The rest pattern may safely be a pattern which is ambiguous with respect to any or all of the other subpatterns, because it is only tried when none of the other subpatterns match.

Examples: See the sample implementation for the lset pattern syntax and the examples under that entry.

Built-in derived pattern syntax

pattern syntax (cons pattern1 pattern2)

A cons pattern matches a pair whose car matches pattern1 and whose cdr matches pattern2. The order in which the car and cdr are tested against the corresponding patterns is unspecified. All pattern variables in both subpatterns are bound by the match. If the subject of the cons pattern is not a pair or either of the patterns does not match, the cons pattern does not match.

Example: The fold example using list-case from SRFI 239 can be reformulated in terms of match as follows.

(define fold
  (lambda (proc seed ls)
    (let f ((acc seed) (ls ls))
      (match ls
        ((cons h t) (f (proc h acc) t))
        ('() acc)
        (_ (assertion-violation 'fold "not a list" ls))))))

Cons could be implemented as pattern syntax by:

(define-pattern-syntax cons
  (syntax-rules ()
    ((_ car-pat cdr-pat)
     (? pair?
        (apply car car-pat)
        (apply cdr cdr-pat)))))

pattern syntax (list sequence pattern )

A list pattern matches a proper list of Scheme values whose elements match the sequence patterns.

Examples:

(match '(1 2 3)
  ((list a b c) (+ a b c))) ;=> 6

(match '(1 2 3 4)
  ((list a b c) (+ a b c))) ; (exception raised)

(match '(1 2 3 . 4)
  ((list a b c) (+ a b c))) ; (exception raised)

(match '(tagged 1 x 2 y)
  ((list 'tagged n ...) n)) ;=> (1 x 2 y)

(match '(x y z 10 11 12)
  ((list (and (? symbol?) syms) ...
         (and (? number?) nums) ...)
   (values syms nums))) ;=> (x y z) (1 2 3)

(match '(1 2 3 split 4 5 6)
  ((list before ... 'split after ...)
   (values before after))) ; (1 2 3) (4 5 6)

(match '(1 2 split 3 4 split 5 6)
  ((list before ... 'split after ...)
   (values before after))) ; (1 2 split 3 4) (5 6)

List could be implemented as pattern syntax by:

(define-pattern-syntax list
  (syntax-rules ()
    ((_ subpat ...)
     (cons* subpat ... '()))))

pattern syntax (cons* sequence pattern pattern)

A cons* pattern matches a proper or improper list of Scheme values. The final pattern matches the tail of the list after all sequence patterns. If the tail pattern matches pairs, it will match the shortest possible proper or improper list; the sequence patterns may be tested for matching against some items within the pairs which actually ultimately form part of the match for the patterns.

Examples:

(match '(1 2 3 . 4)
  ((cons* a b c d) (+ a b c d))) ;=> 10

(match '(1 2 3 4 . 5)
  ((cons* x ... y) (cons y x))) ;=> (5 1 2 3 4)

Cons* could be implemented as pattern syntax by:

(define-pattern-syntax cons*
  (syntax-rules ()
    ((_ seq-pat ... tail-pat)
     (seq* ls ((curr ls (cdr curr)))
           (not (pair? curr))
           curr
       (apply car seq-pat) ...
       tail-pat))))

pattern syntax (vector sequence pattern )

A vector pattern matches a vector whose elements match the sequence patterns.

Examples:

(match '#(1 2 3)
  ((vector a b c) (list a b c))) ;=> (1 2 3)

(match '#(record 1 x 2 y)
  ((vector 'record n ...) n)) ;=> (1 x 2 y)

Vector could be implemented as pattern syntax by:

(define-pattern-syntax vector
  (syntax-rules ()
    ((_ subpat ...)
     (and (? vector?)
          (seq vec ((idx 0 (+ idx 1)))
               (>= idx (vector-length vec))
               (vector-ref vec idx)
            subpat ...)))))

pattern syntax (quasiquote quasipattern)

Syntax: Quasipattern has the following grammar:

quasipattern ::= identifier
| pattern datum
| (sequence quasipattern )
| (sequence quasipattern  . quasipattern)
| #(sequence quasipattern )
| (unquote pattern)
sequence quasipattern ::= quasipattern
| quasipattern extended ellipsis
| (unquote pattern )
| (unquote-splicing pattern )

Semantics: Within a quasipattern, all identifiers match symbols except outside of unquoted (unquote pattern) escapes. In addition, list-structured quasipatterns match literal proper or improper lists rather than being interpreted as pattern syntax. Within a list- or vector-structured quasipattern, unquote-splicing or a following ellipsis can be used to match multiple items from within the list or vector, analogously to a sequence pattern.

pattern syntax (lset pattern )
pattern syntax (lset pattern rest pattern ellipsis)

An lset pattern matches a proper list of Scheme values, testing that items which match the patterns appear in any order within its subject. If the rest pattern is given, it must match any remaining values in the list which do not match the set of patterns.

Note: See the note on performance under seq/unordered.

Examples:

(match '(1 2 3)
  ((lset (? even? x)
         (? odd? y)
         (? odd? z))
   (list x y z))) ;=> (2 1 3)

(match '((a . 1) (b . 2) (c . 3))
  ((lset (cons 'b val) _ ...) val)) ;=> 2

(match '(1 x 2 y)
  ((lset (? symbol? s) more ...)
   (values s more))) ;=> x (1 2 y)

Lset could be implemented by:

(define-syntax lset (syntax-rules ()))

(define-pattern-syntax lset
  (syntax-rules ()
    ((_ subpat ...)
     (and (? list?)
          (seq/unordered ls ((more ls (cdr more)))
                         (null? more)
                         (car more)
            subpat ...)))))

Defining new pattern syntax

syntax (define-pattern-syntax identifier transformer expression)

Syntax: Identifier must be an identifier which already has some kind of binding associated with it. Transformer expression is as for the right hand side of the base Scheme define-syntax.

Semantics: Associates the transformer resulting from evaluating the transformer expression with the identifier. When a pattern syntax use whose keyword is the given identifier appears within a pattern, the matcher will replace that pattern with the result of calling the transformer on the use of pattern syntax. This process is analogous to the normal process of Scheme macro expansion.

Note: The sample implementation only supports transformer procedures as the right-hand side of define-pattern-syntax. Since it is unlikely that anyone will want to use a variable transformer as a pattern syntax transformer, and all known syntax-object-based expander implementations return an ordinary transformer procedure from syntax-rules expressions (albeit this is not guaranteed by the R6RS specification, though it is guaranteed by the current draft of R7RS Large), this is not a problem in practice for any known expander, but future adaptations of the sample implementation may need to specially treat transformer types which are not procedures of the signature Syntax → Syntax.

The behaviour of pattern syntax associations by define-pattern-syntax is equivalent to the behaviour of an identifier property. For example, it is possible to define pattern syntax on an identifier which is not exported from the library where it is defined, in which case the pattern syntax will only be visible inside the library. It is also possible to locally re-define the pattern syntax associated with imported bindings within a library, and this will not affect uses of the pattern syntax in other libraries. Likewise, if pattern syntax is defined or redefined within a local-binding context, the pattern syntax will only be visible within that context. The pattern syntax associated with an identifier is exported and made available under that identifier when the identifier itself is exported from a library. It is an error to import an identifier twice with the same binding from different libraries if they have different pattern syntax associated with them.

Examples: See the examples under the entries in the section on derived pattern syntax.

In general, pattern syntax specialized for matching instances of a record type can be expressed as a combination with and of a ? pattern for that record type’s predicate and apply patterns for each of that record type’s field accessor procedures. The following definition of a two-dimensional point record type will be used as an example.

(define-record-type point
  (make-point x y) point?
  (x point-x)
  (y point-y))

Given this definition, a pattern to match instances of point can be expressed directly as follows:

(match my-location
  ((? point?
      (apply point-x my-location-x)
      (apply point-y my-location-y))
   (show #t
     "I am at X coord: " my-location-x ", Y coord:" my-location-y)))

But this can be more concisely expressed by defining pattern syntax to expand into this form of pattern, as follows:

(define-pattern-syntax point
  (syntax-rules ()
    ((_ x-pat y-pat)
     (? point?
        (apply point-x x-pat)
        (apply point-y y-pat)))))

allowing the previous example to be rewritten more concisely and more expressively:

(match my-location
  ((point my-location-x my-location-y)
   (show #t
     "I am at X coord: " my-location-x ", Y coord:" my-location-y)))

This new pattern can then of course be composed with other pattern syntax:

(define (point-quadrant pt)
  (match pt
    ((point (? positive?) (? positive?)) 'upper-right)
    ((point (? positive?) (? negative?)) 'lower-right)
    ((point (? negative?) (? positive?)) 'upper-left)
    ((point (? negative?) (? negative?)) 'lower-left)
    ((or (point (? zero?) _)
         (point _ (? zero?)))
     (assertion-violation 'point-quadrant "point has a zero coordinate" pt))))
Non-normative: Conventions for pattern syntax identifiers

Record types in Scheme are used as part of the public interface of libraries in at least two ways.

The first is to create true records, representing data about some entity that is relevant to the program’s domain of operation. In this case I suggest that the record type name (in the sense of R6RS and R7RS) should be used as the name of the pattern syntax. The following define-record-type+pattern-syntax macro defines both a record type (in R7RS small style) and pattern syntax for that record type.

(define-syntax define-record-type+pattern-syntax
  (syntax-rules ()
    ((_ name constructor-spec predicate
        (field-name accessor . maybe-setter) ...)
     (begin
       (define-record-type name constructor-spec predicate
                           (field-name accessor . maybe-setter) ...)
       (define-pattern-syntax name
         (syntax-rules ()
           ((_ field-name ...)
            (? predicate
               (apply accessor field-name) ...))))))))

Example usage:

(define-record-type+pattern-syntax measure
  (make-measure magnitude unit) measure?
  (magnitude measure-magnitude)
  (unit measure-unit))

(define (fff->si m)
  (match m
    ((measure n 'furlong)
     (make-measure (* n #e201.168) 'metre))
    ((measure n 'firkin)
     (make-measure (* n #e40.8233133) 'kilogram))
    ((measure n 'fortnight)
     (make-measure (* n 1209600) 'second))))

Further, it may be desirable, at least in some cases, to be able to use a kind of ‘keyword subpattern’ to match the fields of records. The following example extends the above to allow this.

(define-syntax define-record-type+pattern-syntax/keyword
  (syntax-rules ::: ()
    ((_ name constructor-spec predicate
        (field-name accessor . maybe-setter) :::)
     (begin
       (define-record-type name constructor-spec predicate
                           (field-name accessor . maybe-setter) :::)
       (define-pattern-syntax name
         (lambda (stx)
           (syntax-case stx ()
             ((_ subpat ...)
              #`(? predicate
                   #,@(map
                       (lambda (subpat)
                         (syntax-case subpat (accessor :::)
                           ((accessor field-pat)
                            #'(apply accessor field-pat)) :::))
                       #'(subpat ...)))))))))))

Example usage:

(define-record-type+pattern-syntax/keyword journal
  (make-journal title abbreviation) journal?
  (title journal-title)
  (abbreviation journal-abbreviation))

(define journals
  (list (make-journal "Journal of Indo-European Studies"
                      "JIES")
        (make-journal "Zeitschrift für vergleichende Sprachforschung"
                      "KZ")
        (make-journal "Notes and Queries"
                      "NQ")))

(define journal-abbreviations
  (match journals
    ((list (journal (journal-abbreviation abbr)) ...)
     abbr)))

;; journal-abbreviations => ("JIES" "KZ" "NQ")

The other main use of record types in Scheme is simply as a means of creating a disjoint type in order to represent some kind of composite data structure. Recall the existing Scheme convention that the name of the data type itself denotes a constructor which pre-fills a data structure with values passed to the constructor (as opposed to constructors whose name starts with make-, which merely pre-initialize the structure with a certain number of values). The name of this constructor procedure should (as in the examples list, cons, vector, etc.) be associated with pattern syntax which deconstructs the input using syntax essentially similar to that accepted by the constructor procedure itself to maximize symmetry between construction and deconstruction. In some cases it may be necessary to define multiple types of pattern syntax for one data type to match against input in different styles (as in the example of list and cons). It is not possible to give a general example as every data structure’s needs may be different.

Programmers defining pattern syntax for record types which do not fall cleanly into these categories, or some other form of pattern syntax entirely, can simply devise their own conventions.

Non-normative: Robustness of pattern definitions

In general, evaluation of expressions in pattern syntax (the procedure expressions of ? and apply patterns as well as calls to the resulting procedures, and the expressions used in a seq pattern) should have well-defined behaviour, and should never cause an error to be signalled during matching. A pattern should either match its subject or not. The evaluation of expressions which may either cause an error to be signalled or cause undefined behaviour should be guarded by a predicate using a ? pattern and the left-to-right guarantee of the and pattern.

For example, the following would not be a good implementation of a cons pattern:

(define-pattern-syntax bad-cons
  (syntax-rules ()
    ((_ car-pat cdr-pat)
     (and (apply car car-pat)
          (apply cdr cdr-pat)))))

This pattern will misbehave if the pattern matcher attempts to call it on a subject which is not a pair. If the match user intends for an error to be signalled on non-pair input, they should decide this for themselves by allowing the matcher to run out of patterns or providing an explicit case for subjects which are not pairs. The above pattern definition makes uses such as the following impossible, because the pair procedures car and cdr will signal an error and cause the whole match expression to fail before the cases for the empty list and other possible inputs are tried.

(match x
  ((bad-cons a b) 'make-progress)
  ('() 'end-of-proper-list)
  (_ 'improper-list))

Signalling an error during expansion of a pattern if the pattern use does not conform to the syntax of the pattern is of course normal and encouraged.

procedure (match-ellipsis? obj)

Returns #t if obj is a syntax object representing an extended ellipsis as used in a sequence pattern; raises a syntax violation if obj is a syntax object which uses the ... identifier incorrectly for use in a sequence pattern; or returns #f otherwise.

Rationale: Pattern syntax definitions which build upon seq and its relatives may need to pre-process the series of subpattern forms they are given, for example if their own syntax for subpatterns is something other than a simple pattern. In this case, the syntax definition will need to pre-process the series of subpatterns it is given to convert its own special syntax into the syntax of seq, but in doing so should generally treat the extended ellipsis in the series of input subforms specially, passing it directly through to seq.

Such pattern definitions should use match-ellipsis? to detect the extended ellipsis, since the syntax and semantics of extended ellipsis may be extended in the future.

Example: The example of matching a ‘lyst’ of ‘pares’ from above can be expressed as a pattern syntax definition as follows.

(define-pattern-syntax lyst
  (lambda (stx)
    (syntax-case stx ()
      ((_ subpat ...)
       (with-syntax (((seq-subpat ...)
                      (map
                       (lambda (subpat)
                         (if (match-ellipsis? subpat)
                             subpat
                             #`(apply kar #,subpat)))
                       #'(subpat ...))))
       #'(seq* ls ((curr ls (kdr curr)))
               (not (pare? curr))
               curr
           seq-subpat ... '()))))))

Condition type

condition type &match
procedure (make-match-violation)
procedure (match-violation? obj)

This condition type is a subtype of the standard condition type &assertion and has no fields. It could be defined by

(define-condition-type &match &assertion
  make-match-violation match-violation?)

Further examples

General record matching syntax

This pattern matcher does not include by default a pattern type which can match records by their field values without defining new pattern syntax for their record type. This feature is not portable to systems without a record procedural/inspection layer, and may have less than ideal run-time performance in any case. Furthermore, the use of such pattern types can be regarded as an anti-pattern, because they inherently make the field structure of records, which ought to be regarded as an implementation detail of the record type, into part of their public interface (see SRFI 256). It is generally preferable to define specialized pattern syntax for each record type, as demonstrated above.

On Schemes with an R6RS-compatible record system, the $ pattern of Wright-style matching libraries can be brought back roughly as follows:

(define-syntax record (syntax-rules ())) ; dummy binding to hold only
                                         ; pattern syntax

(define-pattern-syntax record
  (lambda (stx)
    (syntax-case stx ()
      ((_ type field-pat ...)
       (with-syntax ((rtd #'(record-type-descriptor type))
                     ((field-n ...) (iota (length #'(field-pat ...)))))
         #'(? (record-predicate rtd)
              (apply (record-accessor rtd field-n) field-pat) ...))))))

Example usage:

(define-record-type point (fields x y))

(define (distance-from-origin pt)
  (match pt
    ((record point x-pos y-pos)
     (sqrt (+ (* x-pos x-pos)
              (* y-pos y-pos))))))

(distance-from-origin (make-point 3 4)) ;=> 5
(distance-from-origin (make-point 4 5)) ;=> 6.4031242...

Aside from the general design issue noted above, the run-time efficiency of this particular pattern syntax is not as good as if the corresponding type had been given its own pattern syntax directly, as e.g. in the define-record-type+pattern-syntax example above, because of the need to invoke record type inspection procedures at run time. A means of expand-time record type inspection, such as that proposed in SRFI 136, could reduce or eliminate this overhead.

The record pattern example also only allows matching fields from one layer of a type hierarchy, following the design of the R6RS record system. Fields from multiple layers have to be extracted by addressing each layer specifically:

(define-record-type point-3d (parent point) (fields z))

(define (point-3d-translate pt dx dy dz)
  (match pt
    ((and (record point x y)
          (record point-3d z))
     (make-point-3d (+ x dx)
                    (+ y dy)
                    (+ z dz)))))

This somewhat reduces the exposure of the field structure of each individual level of the type hierarchy, as does the ability to omit field patterns from the right of the record pattern (so the implementer of record types can still safely add new fields at the right). Still, it is better in general to separate the definition of a record type from the definition of its matching interface.

Lazy data structures

As noted in the entry on match, a pattern matcher designed for a strictly evaluated language with strict data structures (such as this one) can, in many contexts, safely use any order for testing patterns and subpatterns in order to most efficiently rule out non-matching clauses and select the appropriate matching clause. However, Scheme’s first-class procedures and macro system also provide the means to implement lazy data structures such as the streams of SRFI 41. For data structures like these, it is preferable to require left-to-right pattern matching so that the pattern matcher will not look ‘too far ahead’ in a sequence to a point where evaluation will diverge or signal an error.

Achieving this for patterns matching individual instances is simple, though: pattern syntax for a lazy record or data structure must only use the and pattern with its guarantee of left-to-right matching; avoid the subpattern functionality of the ? primitive which does not make any such guarantee; be careful about its use of multiple subpatterns of the apply primitive; and ensure all values are forced as soon as they are accessed by apply patterns. The functionality of the stream-match form of SRFI 41 can be simulated with pattern syntax for the stream-cons, stream-null, and stream constructors, plus (for convenience) a new stream-cons* constructor, as follows:

(define-pattern-syntax stream-cons
  (syntax-rules ()
    ((_ car-pat cdr-pat)
     (and (? stream-pair?)
          (apply stream-car car-pat)
          (apply stream-cdr cdr-pat)))))

(define-pattern-syntax stream-null
  (syntax-rules ()
    ((_) (? stream-null?))))

(define-syntax stream-cons*
  (syntax-rules ()
    ((_ elt) elt)
    ((_ elt more ...)
     (stream-cons elt (stream-cons* more ...)))))
(define-pattern-syntax stream-cons*
  (syntax-rules ()
    ((_ pat) pat)
    ((_ pat more ...)
     (stream-cons pat (stream-cons* more ...)))))

(define-pattern-syntax stream
  (syntax-rules ()
    ((_ pat ...)
     (stream-cons* pat ... (stream-null)))))

A disadvantage is that the deconstructor for the empty stream must be spelled (stream-null), while the constructor is spelled stream-null. This could be fixed by a corresponding change in the syntax of SRFI 41, or simply by making the empty stream object the same as the standard Scheme empty list object and eliminating the stream-null constructor and deconstructor entirely.

(define (len strm)
  (match strm
    ((stream-null) 0)
    ((stream-cons head tail) (+ 1 (len tail)))))

(define-stream (stream-merge lt? . strms)
  (define-stream (merge xs ys)
    (match-values (values xs ys)
      (((stream-null) ys) ys)
      ((xs (stream-null)) xs)
      (((stream-cons x xs*) (stream-cons y ys*))
       (if (lt? y x)
           (stream-cons y (merge xs ys*))
           (stream-cons x (merge xs* ys))))))
  (stream-let loop ((strms strms))
    (match strms
      ('() stream-null)
      ((list strm) strm)
      ((cons strm strms)
       (merge strm (loop strms))))))

Because the ellipsis form of the seq* pattern is greedy, it cannot safely be used with streams or other potentially infinite data structures without risking divergence. If this is not a concern, the following alternative definition of the stream-cons* pattern syntax may also be used.

(define-pattern-syntax stream-cons*
  (syntax-rules ()
    ((_ pat) pat)
    ((_ pat ... tail-pat)
     (seq* strm ((curr strm (stream-cdr strm)))
           (not (stream-pair? strm))
           curr
       (apply stream-car pat) ... tail-pat))))

What cannot be safely accommodated with this pattern matcher is situations with match-values or match-lambda where one input value holds information which, if examined earlier than some other input value, might prevent the erroneous or divergent forcing of some part of the other input value. The above example of match-values in the merge procedure is safe because it must examine both values anyway, but in general these forms must be used with particular care in combination with lazy data.

Views

The following example of a convenience form for defining new pattern syntax is based on a design originally by Richard Cobbe and found in the test suite of Racket’s pattern matcher. It provides a concise notation for a ? test pattern over a series of apply patterns.

(define-syntax define-view
  (lambda (stx)
    (syntax-case stx ()
      ((_ name test (selector ...))
       (with-syntax (((selector-id ...)
                      (generate-temporaries #'(selector ...))))
         #'(begin
             (define-syntax name (syntax-rules ()))
             (define-pattern-syntax name
               (syntax-rules ()
                 ((_ selector-id ...)
                  (? test
                     (apply selector selector-id) ...))))))))))

The following examples are adapted from those of Wadler (1987b).

;; 2 Viewing an integer as zero or a successor

(define (sub1 n) (- n 1))

(define-view zero zero? ())
(define-view succ integer? (sub1))

(define power
  (match-lambda
    ((x (zero)) 1)
    ((x (succ n)) (* x (power x n)))))

(define fib
  (match-lambda
    (((zero)) 0)
    (((succ (zero))) 1)
    (((succ (succ n))) (+ (fib n) (fib (+ n 1))))))

;; 3 Another view of integers

(define (div2 n) (/ n 2))

(define-view even even? (div2))
(define-view odd odd? ((compose sub1 div2)))

(define power
  (match-lambda
    ((x (zero)) 1)
    ((x (even n)) (power (* x x) n))
    ((x (odd n)) (* x (power (* x x) n)))))

;; 4 Viewing a complex number in cartesian and polar coordinates

(define-view cart complex? (real-part imag-part))
(define-view pole complex? (magnitude angle))

(define add
  (match-lambda
    (((cart x y) (cart x* y*))
     (make-rectangular (+ x x*) (+ y y*)))))
(define mult
  (match-lambda
    (((pole r θ) (pole r* θ*))
     (make-polar (* r r*) (+ θ θ*)))))

Tree patterns

This pattern matcher also does not include by default any equivalent of the *** tree pattern or ‘two-dimensional ellipsis’ which originated as one of Alex Shinn’s extensions to Wright’s pattern syntax. The following example shows how a custom pattern can provide this functionality. Note that this specific matcher is exactly as limited as the original *** matcher, i.e. it can search only trees based on proper lists and will never search for the pattern on its right hand side within the first element of each sublist.

(define-syntax tree (syntax-rules ()))
(define *not-found* (cons '*not-found* '()))
(define (not-found) (values *not-found* #f))
(define (found? x) (not (eq? x *not-found*)))

(define-pattern-syntax tree
  (syntax-rules ()
    ((_ path-pat item-pat)
     (apply (lambda (initial-val)
              (let match-tree ((path '())
                               (val initial-val))
                (match val
                  (item-pat
                   (let ((path (reverse path)))
                     (match path
                       (path-pat
                        (values path val))
                       (_ (not-found)))))
                  ((cons name subvals)
                   (let loop ((more subvals))
                     (match more
                       ((cons subval more)
                        (call-with-values
                            (lambda ()
                              (match-tree (cons name path) subval))
                          (lambda (a b)
                            (if (found? a)
                                (values a b)
                                (loop more)))))
                       ('() (not-found)))))
                  (_ (not-found)))))
            (and (? found?) path-pat)
            item-pat))))

Potential uses of such a pattern include as an alternative to Oleg Kiselyov’s SXPath in some cases:

(match '(a (b c) (b d) (c c)) ((tree x 'd) x)) ; equivalent to XPath
                                               ; //node()[.='d']
;=> (a b)

(match '(a (b c) (b d) (c c)) ((tree (and x (list _ ... 'c)) 'c) x))
                                        ; equivalent to XPath
                                        ; //c[.='c']
;=> (a c)

Note, though, that this pattern has poor performance characteristics, similar to a naïve string search algorithm. (It is no worse than Shinn’s implementation in this regard, however, which uses the exact same algorithm.) A small (non-asymptotic) performance gain for certain subpatterns would be to use not to suppress variable bindings the first time the matcher tries to see if a tree element matches the target pattern. A pattern matcher with ideal performance on this kind of pattern would use a completely new approach incompatible with the design decisions made here to optimize for ‘one-dimensional’ patterns.

Design notes

Constructor-based syntax and the means of extension

The question of whether to use constructor-style or datum-style syntax for deconstruction arose during the pre-R4RS discussion of pattern matching. Chris Haynes presented an early version of his own pattern matching library, which used datum syntax for pairs and lists, but constructor syntax for vectors. Jonathan Rees pointed out the inconsistency and proposed a resolution, noting that a consistent constructor-style syntax would allow the pattern matching syntax to be extended cleanly. He proposed deconstructors as an extra procedure property on constructors, similar to how the T setter functionality worked (later specified by SRFI 17).

In a subsequent thread covering several questions inherent in the design of the would-be R4RS pattern matching library, Will Clinger noted the problems of both approaches. Namely, datum-style deconstructor syntax is, in most cases (including in Wright-style syntax), not an exact mirror of Scheme’s lexical syntax when matching of literal symbols is needed, because patterns such as ('x 1 y) do not correspond to the datum syntax of an actual matching list such as (x 1 "why"); it would, in any case, be a new sublanguage. (Syntax-rules, of course, later resolved this with its separate list of literal identifiers.) The constructor syntax, meanwhile, has a closer resemblance between pattern matching input (under evaluation, rather than under read) and individual patterns, but seemed to pose problems in interaction with the scoping of identifiers in Scheme code outside of match clauses: what, for example, is the meaning of list when used as a deconstructor within a block such as (let ((list vector)) ...)?

Under Rees’s proposal for dynamic extensibility, using special properties on procedure objects, the answer is in fact quite clear: in that example, list will refer lexically to a variable which, at run time, contains the same procedure – including the same deconstructor – as the ordinary Scheme vector procedure: therefore that procedure will both (in regular Scheme code) construct, and (on the left-hand side of match clauses) deconstruct vectors, rather than lists. This seems intuitive. A similar approach, using properties of OO classes, is the means of extension for the match statement in Python 3.10 and above; also, at time of writing, a variant of this idea using properties of JavaScript functions is part of the ECMAScript TC39 pattern matching proposal.

On the other hand, research into efficient compilation of pattern matching has focussed on cases where the semantics of patterns are completely knowable during ahead-of-time compilation, which is not always the case when the semantics depend on run-time procedure calls. If the ECMAScript proposal is adopted, it is likely that it will lead to more research into techniques for efficient compilation of patterns when this method of extension is used, perhaps using JIT-like adaptive optimization techniques, but this has not happened yet. Moreover, Scheme macros, which are designed to support full ahead-of-time compilation, may not be an appropriate tool for implementing this kind of optimization. Lastly, it is not clear how syntactic operators such as quote can be used as deconstructors orthogonally to how procedures are used. Using macros as the means of pattern extension makes it possible to use optimization techniques that are already well understood, to implement pattern matching in an ordinary Scheme macro, and (potentially) to use any Scheme identifier to signify a deconstructor regardless of the type of its binding in ordinary code; but it also implies the problem of conflicting namespaces which Clinger was concerned about.

Queinnec’s original proposal for macro-extensible patterns seemingly resolved this issue by establishing a new naming convention for identifiers: his patterns consistently begin with an * character. Thus, the consistency and extensibility advantages of ‘constructor-style’ patterns can be retained, but the use of different identifiers for deconstructors means deconstruction of values does not exactly mirror construction. (This might be considered a pro or con, depending on your view of whether it’s a good thing for identifiers to have context-sensitive semantics.) SRFI 257 revives this idea, placing transformers for deconstructors within Scheme’s single namespace, with a convention of prefixing them with ~. Under this proposal, list and vector, as ordinary Scheme variables and not pattern transformers, are simply not allowed as the names of deconstructors on the left-hand side of a pattern matching clause. The quote family of operators is an exception to this naming convention, presumably because of the ergonomic advantage of being able to use the lexical abbreviations for them.

Identifier properties, used by this SRFI, offer another approach. In this view, the context-specific semantics of a constructor identifier have to do with its statically knowable binding as an identifier, and not with its actual value as a variable which can only be known at run time. The answer to Clinger’s question is clear: in (let ((list vector)) ...), the binding of list has been shadowed; because pattern syntax belongs to bindings and not to values, new pattern syntax would have to be defined for the list identifier within the let block, otherwise an attempt to use list as the name of a deconstructor within that block would be a syntax violation. This mirrors the semantics of auxiliary syntax keywords in ordinary, non-extensible Scheme macros; the set of auxiliary syntax keywords within match clauses is simply arbitrarily extensible.

Identifier properties, in combination with macros as the means of extensibility, thus offer the benefits of deconstructor syntax matching constructor syntax, an extensible set of deconstructors, and efficient ahead-of-time compilation within an implementation as an ordinary Scheme macro requiring no special support from the underlying compiler. While in perverse cases such as Clinger’s example it is possible that the results of shadowing a deconstructor’s name may be seen as unintuitive, the semantics in fact match those already familiar to Scheme programmers from the use of (auxiliary) syntax keywords in general; in the majority of cases of shadowing, both the Rees-style dynamic extensibility and identifier property-based extensibility behave identically, respecting Scheme’s lexical scoping and its single namespace for all identifiers in all contexts.

One final approach should be considered, which is that taken by Racket’s match library. In this view, ‘match expanders’ are ordinary syntax keywords within Scheme’s single namespace, defined using a special variant of define-syntax. By default, the keyword for a custom match expander may only be used within the left-hand side of a matching clause, but an optional second transformer may be given which works as a normal Scheme macro transformer when the identifier is used outside of a pattern. This allows the same identifier to be used in both patterns and in normal Scheme code, but has the disadvantage that an identifier can only be used in these two specific contexts and can’t be extended by additional properties which might be used by other embedded sublanguages; whereas identifier properties intrinsically allow any number of macros to define and use their own extra information about any identifier.

Non-linear patterns

Non-linear patterns were first added to a Wright-style pattern matcher in Bruce Hauman’s PLT Scheme implementation, then copied by Alex Shinn and Sam Tobin-Hochstadt into their respective implementations. After thorough consideration, I have decided that support for such patterns, while perhaps occasionally useful, is a misfeature in the context of this kind of pattern matcher, because the composition behaviour of non-linear uses of variables with some other features offered by this pattern matcher is poor.

Most of the problems ultimately arise, in essence, from the root issue of deciding which instance of the variable in the pattern should be considered to bind the single comparand against which other values which (potentially) match the same pattern variable can be tested. The intuitive choice is probably the leftmost instance, but this may in the general case put unenforceable and equally opaque restrictions on the expansions of pattern syntax – another feature I find considerably more valuable than general support for non-linear patterns. In other words, pattern syntax such as (foo a b) may wish to expand into a form such as (and (baz b) (bar a)) for performance reasons (note the inversion of variables in the expansion compared to the original form), but this expansion would cause the leftmost instances of a non-linear variable within b to be considered the leftmost overall, rather than the leftmost instances within a and therefore within the original source.

While this difference may appear to be abstract in the sense that all bound values are ‘the same’ anyway, concrete issues arise in composition with, for example, not patterns. As of writing, the Racket documentation attempts to explain the behaviour by saying that ‘instances of an id [i.e. a variable] in different or and not sub-patterns are independent’, but this is not entirely correct and depends on whether the id has appeared outside of a not or or pattern ‘before’ its appearance within one. A pattern such as (list a (not a) (not a)) will match a three-item list and require that the second two elements are distinct from the first (but may be the same as one another); a pattern such as (cons a (not a)) will match a pair whose car and cdr are distinct. This, at least, seems intuitively correct. On the other hand, though, a pattern such as (cons (not a) a) will never match, because the first a (of (not a)) is considered to be independent from the second, meaning that (not a) is effectively (not _), causing all matches to fail. To reiterate, if the (cons a b) were for some reason expanded to (and (? pair?) (apply cdr b) (apply car a)) (note again the inversion of a and b), the inverse would apply, and (cons (not a) a) would function as probably expected. The interaction of pattern syntax with non-linear variables is therefore dependent on implementation details of each form of pattern syntax. This is not transparent to the user of pattern syntax.

An informal rule imposed on pattern syntax definitions that subpatterns should appear in the same order in the expansion as in the input form may seem sufficient to cure this problem, if one is willing to accept that the rule cannot actually be enforced by the pattern expander in any way. Another, more robust solution might be to require occurrences of variables within or and not clauses to be truly independent of one another and of variables outside of or and not, so that (cons a (not a)) would not match any value, or would simply be disallowed; the pattern would have to be expressed e.g. (and (cons a b) (not (cons x x))).

It is also possible that choosing a different set of primitives to those used in this pattern matcher could help alleviate this issue. This should be investigated further.

There is another problem, though: non-linear uses of variables in patterns also compose poorly with ellipsized sequence subpatterns. Here the problem of choosing which instance to use to bind the primary comparand compounds with the problem that a variable appearing within an ellipsized subpattern has (at least) two semantics. Within the ellipsized subpattern, the variable conceptually refers to one value within a sequence of related values; within the body of a matching clause, it refers to a list of all values in that sequence.5 For the purposes of choosing a comparand for an instance of the variable within the larger pattern but outside of the ellipsized subpattern, both Racket’s and Shinn’s match choose the latter interpretation, but differ in which occurrence of the variable is used to find the value used as a comparand. Shinn’s implementation appears to consistently pick the leftmost occurrence, meaning that patterns such as (the equivalent of) (list x x ...) match a sequence of identical items of which there must be at least one, and x is bound to the first item in the list. Racket, on the other hand, picks the ellipsized instance no matter where it appears in the list, so that (list x x ...) matches lists of the form ((a ...) a ...) where there are the same number of equal as in both the sublist and the outer list. The behaviour chosen by Shinn may initially appear more intuitive here, until the patterns are reversed: with a pattern equivalent to (list x ... x), it will require a list of the form (a ... (a ...)) (with the same conditions on the equality and number of as). Racket’s version’s behaviour on this pattern is the same as on Shinn’s, but it issues the warning ‘non-linear pattern used in match’.

Note: I have not tested the behaviour of both pattern matchers extensively to determine if the rules listed above, based on observation of a relatively limited number of test cases, hold for all non-linear patterns in both versions.

The right thing may be for all non-linear occurrences of an ellipsized pattern variable to have the opposite meaning as in Racket: they should refer to one item in the sequence which matched, rather than the entire sequence. Even this does not fully solve the problem, if for no other reason than that it makes it non-trivial to elegantly express a pattern such as (list (complex-subpattern x) ...) with the additional constraint that all the xs must be the same. Further, the performance impact of non-linear patterns in combination with the repetition operators in sequence subpatterns may be significant. The full semantic and performance implications of such a change should be investigated first.

In summary, while it may be possible to find solutions to the problems of composability of non-linear patterns with other features of the pattern matcher, the solutions would necessarily involve significant innovations compared to the semantics of non-linear patterns in existing Scheme pattern matchers. Finding the ‘correct’ solutions to these issues, and verifying their correctness, is outside of the scope of the current version of this pattern matcher. This version therefore simply does not allow multiple occurrences of the same variable within a pattern, as with syntax-rules and syntax-case.

Those interested in more discussion of the semantic issues of non-linear patterns may find this mailing list thread about the reasons Haskell does not support them relevant.

Application patterns and multiple values

The apply pattern syntax does not offer direct support for checking the number of return values from the procedure it calls: it is undefined behaviour if the number of values returned is not the same as the number of subpatterns. Ideally, the behaviour would be well-defined in this case, namely that the apply pattern should not match if the procedure returns the wrong number of values. This would make it easier to use varying numbers of return values as a cheap kind of sum type, especially as an option type.

Unfortunately, this would result in a small ‘peanut butter’-type performance overhead for the vast majority of uses of apply patterns with procedures which always return the same number of values: all of these uses would need to compile to include an explicit check on the number of return values, not only the uses where alternative clauses check for a different number of values. Scheme implementations are not very good at optimizing away such checks (primitively expressed as call-with-values over case-lambda): as of writing, only Chez Scheme version 10 is known to be able to do this optimization, and only for procedures of one return value. It may also make it harder to keep compile times for optimizing implementations reasonable, since it would increase the number of primitive pattern types which are directly refutable on their own terms (and not only refutable on the basis of some subpattern).

The workaround is to use a procedure argument to the apply pattern which converts multiple values into a single container (such as a multiple-value box from SRFI 195) and write a pattern checking its contents.

If Scheme implementations’ ability to optimize this case for procedures of known co-arity improves significantly, then a future revision of this SRFI may change the semantics of apply patterns to the correct ones.

Disjointed variables

Disjointed variable is the name I have given to any variable which occurs in some subpattern(s) of an or pattern (also formally called a disjunctive pattern, whence the name), but not in all of the subpatterns.

Here also, the behaviours of the existing pattern matchers differ. Wright’s original implementation and the Racket pattern matcher signal an error during expansion upon encountering or patterns containing any disjointed variables. Shinn’s implementation binds the disjointed variables to an unspecified value when the matching subpattern does not contain them. Hauman’s implementation causes an undefined identifier error to be signalled when a disjointed variable’s value is accessed and the matching subpattern does not contain it. Nelson’s implementation behaves quite strangely and I cannot easily tell what it is doing.

This pattern matcher takes a slightly new approach: it is not a syntax error to merely use an or pattern with disjointed variables, only to subsequently attempt to (statically) refer to a disjointed variable in the code for the matching clause. In practice, this is almost indistinguishable from the behaviour of Wright’s original and the Racket pattern matchers, but with one difference which is important in the context of macro-extensible pattern matching: it allows non-hygienic patterns to be used in the definition of hygienic patterns without ruining the ability of the hygienic patterns to be used as subpatterns of an or pattern.

Consider, for example, a pattern regexp which tests an input string against a regular expression and, as a convenience, unhygienically binds pattern variables $1, $2, $3 etc. to the substrings which matched subgroups in the regular expression. A higher-level pattern correct-looking-string may wish to build on regexp to create a pattern matching strings having a certain form, but its use of regexp is purely an implementation detail that should not be exposed. Fortunately, syntax object-based expanders already handle this case correctly, and the $1, $2, $3 variables will be created with marks corresponding to the use of regexp within correct-looking-string, not the use of correct-looking-string directly. However, with the Racket/Wright semantics, any attempt to use correct-looking-string within only one clause of an or pattern will cause an immediate syntax error because of the $1, $2, $3 variables which, due to their marks, are not even accessible within the matching clause code. Binding these disjointed variables instead to syntax transformers which always signal a syntax violation addresses the real problem with disjointed variables with no run time overhead.

An alternative would be an approach more similar to Hauman’s, also comparable to the approach used by the letrec and letrec* forms in case of violations of their respective restrictions on variable value access and assignment. In this approach, disjointed variables would instead be bound to syntax transformers which expand into code signalling a run-time error if the matching subpattern did not bind the corresponding variable, and returning the variable’s value otherwise. I think this would only encourage bad programming styles, though – in particular, where checks already performed by the matcher are repeated within the matching code in order to determine if some variable can be safely accessed – and people should instead use different matching clauses.

Coverage checking

The pattern matcher includes no facilities for ensuring exhaustiveness – that is, statically (at expand time) checking that a set of patterns covers all possible inputs in a particular context. Given Scheme’s dynamic typing, full checking of exhaustiveness is not possible in any case. I considered how to provide such checking for more limited cases, such as providing forms that would ensure that all record types within a given family of related types are handled. Eventually, I rejected this as out of scope for this library. Tobin-Hochstadt (2011) also contains a brief discussion of the problems of exhaustiveness checking in a dynamically typed pattern matcher.

Users interested in pattern matching within a closed world of potential input values may be interested to look at the Nanopass compiler framework (Sarkar et al. 2005, Sarkar 2008, Keep 2012) for guidance and inspiration. Of course, the extensible pattern matcher presented here may be a good back-end target for a pattern matching library with this feature, since it already implements the optimizations needed for good run-time performance. A pattern matcher with coverage checking may also be able to make good use of custom pattern syntax within its implementation.

Implementation

The sample implementation is provided by the author’s extensible-match library. Version (- 1 (expt 2 (- n))) is the sample implementation corresponding to draft n of the SRFI; when the SRFI is finalized, version 1 will be permanently archived in the SRFI repository. The implementation should run on any R6RS implementation with SRFIs 1 (list library), 133 (vectors library), 151 (bitwise operations), 213 (identifier properties), and 244 (define-values), with the additional caveat (as mentioned under the entry for define-pattern-syntax) that syntax-rules expressions must return an ordinary transformer procedure and not any implementation-specific kind of transformer.6 At time of writing, this amounts only to Chez Scheme with Aaron Hsu’s chez-srfi package, but I hope to submit a patch to Guile for identifier property support soon, and Andrew Whatson’s work on porting Unsyntax (which supports SRFI 213, but does not support much of R6RS nor the other needed SRFIs) to other Scheme implementations may also make it available there soon.

The sample implementation is designed to be well-optimizing. On implementations which support the guaranteed-optimization clause of the macro-writer’s bill of rights (Dybvig 2004), the code it produces for any given match expression should generally be as efficient as the equivalent hand-written conditionals and bindings.

I would be interested in an alternative, mildly optimizing implementation in terms of SRFI 148 which made the means of pattern syntax transformer lookup and application pluggable. This would allow Scheme implementations which do not natively provide syntax-case and whose maintainers don’t wish to provide full identifier property support to take advantage of macro-extensible pattern matching.

Hints for implementers

This section of the SRFI gives hints that might be useful to anyone coming up with a re-implementation of this pattern matcher.

The implementation of pattern expansion is quite subtle. Each individual expansion of pattern syntax must obey the hygiene condition. In order to achieve this portably, the sample implementation actually returns to the Scheme expander after each step of its expansion, so the expander can apply its marks/colours/timestamps as needed. An implementation targeting a specific Scheme implementation may be able to do better if it can access the internals of the expander to actually do identifier renaming itself, as in e.g. Racket’s syntax-local-apply-transformer. As Ballantyne et al. (2020) point out, a system of primitives such as this would be useful for extensible embedded languages in general, so a future SRFI should provide a way for a transformer to re-enter the expander on part of its input without using CPS-style expansions. A mechanism such as Racket’s make-syntax-introducer7 could be a useful primitive for this, which would also provide an elegant solution to the problems addressed by SRFIs 258 and 260.

The test suite for the sample implementation tests publically exposed behaviour only. It is thus suitable for testing that your implementation is SRFI-conformant – although the SRFI specification text is normative, and errors and omissions in the test suite are possible. The ordering of tests in the test suite is such that fixing earlier tests is likely to fix numerous later tests which implicitly depend on the same behaviour as well. See also the note on the test for match-letrec under that entry.

For implementing seq and seq* patterns, in the general case, you will need to implement a finite automaton simulation, either using backtracking, an NFA simulation keeping track of multiple states, or a DFA transformation. For simple seq patterns you can avoid this. A performant implementation should probably have multiple strategies on hand depending on the structure of the seq subpatterns.

There are a number of possible strategies for implementing seq/unordered patterns. The sample implementation uses a backtracking approach, which offers near-linear performance for the majority of patterns. Another option is Brzozowski derivatives,8 but a naïve implementation will require exponentially growing time and space for growing numbers of irrefutable subpatterns, or subpatterns which are ambiguous with respect to one another. One of the tests in the test suite is deliberately designed to take a long time (albeit without failing) on implementations which fail to optimize for the case of a large number of irrefutable subpatterns. Unfortunately, this test case is inherently racing against Moore’s law, so if you are using the test suite 20 years from now, it’s possible it will no longer appear quite so slow even if your implementation has bad asymptotic performance.

It is also moderately tricky to statically enforce the ban on non-linear patterns (which implementations should do, unless they have solved the problems mentioned in the design notes and have good semantics for supporting them as a local extension to this specification). An attempt to use a pattern variable non-linearly might be local to one subpattern of or, or appear within a not pattern where binding is suppressed, but the attempt should nonetheless not be allowed.

Acknowledgements

I stand on the shoulders of the designers and implementers of many previous Scheme pattern matching forms. Thanks to all of them. We all collectively stand on the shoulders of the pioneers of pattern matching in Lisp and in other languages, whose contributions are described in the rationale and listed in the bibliography. Thanks in particular to Sam Tobin-Hochstadt, the author of Racket’s match, and Ryan Culpepper, the author of Racket’s syntax-parse, and the rest of the Racket community both for the direct inspiration, and for answering my questions about the history, design, and implementation of pattern matching in Racket.

Thanks also to all those who insisted that R7RS Large should include some kind of pattern matcher, which inspired me to work on a solution which is (in my opinion) superior to the popular matcher of Duba, Wright, and Shinn.

Thanks to Stefan Ljungstrand for discussion of possible features, and to him and also Wolfgang Corcoran-Mathe for working through the issues with non-linear patterns together with me.

Andrew Whatson pointed me to Frederick McBride’s Ph.D. dissertation, and to T’s destructure special form. These were both very useful pointers to help establish the early history of pattern matching in programming languages, and in Lisp in particular.

Marc Nieper-Wißkirchen’s implementation of R7RS syntax-rules in R6RS allowed me to review the post-optimization expansion of Alex Shinn’s match implementation in Chez Scheme, which helped me identify optimizations to make in my own implementation.

Jeronimo Pellegrini found an error in an example.

The term ‘subject’ was adopted based on a suggestion made by Ron Buckton in the context of the ECMAScript TC39 pattern matching proposal.

References

Ballantyne, Michael, King, Alexis, and Felleisen, Matthias (2020) ‘Macros for Domain-Specific Languages’ in Proceedings of the ACM on Programming Languages 4 (OOPSLA): 229

Burstall, R. M. (1969) ‘Proving properties of programs by structural induction’ in The Computer Journal 12 (1): 41–48

Burstall, R. M., MacQueen, D. B., and Sannella, D. T. (1980) ‘Hope: An Experimental Applicative Language’ in LFP ’80: Proceedings of the 1980 ACM Conference on LISP and Functional Programming 136–43

Dybvig, R. Kent (2004) ‘The Guaranteed Optimization Clause of the Macro-Writer’s Bill of Rights’ at Daniel P. Friedman: A Celebration (recorded conference paper, 3–4 December; slides)

Hauman, Bruce (c2004) ‘Pattern Matching in Scheme’ (Internet Archive WayBack Machine, 26 June 2007)

Hudak, Paul, Hughes, John, Peyton Jones, Simon, and Wadler, Philip (2007) ‘A History of Haskell: Being Lazy With Class’ in HOPL III: Proceedings of the Third ACM SIGPLAN Conference on History of Programming Languages 12-1 – 12-55 (post-print on the website of Simon Peyton Jones)

Keep, Andrew W. (2012) ‘A Nanopass Framework For Commercial Compiler Development’ (Ph.D. dissertation, Indiana University)

Landin, P. J. (1966) ‘The Next 700 Programming Languages’ in Communications of the ACM 9 (3): 157–66

McBride, Frederick Valentine (1970) ‘Computer Aided Manipulation of Symbols’ (Ph.D. dissertation, Queen’s University of Belfast)

McCarthy, John and Painter, James (1967) ‘Correctness of a compiler for arithmetic expressions’ in Proceedings of Symposia in Applied Mathematics 19 (Mathematical Aspects of Computer Science): 33–41 (post-print on the website of the author)

MacQueen, David, Harper, Robert, and Reppy, John (2020) ‘The History of Standard ML’ in Proceedings of the ACM on Programming Languages (History of Programming Languages 4)

Moses, Joel (1967) ‘Symbolic Integration’ (Ph.D. dissertation, Massachusetts Institute of Technology)

Monnier, Stefan and Sperber, Michael (2020) ‘Evolution of Emacs Lisp’ in Proceedings of the ACM on Programming Languages (History of Programming Languages 4)

Petrofsky, Al (2001) ‘How to write seemingly unhygienic macros using syntax-rules’ in comp.lang.scheme (Usenet posting, 19 November)

Pitman, Kent M. and Moon, David (1989) ‘DESTRUCTURING-BIND’ (memorandum to ANSI committee X3J13)

Queinnec, Christian (1990a) ‘Compilation of non-linear, second order patterns on S-expressions’ in Lecture Notes in Computer Science 456 (PILIP 90): 340–57 (post-print on the website of the author)

Queinnec, Christian (1990b) Le Filtrage : une application de (et pour) Lisp. Paris: InterÉditions (ebook on the website of the author)

Sarkar, Dipanwita (2008) ‘Nanopass Compiler Instrastructure’ (Ph.D. dissertation, Indiana University)

Sarkar, Dipanwita, Waddell, Oscar, and Dybvig, R. Kent (2005) ‘Educational Pearl: A Nanopass Framework for Compiler Education’ in Journal of Functional Programming 15 (5): 653–67

Shinn, Alex (2006) ‘portable hygienic pattern matching’ in comp.lang.scheme (Usenet posting, 29 November)

Steele, Guy L., Jr., White, Jon L., Cannon, Howard I. and Kerns, Bob (1974–82) ‘LISP.NEWS’ (computer file)

Steele, Guy L., Jr. (1984) Common LISP: The Language (1st edition) Bedford, Mass., USA: Digital Press

Tobin-Hochstadt, Sam (2011) ‘Extensible Pattern Matching in an Extensible Language’ (pre-print)

Wadler, Philip (1987a) ‘A critique of Abelson and Sussman –or– Why calculating is better than scheming’ in ACM SIGPLAN Notices 22 (3): 83–94 (post-print on the website of the author)

Wadler, Philip (1987b) ‘Views: A way for pattern matching to cohabit with data abstraction’ in POPL ’87: Proceedings of the 14th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages 307–18 (revised version on the website of the author)

Wright, Andrew K. (1994) ‘Pattern Matching Syntactic Extensions for Scheme’ version 1.07 for Chez Scheme (software release)

Wright, Andrew K. (1996) ‘Pattern Matching for Scheme’ (memorandum, Rice University)

Wright, Andrew K. and Cartwright, Robert (1997) ‘A Practical Soft Type System for Scheme’ in ACM Transactions on Programming Languages and Systems 19 (1): 87–152


  1. It is included in the sys/let.t source for T 2.9 (released 1984); the file has a copyright date of 1983–4. ↩︎

  2. In Tobin-Hochstadt (2011), Matthias Felleisen claims to have been the original author of this system, but there is no historical evidence to support his claim, and some circumstantial evidence against it. ↩︎

  3. The name ‘Wright–Cartwright–Shinn’ originated in SRFI 200, adding the name of Robert Cartwright those of the two people usually credited with the most widespread implementations of this pattern matcher. Cartwright did co-author a paper with Wright (Wright and Cartwright 1997) which contained an elementary description of the pattern matcher as a prelude to explaining the use of a subset of its syntax within a type-inferred implementation of Scheme. There is no evidence that Cartwright was involved in the design or implementation of the widely distributed original version of the matcher (running on standard implementations of Scheme): he was simply a co-author of another paper with Wright on the subject of a program which incorporated a version of it. It should have been called the ‘Duba–Wright–Shinn’ pattern matcher, after the three people whose contributions to the popular implementations of it are verifiable. ↩︎

  4. This version still includes a compatibility layer to support Wright-style patterns, and thus it could be considered yet another independent re-implementation of Wright’s matcher. ↩︎

  5. If one ellipsized pattern appears within another ellipsized pattern, it could also refer to a subsequence at any level of nesting, but this possibility will not be considered here. ↩︎

  6. Due to R6RS’s lack of a bound-identifier-hash procedure, for optimum expansion times with the sample implementation, the generate-temporaries procedure should always return identifiers with distinct symbolic names, so that (symbol-hash (syntax->datum id)) is a reasonably effective hash function for each id in a set of generated identifiers. ↩︎

  7. See also the documentation for the previous, mark-based implementation↩︎

  8. The use of Brzozowski derivatives to match disordered sequences was pioneered by Mark Hopkins in his regular expression library, and by Joe English and James Clark for SGML/XML validation in the presence of the DTD/RELAX NG interleaving operator. (See ‘How to validate XML’, ‘The Design of RELAX NG’, and ‘Notes on implementing RELAX NG’) However, both DTDs and RELAX NG put restrictions on the use of the interleaving operator which make it easy to write a validator which always runs in linear time (essentially reducing to a simple anagram problem). Since subpatterns in seq/unordered can evaluate arbitrary Scheme code, such a restriction is not possible here. ↩︎


© 2025 Daphne Preston-Kendal

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the ‘Software’), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.

The software is provided ‘as is’, without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the software or the use or other dealings in the software.


Editor: Arthur A. Gleckler