by Sergei Egorov
This SRFI is currently in draft status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-257@nospamsrfi.schemers.org
. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.
Pattern matching extends Scheme's repertoire of conditional constructs, allowing
decomposition of compound data structures and binding their parts to variables.
This SRFI proposes one such construct, match
, which provides all
common-denominator functionality described in SRFI-200,
adding on top of it support for non-linear patterns and backtracking. Also, the
proposed construct is modular, and allows for convenient extension via the
define-match-pattern
mechanism.
Currently, none.
This SRFI proposes a construct with functionality similar to the one of “Wright-style” pattern matcher (see SRFI-204), with the following differences: it supports backtracking, and it is designed in a modular way, allowing user extensions not only in the form of new types of patterns, but also new pattern matching languages, such as the Dybvig-Friedman-Hilsdale matcher (SRFI-241).
This proposal is influenced by the design of the Gerbil matcher
(see here).
It features patterns resembling corresponding constructors, e.g. (vector (cons x y))
,
unlike most other designs, where patterns resemble printed representation of data, e.g. #((x . y))
,
or mix both forms to support pattern combinators such as and
. However, while in most other designs
the set of constructor-like pattern forms is fixed, and their names are hardwired in the design,
the proposed construct makes constructor-like pattern forms “first-class” components that
can be created/imported/exported independently; they are just macros, implementing an
internal protocol.
The downside of this approach is that one can no longer name pattern matchers after regular
constructors as they cannot share the same namespace. To stay within the bounds of what can be
implemented via regular syntax-rules
, the choice was made to use tilde-prefixed names,
so the above pattern becomes (~vector (~cons x y))
. This convention is used for all
forms of patterns except for quote
and quasiquote
, which have convenient
reader abbreviations and thus are built into the matcher.
Traditional patterns, resembling printed representations, are supported “under quasiquote”,
as proposed in (withdrawn) SRFI-200.
New types of patterns as well as new forms of pattern languages can be built via
define-match-pattern
form, which provides a syntax-rules
-like
mechanism for pattern rewriting.
The examples below are taken from SRFI-204 and adapted to demonstrate the same functionality using the proposed pattern syntax in its constructor-like and quasiquoted forms:
Literal patterns:
(let ([ls (list 'a "b" #f 2 '() #\c '#(1))])
(list (match ls [(~list 'a "b" #f 2 '() #\c #(1)) 'ok])
(match ls [`(a "b" #f 2 () #\c #(1)) 'ok])))
⟹ (ok ok)
Pattern variables:
(match (list 1 2 3) [(~list a b c) b]) ⟹ 2
(match (list 1 2 3) [(~list _ b _) b]) ⟹ 2
(match (list 1 2 3) [`(a ,b c) b] [_ 'fail]) ⟹ fail
(match (list 1 2 3) [`(1 ,b ,_) b] [_ 'fail]) ⟹ 2
Nonlinear patterns:
(match (list 'A 'B 'A) [(~list a b a) a] [_ 'fail]) ⟹ A
(match (list 'A 'B 'A) [`(,a b ,a) a] [_ 'fail]) ⟹ fail
(match (list 'A 'B 'A) [`(,a B ,a) a] [_ 'fail]) ⟹ A
(match (list 'A 'B 'A) [`(,a ,b ,a) a] [_ 'fail]) ⟹ A
In this proposal, quasiquoted and regular patterns can be arbitrarily combined:
(let ([x '(1 2 3 4)])
(list (match x [(~cons a (~append b (~list c))) (list a b c)])
(match x [(~cons a `(,@b ,@(~list c))) (list a b c)])
(match x [(~cons a `(,@b ,c)) (list a b c)])
(match x [`(,a ,@b ,c) (list a b c)])))
⟹ ((1 (2 3) 4) (1 (2 3) 4) (1 (2 3) 4) (1 (2 3) 4))
Ellipsis (~etc
) and tail patterns:
(match (list 1 2) [(~list* 1 2 (~etc 3)) #t]) ⟹ #t
(match (list 1 2 3) [(~list* 1 2 (~etc 3)) #t]) ⟹ #t
(match (list 1 2 3 3 3) [(~list* 1 2 (~etc 3)) #t]) ⟹ #t
(match '((a time) (stitch saves) (in nine)) [(~etc (~list x y)) (list x y)])
⟹ ((a stitch in) (time saves nine))
(match '((a b) (c d) (e f)) [(~etc (~list x y)) (list x y)])
⟹ ((a c e) (b d f))
(define (transpose x)
(match x [(~etc (~cons a (~etc b))) (cons a (transpose b))] [_ '()]))
(transpose '((1 2 3) (4 5 6))) ⟹ ((1 4) (2 5) (3 6))
(define (palindrome? str)
(let loop ([chars (filter char-alphabetic?
(string->list (string-foldcase str)))])
(match chars
['() #t]
[(~list a) #t]
[(~cons a (~append (~etc b) (~list a))) (loop b)]
[_ #f])))
(palindrome? "Able was I, ere I saw Elba.") ⟹ #t
(palindrome? "Napoleon") ⟹ #f
(define (first-column x)
(match x [(~etc (~cons a (~etc _))) a]))
(first-column '((1 2 3) (4 5 6) (7 8 9))) ⟹ (1 4 7)
This proposal disagrees with “Wright-style” matcher (WSM) on quasiquote
unquote-splicing
patterns; in this proposal, they are
treated in accordance with the regular quasiquote rules, rather than being turned into ...
repeats:
(match (list 1 2) [`(1 2 ,@3) #t] [_ #f]) ⟹ #f ;
WSM returns#t (match '(1 2 . 3) [`(1 2 ,@3) #t] [_ #f]) ⟹ #t ;
WSM returns#f (match (list 1 2 3 3 3) [`(1 2 ,@3) #t] [_ #f]) ⟹ #f ;
WSM returns#t
WSM-compatible behavior can be imitated by explicit insertion of ~etc
:
(match (list 1 2) [`(1 2 ,@(~etc 3)) #t] [_ #f]) ⟹ #t
(match '(1 2 . 3) [`(1 2 ,@(~etc 3)) #t] [_ #f]) ⟹ #f
(match (list 1 2 3 3 3) [`(1 2 ,@(~etc 3)) #t] [_ #f]) ⟹ #t
Unlike WSM, this proposal supports non-linearity in and out of “ellipsis” patterns:
(match '((1 2 3 4) ((1) (2) (3) (4)) (1 2 3 4))
[(~list a* (~etc (~list a*)) a*) a*])
⟹ (1 2 3 4)
Some WSM postfix operators do not have built-in equivalents, but can be easily defined.
An analog of WSM **1
can be defined as follows:
(define-match-pattern ~etc+ ()
[(~etc+ p) (~pair? (~etc p))])
(match (list 1 2) [(~list* a b (~etc+ c)) c] [_ #f]) ⟹ #f
(match (list 1 2 3) [(~list* a b (~etc+ c)) c] [_ #f]) ⟹ (3)
An analog of WSM =..
k can be defined as follows:
(define-match-pattern ~etc= ()
[(~etc= k p)
(~and (~list? (~prop length => k))
(~etc p))])
(match '((a b) (c d) (e f))
[(~etc= 3 (~list x y)) (list x y)]
[_ 'fail])
⟹ ((a c e) (b d f))
(match '((a b) (c d) (e f) (g h))
[(~etc= 3 (~list x y)) (list x y)]
[_ 'fail])
⟹ fail
An analog of WSM *..
k j can be defined as follows:
(define-match-pattern ~etc** ()
[(~etc** k j p)
(~and (~list? (~prop length => (~and (~test >= (k)) (~test <= (j)))))
(~etc p))])
(match '((a b) (c d) (e f))
[(~etc** 2 4 (~list x y)) (list x y)]
[_ 'fail])
⟹ ((a c e) (b d f))
(match '((a b) (c d) (e f) (g h))
[(~etc** 2 4 (~list x y)) (list x y)]
[_ 'fail])
⟹ ((a c e g) (b d f h))
(match '((a b) (c d) (e f) (g h) (i j))
[(~etc** 2 4 (~list x y)) (list x y)]
[_ 'fail])
⟹ fail
Tail patterns will match dotted pairs, and ellipsis patterns won't:
(define (keys x)
(match x [(~etc (~cons a (~etc _))) a] [_ 'fail]))
(keys '((a 1) (b 2) (c 3))) ⟹ (a b c)
(keys '((a . 1) (b . 2) (c . 3))) ⟹ fail
(define (keys x)
(match x [(~etc (~cons a _)) a] [_ 'fail]))
(keys '((a 1) (b 2) (c 3))) ⟹ (a b c)
(keys '((a . 1) (b . 2) (c . 3))) ⟹ (a b c)
Logical patterns:
(match 1 [(~and) #t]) ⟹ #t
(match 1 [(~and x) x]) ⟹ 1
(match 1 [(~and x 1) x]) ⟹ 1
(match #f [(~and) #t] [_ #f]) ⟹ #t
(match #f [(~and x) (=> fail) (if x #t (fail))] [_ #f]) ⟹ #f
(match 1 [(~or) #t] [_ #f]) ⟹ #f
(match 1 [(~or x) x]) ⟹ 1
(match 1 [(~or x 2) x]) ⟹ 1
(define (last-matches-one-of-first-three x)
(match x [`(,a ,a) #t]
[`(,a ,b ,@c ,(~or a b)) #t]
[`(,a ,b ,c ,@d ,c) #t]
[_ #f]))
(last-matches-one-of-first-three '(1 2 3 4 5 1)) ⟹ #t
(last-matches-one-of-first-three '(1 2 3 4 5 2)) ⟹ #t
(last-matches-one-of-first-three '(1 2 3 4 5 3)) ⟹ #t
(last-matches-one-of-first-three '(1 2 3 4 5 6)) ⟹ #f
(define (last-matches-one-of-first-three2 x)
(match x
[`(,a ,a) #t]
[`(,a ,b ,@c ,d) (=> fail)
(if (or (equal? d a) (equal? d b)) #t (fail))]
[`(,a ,b ,c ,@d ,e) (equal? c e)]
[_ #f]))
(last-matches-one-of-first-three2 '(1 2 3 4 5 1)) ⟹ #t
(last-matches-one-of-first-three2 '(1 2 3 4 5 2)) ⟹ #t
(last-matches-one-of-first-three2 '(1 2 3 4 5 3)) ⟹ #t
(last-matches-one-of-first-three2 '(1 2 3 4 5 6)) ⟹ #f
This proposal explicitly assigns #f
to undefined ~or
variables:
(match '(0 1 2 3 4 5 6 7)
[(~etc (~or 2 6 rest)) rest])
⟹ (0 1 #f 3 4 5 #f 7)
Single-argument ~not
patterns are supported.
It is an error for a pattern to refer to the same variable inside and outside of a ~not
pattern.
(match 1 [(~and x (~not #f)) x] [_ 'fail]) ⟹ 1
(match #f [(~and x (~not #f)) x] [_ 'fail]) ⟹ fail
(match 1 [(~not 2) #t]) ⟹ #t
Predicates and fields are supported via ~?
and ~=
respectively.
It is an error to refer to pattern variables in non-subpattern parts of these patterns.
(match 1 [(~? odd? x) x]) ⟹ 1
(match '(a) [(~= car x) x]) ⟹ a
Tests that use more than one pattern variable should be implemented in the body of the corresponding rule:
(define (fibby? x)
(match x
[(~list* a b c rest)
(if (= (+ a b) c) (fibby? (cons b (cons c rest))) #f)]
[(~list a b) #t]
[(~list a) #t]
['() #t]
[_ #f]))
(fibby? '(4 7 11 18 29 47)) ⟹ #t
This section contains examples specific to new functionality proposed in this SRFI.
Some patterns marked as “iterative” in this proposal may match the input object in multiple ways, or “solutions”, backtracking to the next solution if the current one causes downstream patterns to fail (structurally or because of a disagreement in values of shared pattern variables). This process may be time-consuming, but is associated only with iterative patterns, causing no overhead if they are not used.
Iteration strategies vary between pattern types. Regular versions of append matchers are greedy, giving preference to long leftmost segments:
(define (pr* p . x*)
(for-each (lambda (x) (display x p)) x*))
(let ([p (open-output-string)])
(match "abc"
[(~string-append a (~string b) c)
(=> next) (pr* p "1:" a "+" b "+" c ";") (next)]
[(~string-append a c)
(=> next) (pr* p "2:" a "+" c ";") (next)]
[x (get-output-string p)]))
⟹ "1:ab+c+;2:abc+;"
Non-greedy appends give preference to short leftmost segments:
(let ([p (open-output-string)])
(match "abc"
[(~string-append/ng a (~string b) c)
(=> next) (pr* p "1:" a "+" b "+" c ";") (next)]
[(~string-append/ng a c)
(=> next) (pr* p "2:" a "+" c ";") (next)]
[x (get-output-string p)]))
⟹ "1:+a+bc;2:+abc;"
Backtracking can be triggered explicitly from the body; calling “back” guard formal retries starting from the current pattern's most recent backtracking point or next rule if none left:
(let ([p (open-output-string)])
(match "abc"
[(~string-append a (~string b) c)
(=> next back) (pr* p "1:" a "+" b "+" c ";") (back)]
[(~string-append a c)
(=> next back) (pr* p "2:" a "+" c ";") (back)]
[x (get-output-string p)]))
⟹ "1:ab+c+;1:a+b+c;1:+a+bc;2:abc+;2:ab+c;2:a+bc;2:+abc;"
It is relatively easy to define custom match forms with a different pattern grammar
using a combination of “converter” patterns defined via define-match-pattern
and regular syntax-rule
-based macro for the form itself. This process is
facilitated by a collection of special-purpose patterns, as shown below.
Defining an analog of Dybvig-Friedman-Hilsdale matcher with support for catamorphism feature:
(define-match-pattern ~cmp->p (<...> <_> unquote ->)
[(_ l ,(x)) (~prop l => x)]
[(_ l ,(f -> x ...)) (~prop f => x ...)]
[(_ l ,(x ...)) (~prop l => x ...)]
[(_ l ,<_>) _]
[(_ l ,x) x]
[(_ l ()) '()]
[(_ l (x <...>)) (~etc (~cmp->p l x))] ; optimization
[(_ l (x <...> . y)) (~append/t y (~etc (~cmp->p l x)) (~cmp->p l y))]
[(_ l (x . y)) (~cons (~cmp->p l x) (~cmp->p l y))]
[(_ l #(x ...)) (~list->vector (~cmp->p l (x ...)))]
[(_ l other) 'other])
(define-syntax cm-match
(syntax-rules (guard)
[(_ () l x () (c ...))
(match x c ... [_ (error "cm-match error" x)])]
[(_ () l x ([cmp (guard . e*) . b] cmc ...) (c ...))
(cm-match () l x (cmc ...)
(c ... [(~replace-specials <...> <_> (~cmp->p l cmp))
(=> n) (if (and . e*) (let () . b) (n))]))]
[(_ () l x ([cmp . b] cmc ...) (c ...))
(cm-match () l x (cmc ...)
(c ... [(~replace-specials <...> <_> (~cmp->p l cmp))
(let () . b)]))]
[(_ x cmc ...)
(let l ([v x]) (cm-match () l v (cmc ...) ()))]))
Note the use of ~replace-specials
pattern that substitutes
<...> <_>
for special ... _
identifiers
to allow them to have their regular meaning inside define-match-pattern
.
It also uses the ~append/t
pattern, a non-iterative list append
that takes the length of the tail segment from its first argument.
Defined in this manner, cm-match
can be used as D-F-H match
w.r.t. matching functionality (no enhanced right-hand-side quasiquote
):
(let ([simple-eval
(lambda (x)
(cm-match x
[,i (guard (integer? i)) i]
[(+ ,[x*] ...) (apply + x*)]
[(* ,[x*] ...) (apply * x*)]
[(- ,[x] ,[y]) (- x y)]
[(/ ,[x] ,[y]) (/ x y)]
[,x (error "invalid expression" x)]))])
(simple-eval '(+ (- 0 1) (+ 2 3))))
⟹
4
(let ([split
(lambda (lis)
(cm-match lis
[() (values '() '())]
[(,x) (values `(,x) '())]
[(,x ,y . ,[odds evens])
(values `(,x . ,odds)
`(,y . ,evens))]))])
(split '(a b c d e f)))
⟹
(a c e)
(b d f)
Defining a syntax-rules
-like matcher with explicit list of literal identifiers:
(define-match-pattern ~srp->p (<...> <_>)
[(_ l* ()) '()]
[(_ l* (x <...>)) (~etc (~srp->p l* x))] ; optimization
[(_ l* (x <...> . y)) (~append/t y (~etc (~srp->p l* x)) (~srp->p l* y))]
[(_ l* (x . y)) (~cons (~srp->p l* x) (~srp->p l* y))]
[(_ l* #(x ...)) (~list->vector (~srp->p l* (x ...)))]
[(_ l* <_>) _]
[(_ l* a) (~if-id-member a l* 'a a)])
(define-syntax sr-match
(syntax-rules (=>)
[(_ () l* v () (c ...))
(match v c ... [_ (error "sr-match error" v)])]
[(_ () l* v ([srp . b] src ...) (c ...))
(sr-match () l* v (src ...) (c ...
[(~replace-specials <...> <_> (~srp->p l* srp)) . b]))]
[(_ x (l ...) src ...)
(let ([v x]) (sr-match () (l ...) v (src ...) ()))]))
Note the use of ~if-id-member
pattern to check if an atom is a literal.
(sr-match '(begin (a 5) (b 6) (c 7) (d 8))
(begin)
[(begin (x* y*) ...) (list x* y*)])
⟹ ((a b c d) (5 6 7 8))
(sr-match '((a b c d) (e f g) (h i) (j))
()
[((x* y** ...) ...) (list x* y**)])
⟹ ((a e h j) ((b c d) (f g) (i) ()))
The grammar rules for the match
form are:
⟨match expression ⟩ ⟶
(match
⟨expression ⟩ ⟨match rule ⟩*)
⟨match rule ⟩ ⟶
(
⟨match pattern ⟩ (=>
⟨next id ⟩ ⟨back id ⟩)
⟨body ⟩ )
|
(
⟨match pattern ⟩ (=>
⟨next id ⟩)
⟨body ⟩ )
|
(
⟨match pattern ⟩ ⟨body ⟩ )
⟨match pattern ⟩ ⟶
⟨underscore ⟩
|
⟨match pattern identifier ⟩
|
⟨atomic datum ⟩
|
(quote
⟨datum ⟩)
|
(quasiquote
⟨match quasipattern ⟩)
|
(
⟨match pattern name ⟩ ⟨match pattern argument ⟩*)
⟨match quasipattern ⟩ ⟶
()
|
⟨atomic datum ⟩
|
⟨match quasipattern symbol ⟩
|
(unquote
⟨match pattern ⟩)
|
((unquote-splicing
⟨match pattern ⟩)
.
⟨match quasipattern ⟩)
|
(
⟨match quasipattern ⟩ .
⟨match quasipattern ⟩)
|
#(
⟨match quasipattern ⟩*)
⟨atomic datum ⟩ ⟶
|
⟨boolean ⟩
|
⟨number ⟩
|
⟨character ⟩
|
⟨string ⟩
|
⟨bytevector ⟩
⟨match pattern name ⟩ ⟶
⟨any identifier except quote
and quasiquote
⟩
⟨match pattern argument ⟩ ⟶
⟨match pattern ⟩
|
⟨expression ⟩
⟨match pattern identifier ⟩ ⟶
⟨any identifier except _
and ...
⟩
⟨next id ⟩ ⟶
⟨identifier ⟩
⟨back id ⟩ ⟶
⟨identifier ⟩
⟨match quasipattern symbol ⟩ ⟶
⟨any identifier except unquote
and unquote-splicing
⟩
⟨underscore ⟩ ⟶
⟨the identifier _ ⟩
Note: _
...
and =>
in this specification are auxiliary syntaxes
identical to the like-named syntaxes exported by the standard libraries.
The match
form behaves as follows. First, its ⟨expression ⟩
argument is evaluated. It should return a single value, which is subsequently tested against match
rules in the order they are written until a matching rule is found. If no matching rules are found,
match
returns an undefined value.
A rule matches if the application of its pattern to the input object is successful and there is
no “guard clause” (a list of the =>
keyword followed by one or two identifiers). If a guard
clause is present, it binds the identifiers to procedures allowing advanced control over the matching
process. The first procedure, bound within the body to ⟨next id ⟩, allows
one to skip the current rule as if its pattern match was unsuccessful. In two-identifier guard clauses,
the second procedure, bound within the body to ⟨back id ⟩, allows one to
tell the matcher that the current pattern should consider another way to match, or, if no such way exists,
to skip the pattern as if its attempts to match were unsuccessful.
Both procedures accept no arguments and can be called from any tail position within the
body. If these procedures are called in a different way, the behavior is undefined.
Patterns may contain pattern variables, which, if the application of a pattern succeeds, become bound in the corresponding body to parts of the input object. The values of these variables may be used to reject the match via the guard clause mechanism described above.
Any pattern containing pattern variables may use the same variable in more than one place within the
pattern. In such cases all the corresponding sub-objects, assigned to the repeated variable,
should “agree”, i.e. to be the same according to the equal?
predicate.
Some patterns, marked as (iterative) in the pattern entry's header line, may match the input object in multiple ways. The matcher tries these different matches in the order specified in the pattern description. The two common orders are (iterative, greedy) , maximizing the number of input elements matched by sub-patterns on the left, and (iterative, non-greedy) , maximizing the number of input elements matched by sub-patterns on the right. When iterative patterns are nested, the matcher acts as if the outer pattern has full control over the decomposition of the input object, allowing its immediate sub-patterns to work with the parts in the order the outer pattern supplies them.
⟨atomic datum ⟩ match pattern
Literal patterns, ⟨boolean ⟩, …,
⟨bytevector ⟩, match equal objects (in terms of
Scheme equal?
predicate).
_
match pattern
The ⟨underscore ⟩ pattern matches anything. It is
not a variable, so it produces no bindings visible in the body of the rule.
⟨match pattern identifier ⟩ match pattern
The ⟨match pattern identifier ⟩ pattern acts like a
pattern variable. A pattern variable will match any value and can be repeated.
If repeated, all subsequent like-named variables will only match the value that
is equal?
to the value matched by the first. This value is made
available inside ⟨body ⟩ as if a local identifier
with the same name is bound to this value via the let
form surrounding
the body. It is an error to refer to pattern identifiers in expressions within
the same pattern, or to use them as ⟨match pattern name ⟩s.
(quote
⟨datum ⟩)
match pattern
The quote
pattern matches ⟨datum ⟩. Again, the Scheme equal?
predicate is used for comparison.
(quasiquote
⟨match quasipattern ⟩)
match pattern
The quasiquote
pattern can be understood in terms of regular patterns. It is translated
into combinations of basic patterns and data patterns (see below). The translation
function T[⟨match quasipattern ⟩] ⟹ ⟨match pattern ⟩
resembles simplified translation of the regular quasiquote
form (shown in
abbreviated form, first matching rule is chosen):
T[() ] | ⟹ | (quote ()) |
T[⟨atomic datum ⟩] | ⟹ | ⟨atomic datum ⟩ |
T[(unquote ⟨mp ⟩) ] |
⟹ | ⟨mp ⟩ |
T[((unquote-splicing ⟨mp ⟩)) ] |
⟹ | ⟨mp ⟩ |
T[((unquote-splicing ⟨mp ⟩) .
⟨mqp ⟩) ] |
⟹ | (~append ⟨mp ⟩
T[⟨mqp ⟩]) |
T[( ⟨mqp 1⟩ .
⟨mqp 2⟩)] |
⟹ | (~cons T[⟨mqp 1⟩]
T[⟨mqp 2⟩]) |
T[#( ⟨mqp ⟩*)] |
⟹ | (~vector
T[⟨mqp 1⟩]*) |
(~value
⟨expression ⟩)
match pattern
The ~value
pattern matches if the input object is equal to the result of
⟨expression ⟩, calculated during the match. The Scheme equal?
predicate is used for comparison. This is a convenient way to match against non-pattern variables
bound outside the match
form.
List patterns match various types of pairs/lists, disassembling them into sub-expressions
and applying to them sub-patterns. They are named to resemble the corresponding Scheme constructors,
and can be combined in a similar manner. Their names are not built into the matcher, but
are defined via define-syntax
or its equivalent, so they need to be imported
if used explicitly.
(~cons
⟨mp a⟩ ⟨mp d⟩)
match pattern
The ~cons
pattern matches a pair whose car matches
⟨mp a⟩ and whose cdr matches
⟨mp d⟩.
(~list
⟨mp ⟩ …)
match pattern
The ~list
pattern matches a proper list, with as many elements
as there are sub-patterns ⟨mp ⟩ … The elements
should match the corresponding patterns.
(~append
⟨mp ⟩ …)
match pattern (iterative, greedy)
The ~append
pattern matches a (possibly improper) list, breaking it into
sub-list segments that can match the corresponding patterns
⟨mp ⟩ … If the list is improper, the last
pattern is applied to its improper tail segment.
Note: This is an example of a iterative
pattern that may match the target list in many different ways; all “solutions” are equally good,
as long as the segments can be appended via append
to make a list equal to the original.
The solutions are tried starting with ones maximizing the length of the leftmost
segment, then the segment after it, etc.
(~append/ng
⟨mp ⟩ …)
match pattern (iterative, non-greedy)
The ~append/ng
pattern is a “non-greedy” variant of the
~append
pattern. The solutions are tried starting with the ones maximizing
the length of the rightmost segment, then the segment before it, etc.
(~append/t
⟨datum ⟩ ⟨mp 1⟩ ⟨mp 2⟩)
match pattern
The ~append/t
pattern matches a (possibly improper) list, breaking it
into two sub-list segments to be matched against ⟨mp 1⟩ and
⟨mp 2⟩ sub-patterns.
The number of pairs in the “spine”
of the second segment is assumed to be equal to the number of “spine” pairs in the
⟨datum ⟩ argument. The pattern fails if the input list is
too short to be broken in this way.
Note: This is a non-iterative pattern, providing
a degree of efficiency where the length of the tail pattern is known at macroexpand time.
(~etc
⟨mp ⟩)
match pattern
The ~etc
pattern matches a proper list such that every one
of its elements matches ⟨mp ⟩. If this sub-pattern
contains pattern vars, they will be bound to lists of values obtained via
individual element matches in the order of their appearance in the input
list. This pattern can be compared to the postfix ellipsis (...
)
notation popularized by syntax-rules
.
Note: If one or more sub-pattern
variables are also used outside ~etc
in the same rule-level pattern,
the non-linear “agreement” rule is applied to the final list values collected
by ~etc
pattern on a successful match.
A limited set of vector patterns is provided. Advanced matching of vectors
can be performed either by combining list patterns with ~list->vector
converter pattern, or by using core patterns described further below.
(~vector
⟨mp ⟩ …)
match pattern
The ~vector
pattern matches a vector with as many elements
as there are sub-patterns ⟨mp ⟩ … The elements
should match the corresponding patterns.
(~vector-append
⟨mp ⟩ …)
match pattern (iterative, greedy)
The ~vector-append
pattern matches a vector, breaking it into
sub-vector segments that can match the corresponding patterns
⟨mp ⟩ …
Note: This is another example of an iterative
pattern that may match the target vector in many different ways; the “solutions” are attempted starting
with ones maximizing the length of the leftmost segment, then the segment after it, etc.
(~vector-append/ng
⟨mp ⟩ …)
match pattern (iterative, non-greedy)
The ~vector-append/ng
pattern is a “non-greedy” variant of the
~vector-append
pattern. The solutions are attempted starting with the ones maximizing
the length of the rightmost segment, then the segment before it, etc.
A limited set of string patterns is provided. Advanced matching of strings
can be performed either by combining list patterns with the ~list->string
converter pattern, or by using core patterns described further below.
(~string
⟨mp ⟩ …)
match pattern
The ~string
pattern matches a string with as many characters
as there are sub-patterns ⟨mp ⟩ … The characters
should match the corresponding patterns.
(~string-append
⟨mp ⟩ …)
match pattern (iterative, greedy)
The ~string-append
pattern matches a string, breaking it into
sub-string segments that can match the corresponding patterns
⟨mp ⟩ …
This is an iterative pattern; the “solutions” are attempted starting
with ones maximizing the length of the leftmost segment, then the segment after it, etc.
(~string-append/ng
⟨mp ⟩ …)
match pattern (iterative, non-greedy)
The ~string-append/ng
pattern is a “non-greedy” variant of the
~string-append
pattern. The solutions are attempted starting with the ones maximizing
the length of the rightmost segment, then the segment before it, etc.
These patterns extend the functionality of the matcher, allowing it to deal with
numbers and symbols as strings, strings and chars as numbers, etc. They are named
as the corresponding constructors, so they match the types of objects on the right-hand
side of ->
, e.g. ~list->vector
matches vectors, but
applies its sub-pattern to the result of converting it to the list. They should be read
as converters for the patterns : ~list->vector
makes
a vector pattern from a list pattern, and so on.
(~vector->list
⟨mp ⟩)
match pattern
(~string->list
⟨mp ⟩)
match pattern
(~list->vector
⟨mp ⟩)
match pattern
(~list->string
⟨mp ⟩)
match pattern
(~string->symbol
⟨mp ⟩)
match pattern
(~symbol->string
⟨mp ⟩)
match pattern
Convert a pattern built to match objects of the type on the left side of ->
to the pattern able to match objects of the type on the right side of ->
,
applying the “reverse” converter function internally. E.g. if ⟨mp ⟩
can match lists of characters, (~list->string
⟨mp ⟩)
will be able to match strings containing the same characters.
(~string->number
⟨mp ⟩)
match pattern
(~string->number
⟨mp ⟩ ⟨rx ⟩)
match pattern
(~number->string
⟨mp ⟩)
match pattern
(~number->string
⟨mp ⟩ ⟨rx ⟩)
match pattern
These patterns convert between number and string patterns, accepting an optional
⟨rx ⟩ argument. This additional argument is an expression that is evaluated
when the pattern is applied, and should produce a radix, suitable for Scheme
string->number
/ number->string
procedures.
The role of predicate patterns is to “guard” their optional sub-patterns, making sure that the input object belongs to a certain Scheme type. Their set is limited, but new ones can be easily defined, as explained in the section on derived patterns below.
(~null?
⟨mp ⟩ …)
match pattern
(~pair?
⟨mp ⟩ …)
match pattern
(~list?
⟨mp ⟩ …)
match pattern
(~boolean?
⟨mp ⟩ …)
match pattern
(~number?
⟨mp ⟩ …)
match pattern
(~integer?
⟨mp ⟩ …)
match pattern
(~vector?
⟨mp ⟩ …)
match pattern
(~string?
⟨mp ⟩ …)
match pattern
(~symbol?
⟨mp ⟩ …)
match pattern
(~char?
⟨mp ⟩ …)
match pattern
These patterns match if the input object satisfies the like-named Scheme predicate, and
if there are sub-patterns, all sub-patterns match the same input object, as if
combined with the ~and
pattern combinator (see below).
These patterns apply their sub-patterns to the same input object, combining the results of individual sub-pattern matches.
(~and
⟨mp ⟩ …)
match pattern
This pattern matches if all its sub-patterns match the input object. If
sub-patterns contain pattern values, all of them are bound on a successful match. If there
are no sub-patterns, the (~and)
pattern matches any object.
(~or
⟨mp ⟩ …)
match pattern (iterative)
This pattern matches if any of its sub-patterns matches the input object. The
sub-patterns are tried in the order they are written, and the process stops on
the first sub-pattern that matches successfully (but see the note below). If sub-patterns
contain pattern values, all of them are bound on a match, but only the ones
belonging to the “successful” sub-pattern will have their values bound to the
corresponding sub-values. The remaining variables are bound to #f
.
If there are no sub-patterns, the (~or)
pattern fails on every object.
Note: this pattern is iterative in the following
sense: if backtracking exhausts all possible match solutions for one sub-pattern, the next
one is tried, then the next one, etc.
(~not
⟨mp ⟩)
match pattern
This pattern matches if its sub-pattern fails to match the input object. If
it contains pattern variables, none of them are bound on a successful match
(i.e. when the sub-pattern fails).
These patterns provide functionality popularized by the “Wright-style” family of matchers.
(~list*
⟨mp ⟩ … ⟨mp t⟩)
match pattern
The ~list*
pattern matches a (possibly improper) list with at least as many elements
as there are sub-patterns ⟨mp ⟩ … The elements
should match the corresponding patterns; the remaining part of the list should match the
last pattern, ⟨mp t⟩.
(~list-no-order
⟨mp ⟩ …)
match pattern
This pattern matches a proper list with as many elements
as there are sub-patterns ⟨mp ⟩ … It iterates through
all possible permutations of the input list looking for one with elements that
match the corresponding sub-patterns.
(~list-no-order*
⟨mp ⟩ … ⟨mp t⟩)
match pattern
This pattern matches a (possibly improper) list with at least as many elements
as there are sub-patterns ⟨mp ⟩ … The elements,
in some permutation of the original order, should match the corresponding patterns;
the remaining part of the list should match the
last pattern, ⟨mp t⟩.
(~=
⟨fn ⟩
⟨mp ⟩)
match pattern
In this pattern, ⟨fn ⟩ is not a pattern. The pattern
binds its input value to a fresh variable ⟨iv ⟩,
and then evaluates the (
⟨fn ⟩
⟨iv ⟩)
expression; its result is matched
against the ⟨mp ⟩ sub-pattern.
Note: ⟨fn ⟩ does not have
to name a function; it can be of any form. The combined expression is evaluated
as is so the matching process may be interrupted in case of errors.
(~?
⟨fn ⟩
⟨mp ⟩ …)
match pattern
In this pattern, ⟨fn ⟩ is not a pattern. The pattern binds its
input value to a fresh variable ⟨iv ⟩,
and evaluates the (
⟨fn ⟩
⟨iv ⟩)
expression. If the expression evaluates
to #f
, the pattern fails. Otherwise, the input object is matched
against ⟨mp ⟩… sub-patterns, combined via
~and
(i.e. all of them should match).
Note: ⟨fn ⟩ does not have
to name a function; it can be of any form. The expression is evaluated
as is, so the matching process may be interrupted in case of errors.
These patterns may not be as compact, nor as convenient to use as the ones described above,
but they provide functionality needed to build derived patterns via define-match-pattern
(see below). In fact, all patterns described above except for ~etc
, basic and logical ones are implementable
in this way; some of the possible derivations are shown in the Appendix.
(~prop
⟨fn ⟩
=>
⟨mp ⟩ …)
match pattern
(~prop
⟨fn ⟩
(
⟨arg ⟩ …)
=>
⟨mp ⟩ …)
match pattern
In this pattern, ⟨fn ⟩ and all ⟨arg ⟩s
are not patterns. The pattern binds its input value to a fresh variable ⟨iv ⟩,
and then evaluates the (
⟨fn ⟩ ⟨iv ⟩
⟨arg ⟩…)
expression; its results are matched against the
⟨mp ⟩ … sub-patterns.
Note: ⟨fn ⟩ does not have
to name a function; it can be of any form. The combined expression is evaluated
as is, so the matching process may be interrupted in case of errors.
(~test
⟨fn ⟩)
match pattern
(~test
⟨fn ⟩
(
⟨arg ⟩ …))
match pattern
(~test
⟨fn ⟩
=>
⟨mp ⟩)
match pattern
(~test
⟨fn ⟩
(
⟨arg ⟩ …)
=>
⟨mp ⟩)
match pattern
In this pattern, ⟨fn ⟩ and all ⟨arg ⟩s
are not patterns. The pattern binds its input value to a fresh variable ⟨iv ⟩,
and evaluates the (
⟨fn ⟩ ⟨iv ⟩
⟨arg ⟩…)
expression. If the expression evaluates to #f
,
the pattern fails. Otherwise, if ⟨mp ⟩
sub-pattern is present, it should also match the result; if it is absent, the match just
succeeds.
Note: ⟨fn ⟩ does not have
to name a function; it can be of any form. The expression is evaluated
as is, so the matching process may be interrupted in case of errors.
(~iterate
⟨start ⟩
⟨head ⟩ ⟨tail ⟩
(
⟨var ⟩…)
⟨mp ⟩)
match pattern
This pattern combinator builds a backtracking matcher
that produces a stream of “solutions” to be matched against
⟨mp ⟩; ⟨start ⟩,
⟨head ⟩, ⟨tail ⟩,
and all ⟨var ⟩s are not patterns. Internally,
it uses an iterative loop with ⟨var ⟩…
serving is a list of state variables (their names should be different,
but otherwise unimportant due to hygiene). The rest of the non-pattern
arguments are used to form internal expressions and can be either names
of procedures or macros. They are used as follows:
⟨start ⟩ is invoked
with the original value and two continuation procedures:
first one (try ) accepts 'seed' values for state variables
if start succeeds, the second one (fail ) accepts no values
and is called if start fails.
⟨head ⟩ accepts current
values of state variables and returns a single value to be matched against
⟨mp ⟩ pattern.
⟨tail ⟩ accepts the
same two continuations as ⟨start ⟩, followed by
the current values of state variables. It should either apply its
try continuation to continue with new values of state vars,
or apply its fail one to signal that there are no more “solutions”.
Note: both try and
fail should be applied in tail positions.
This SRFI facilitates creation of entirely new pattern-matching forms via derivation. Some of such derived forms may have special needs, met by the pattern combinators below.
(~if-id-member
⟨id ⟩
(
⟨literal-id ⟩ …)
⟨mp t⟩ ⟨mp f⟩)
match pattern
This pattern selects one of the two sub-pattern arguments based on the first two
non-pattern arguments. The ⟨id ⟩ argument is a match
pattern identifier, but it is not used as a sub-pattern. The second argument is a list
of distinct match pattern identifiers; they are also not used as sub-patterns.
The ~if-id-member
checks if ⟨id ⟩ is
in the list of ⟨literal-id ⟩s, selecting the
first sub-pattern for matching if it is there, or the second sub-pattern if it's not.
The method of comparison is the same as the one used to compare input identifiers
with the list of literal identifiers in syntax-rules
.
(~replace-specials
⟨new ... ⟩
⟨new _ ⟩ ⟨mp ⟩)
match pattern
This pattern matches ⟨mp ⟩ after replacing all ellipsis
(...
) identifiers in it with ⟨new ... ⟩, and
all underscore (_
) identifiers with ⟨new _ ⟩.
This is useful if ⟨mp ⟩ incorporates patterns designed to
use regular ellipsis and/or underscore as part of its syntax. Replacing them allows
pattern rewriting rules (see define-match-pattern
below) to employ these
identifiers in their special role, controlling the rewriting.
New patterns can be defined using a pattern rewriting system, replicating the
functionality of syntax-rules
, but using patterns instead of regular
Scheme expressions or definitions. It hides the details of the internal pattern
macroexpansion protocol, creating an illusion that patterns expand into
combinations of other patterns directly.
(define-match-pattern
⟨name ⟩
(
⟨literal-id ⟩ …)
(
⟨imp ⟩ ⟨omp ⟩)
…
)
syntax
This form expands into an equivalent of
(define-syntax
⟨name ⟩
(syntax-rules (
⟨literal-id ⟩ …)
⟨rule ⟩ …))
where ⟨name ⟩ is a new match pattern name, and each
⟨rule ⟩ is a syntax-rules
transformation rule,
created from a pair of an input pattern ⟨imp ⟩
and the corresponding output pattern ⟨omp ⟩. Please see
the description of syntax-rules
for the details.
To show how it works in practice, here's a definition of a ~qq
pattern,
equivalent to the built-in quasiquote
pattern; it implements the T[ ] transcription
function. More examples are available
in the Examples section and in the Appendix below.
(define-match-pattern ~qq (unquote unquote-splicing)
[(_ ,p) p]
[(_ (,@lp)) lp]
[(_ (,@lp . dp)) (~append lp (~qq dp))]
[(_ (ap . dp)) (~cons (~qq ap) (~qq dp))]
[(_ #(p ...)) (~vector (~qq p) ...)]
[(_ a) (quote a)])
(match '(1 (2 . 3) #(4))
[(~qq (,x (,y . ,z) #(,t)))
`(,x (,y . ,z) #(,t))])
⟹ (1 (2 . 3) #(4))
Although the bulk of this proposal is dedicated to patterns, there are a few things that can be done to simplify the templating too. There is some symmetry between the pattern language and the regular Scheme expressions, which in many cases allows one to re-create the input object by constructing it back in the same way it was taken apart, e.g.:
(match '(1 (2 . 3) #(4))
[(~list x (~cons y z) (~vector t))
(list x (cons y z) (vector t))])
⟹ (1 (2 . 3) #(4))
(match '(1 (2 . 3) #(4))
[`(,x (,y . ,z) #(,t))
`(,x (,y . ,z) #(,t))])
⟹ (1 (2 . 3) #(4))
Things get more complicated when one has to deal with “ellipsis” expressions. To
re-construct them, one has to use map
with a lambda
expression
containing the element construction code. To simplify this process, two extra syntax forms are provided.
(value
⟨expression ⟩)
procedure or syntax
This form just evaluates ⟨expression ⟩ and returns its
value, complementing the ~value
pattern. It is also the only form other
than quote
recognized by name by the etc
syntax form:
(etc
⟨constructor ⟩)
syntax
This form may help in shortening the construction code needed to combine lists
produced by a successful ~etc
pattern. The ⟨constructor ⟩
argument is a limited form of expression: it is either an ⟨atomic datum ⟩,
an identifier, a quoted S-expression (a two element list starting with quote
),
a value
expression (see above), or an application or macro use in a form of a list that
starts with an identifier, followed by zero or more ⟨constructor ⟩s.
The etc
form is expanded as follows: first, it is
scanned for variables. It collects all identifiers outside of quote
and value
forms that are not used as head identifiers of an application or macro use. Then, a map
application
is built, with a lambda
form with the collected variables as formal parameters
and the ⟨constructor ⟩ as its body, followed by the same
variables as the lists for map
to process. The variables do not have to be
taken from patterns, they can come from any place in the current context; all decisions on
which variables are unrolled and which are replicated are explicit.
Uses of the etc
form can be nested:
(match '((0) (1 2) (3 4 5) (6 7 8 9))
[(~etc (~cons x (~etc y*)))
(etc (cons x (etc y*)))])
⟹ ((0) (1 2) (3 4 5) (6 7 8 9))
(match '((0) (1 2) (3 4 5) (6 7 8 9))
[(~etc (~cons x (~etc y*)))
(cons (etc x) (etc y*))])
⟹ ((0 1 3 6) () (2) (4 5) (7 8 9))
This appendix gives possible definitions for some derived pattern types in terms of the basic and core pattern types.
(define-match-pattern ~value ()
[(_ e)
(~test equal? (e))])
(define-match-pattern ~cons ()
[(_ ap dp)
(~and (~test pair?) (~prop car => ap) (~prop cdr => dp))])
(define-match-pattern ~list ()
[(~list) '()]
[(~list p . p*) (~cons p (~list . p*))])
(define-match-pattern ~list* ()
[(~list* p) p]
[(~list* p . p*) (~cons p (~list* . p*))])
(define-syntax match:cno-start
(syntax-rules ()
[(_ xv try f)
(if (pair? xv) (try '() xv) (f))]))
(define-syntax match:cno-head
(syntax-rules ()
[(_ h t)
(cons (car t) (append h (cdr t)))]))
(define-syntax match:cno-tail
(syntax-rules ()
[(_ try f h t)
(if (pair? (cdr t)) (try (cons (car t) h) (cdr t)) (f))]))
(define-match-pattern ~cons-no-order ()
[(~cons-no-order pe pr)
(~iterate match:cno-start match:cno-head match:cno-tail (h t)
(~cons pe pr))])
(define-match-pattern ~list-no-order ()
[(~list-no-order) '()]
[(~list-no-order p) (~list p)]
[(~list-no-order p . p*)
(~cons-no-order p (~list-no-order . p*))])
(define-match-pattern ~list-no-order* ()
[(~list-no-order* p) p]
[(~list-no-order* p . p*)
(~cons-no-order p (~list-no-order* . p*))])
(define-match-pattern ~vector? ()
[(_ p ...)
(~and (~test vector?) p ...)])
(define-match-pattern ~vector->list ()
[(_ p)
(~and (~test list?) (~prop list->vector => p))])
(define-match-pattern ~vector ()
[(_ p ...)
(~and (~test vector?) (~prop vector->list => (~list p ...)))])
(define-match-pattern ~number->string ()
[(_ p)
(~and (~test string?) (~prop string->number => p))]
[(_ p rx)
(~and (~test string?) (~prop string->number (rx) => p))])
(define-match-pattern ~= ()
[(~= f p) (~prop f => p)])
(define-match-pattern ~? ()
[(~? f p ...) (~and (~test f) p ...)])
Sample syntax-rules
-based implementation is available.
It runs on many R6RS and R7RS and some R5RS Schemes.
This proposal is both based on and inspired by the work of the following people: Andrew K. Wright, Robert Cartwright, Alex Shinn, R. Kent Dybvig, Dan Friedman, Erik Hilsdale, Chris Hanson, Gerald Jay Sussman, Marc Nieper-Wißkirchen, Dimitris Vyzovitis, and Panicz Maciej Godek. I am grateful to Andrew Pochinsky and Daphne Preston-Kendal for their feedback.
© 202? Sergei Egorov
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.