Title

Hygienic macros.

Author

André van Tonder

Status

This SRFI is currently in ``draft'' status. To see an explanation of each status that a SRFI can hold, see here. It will remain in draft status until 2005/08/14, or as amended. To provide input on this SRFI, please


mailto:srfi-72@srfi.schemers.org

. See instructions here to subscribe to the list. You can access previous messages via the archive of the mailing list.

Received: 2005/06/05
Draft: 2005/06/14 - 2005/04/14
Revised: 2005/07/01
Revised: 2005/07/06
Revised: 2005/08/04

Index

Abstract
Introduction
Improved hygiene
Escaping ellipses
Specification
Implementation
Acknowledgements
References

Abstract

This SRFI describes a procedural macro proposal for Scheme with the following features:

Improved hygiene:

We argue that conventional hygiene algorithms may lead to violations of referential transparency and accidental variable capture errors in procedural macros. We propose an improved algorithm that avoids these problems.
Syntax-case:

In the current proposal, the syntax-case form is expressible as a macro in terms of a simpler set of primitives and is specified as library syntax.
Procedural interface:

The primitive interface for manipulating compound syntax objects consists of procedures rather than special forms. In particular, the traditional abstractions car, cdr, cons , ... can be used on syntactic data.
Fast hygiene algorithm:

The reference implementation documents a fast imperative hygiene algorithm that is eager and linear in expression size.
Fluid identifiers:

A primitive make-fluid-identifier is provided for building expansion-time fluid binding constructs. It also provides an alternative mechanism for intentional variable capture.
Source-object correlation:

The model proposed here allows comprehensive source location tracking to be performed by the expander.

Introduction

We start with a simple example:

  (define-syntax (swap! a b)
     (quasisyntax
       (let ((temp ,a)) 
         (set! ,a ,b) 
         (set! ,b temp))))

A syntax object is here constructed using the quasisyntax form. Syntax provided as part of the input expression can be inserted in the result using unquote or unquote-splicing. Macros written in this way are hygienic and referentially transparent.

The following example shows that we can use the traditional abstractions car, cdr, length, ..., on syntax objects. It also illustrates the use of the predicate literal-identifier=? for identifying literals in the input expression:

  (define-syntax (my-cond c . cs)
    (or (and (list? c) (>= (length c) 2))
        (syntax-error))
    (cond ((literal-identifier=?
            (car c)
            (syntax else)) (quasisyntax (begin ,@(cdr c))))
          ((null? cs)      (quasisyntax (if ,(car c) (begin ,@(cdr c)))))
          (else            (quasisyntax (if ,(car c)
                                            (begin ,@(cdr c))
                                            (my-cond ,@cs))))))

In the current proposal, the syntax-case form is expressible as a macro in terms of a simpler set of primitives, and is specified as library syntax. The my-cond macro can then be equivalently expressed as:

  (define-syntax my-cond
    (lambda (form)
      (syntax-case form (else)
        ((_ (else e1 e2 ...))         (syntax (begin e1 e2 ...)))
        ((_ (e0 e1 e2 ...))           (syntax (if e0 (begin e1 e2 ...))))
        ((_ (e0 e1 e2 ...) c1 c2 ...) (syntax (if e0
                                                  (begin e1 e2 ...)
                                                  (my-cond c1 c2 ...)))))))

Improved hygiene

In the following expression using syntax-rules, automatic hygiene prevents a variable capture:

  (letrec-syntax ((help (syntax-rules () ((help) (list 1 2))))
                  (main (syntax-rules () ((main) (let ((list +)) (help))))))
    (main)) ==> (1 2)

In a procedural macro system such as syntax-case, it would seem natural to implement help as a procedure as follows:

  (let-syntax ((main (lambda (_)
                       (define (help) (syntax (list 1 2)))
                       (with-syntax ((rest (help)))
                         (syntax (let ((list +)) rest))))))
    (main)) ==> 3 (wrong)

However, we get the wrong answer. Since the conventional hygiene algorithm regards all identifiers with the same name introduced during the entire duration of a macro invocation as identical, the binding x = 2 here captured the reference to x introduced by help.

We regard this as a violation of referential transparency, in a sense to be formalized below. Indeed, we cannot determine the meaning of the code fragment introduced by help from bindings visible at its definition site. This may become a problem in large macros where helpers may be far from their use site. Even worse, since it is impossible to write a helper procedure that will reliably return an expression with a well-defined meaning independent of accidental details of the use site, it is in principle impossible to build reliable modular systems or closed shared libraries containing procedural helpers.

We propose the following minimal necessary condition for referential transparency in procedural macros, which the conventional hygiene algorithm fails. This condition is a modification, appropriate for procedural macros, of the one stated in [6, 7].

If a syntax form produces an expression that is inserted unmodified in the expanded code, any identifier references in the expression that are free, when considering the expression in isolation, will refer to the bindings that were visible where the syntax form appeared, regardless of any local bindings that may surround the use of the expression in the expanded code.

To satisfy this condition, this SRFI proposes the following modified hygiene rule [2-4]:

A binding for an identifier can only capture a reference to another if both were present in the source or introduced during a single evaluation of the same syntax form.

With this proposal, the above macro gives the correct answer:

  (let-syntax ((main (lambda (_)
                       (define (help) (syntax (list 1 2)))
                       (with-syntax ((rest (help)))
                         (syntax (let ((list +)) rest))))))
    (main)) ==> (1 2)

The modified hygiene rule also cleanly solves the problem of accidental variable capture in procedural macros that introduce bindings recursively, as in the following broken attempt to write a version of let with left to right evaluation:

  (define-syntax let-in-order
    (lambda (form)
      (syntax-case form ()
        ((_ ((i e) ...) e0 e1 ...)         
         (let f ((ies (syntax ((i e) ...)))
                 (its '())) 
           (syntax-case ies () 
             (()            (with-syntax ((its its))
                              (syntax (let its e0 e1 ...))))
             (((i e) . ies) (with-syntax ((rest (f (syntax ies)
                                                   (cons (syntax (i t)) its))))
                              (syntax (let ((t e)) rest))))))))))
  
  (let-in-order ((x 1)
                 (y 2))
    (+ x y))   
               ==> 4 (wrong)                   [with conventional algorithm]
               ==> Error: unbound identifier t [with proposal of this SRFI]

Also here, had the helper f been expressed as a macro, automatic hygiene would have ensured that the identifiers (t ...) introduced in successive recursive steps were distinct.

The conventional hygiene algorithm silently gives the wrong answer. With the modified rule proposed above, the macro instead fails noisily with an error, forcing us to correct the dangling reference to t in the original as follows:

  (define-syntax let-in-order
    (lambda (form)
      (syntax-case form ()
        ((_ ((i e) ...) e0 e1 ...)         
         (let f ((ies (syntax ((i e) ...)))
                 (its '())) 
           (syntax-case ies () 
             (()            (with-syntax ((its its))
                              (syntax (let its e0 e1 ...))))
             (((i e) . ies) (with-syntax ((t (syntax t)))
                              (with-syntax ((rest (f (syntax ies)
                                                     (cons (syntax (i t)) its))))
                                (syntax (let ((t e)) rest)))))))))))  
  
  (let-in-order ((x 1)
                 (y 2))
    (+ x y))                ==> 3

In the presence of quasisyntax, we propose the following simple extension of the above rule:

A binding for an identifier can only capture a reference to another if both were present in the source or introduced during a single evaluation of the same syntax or quasisyntax form, with the understanding that the evaluation of any nested, unquoted syntax or quasisyntax forms counts as part of the evaluation of an enclosing quasisyntax.

Using quasisyntax, we can rewrite the above let-in-order macro more concisely as follows:

  (define-syntax let-in-order
    (lambda (form)
      (syntax-case form ()
        ((_ ((i e) ...) e0 e1 ...)         
         (let f ((ies (syntax ((i e) ...)))
                 (its (syntax ()))) 
           (syntax-case ies () 
             (()            (quasisyntax (let ,its e0 e1 ...)))
             (((i e) . ies) (quasisyntax 
                             (let ((t e))
                               ,(f (syntax ies)
                                   (quasisyntax ((i t) ,@its))))))))))))
  
  (let-in-order ((x 1)
                 (y 2))
    (+ x y))                ==> 3

Since, in the last clause, the unquoted occurrence of quasisyntax is regarded as a continuation of the enclosing quasisyntax for the purpose of identifier equivalence, we no longer need to introduce t separately.

Notice that the rule implies that, inside an enclosing quasisyntax, two unquoted syntax expressions can produce identical identifiers. For example:

  (let-syntax ((m (lambda (_) 
                    (quasisyntax 
                     (let ((,(syntax x) 1)) ,(syntax x))))))
    (m))  ==> 1

Indeed, the above rule ensures the equivalence:

  (quasisyntax (let ((,(syntax x) 1)) ,(syntax x))) = (quasisyntax (let ((x 1)) x))

We are compelled to adopt this rule if we impose the conditions that the traditional idioms for macro-generating macros containing nested quasisyntax forms work correctly, and that the resulting macro to be equivalent to the corresponding syntax-case macro. For example, with the above rule the expression

  (let-syntax ((m (lambda (form)
                    (let ((x (cadr form)))
                      (quasisyntax 
                       (let-syntax ((n (lambda (_)
                                         (quasisyntax 
                                          (let ((,(syntax ,x) 4)) ,(syntax ,x))))))
                         (n)))))))
    (m z))  ==> 4

means exactly the same as

  (let-syntax ((m (lambda (form)
                    (syntax-case form ()
                      ((_ x) (syntax 
                              (let-syntax ((n (lambda (_)
                                                (syntax (let ((x 4)) x)))))
                                (n))))))))
    (m z))   ==> 4

Escaping ellipses

We require the ellipsis in the template in (syntax ...) to be interpreted as an ordinary identifier, not an ellipsis literal. The following idiom can then be used to include ellipses in syntax-case-generated macros:

  (let-syntax ((m (lambda (form)
                    (syntax-case form ()
                      ((_ x ...)
                       (with-syntax ((::: (syntax ...)))
                         (syntax
                          (let-syntax ((n (lambda (form)
                                            (syntax-case form ()
                                              ((_ x ... :::)
                                               (syntax `(x ... :::)))))))
                            (n a b c d)))))))))
      (m u v))  
                ==> (a b c d)

Specification

The following primitive forms are provided:

         define-syntax
         let-syntax
         letrec-syntax
                        
         identifier?
         bound-identifier=?
         free-identifier=?
         literal-identifier=?
         
         syntax
         quasisyntax
         embedded-syntax

         datum->syntax-object       
         syntax-object->datum
         make-fluid-identifier

         expand  
         syntax-debug        
         syntax-error

The following library forms are provided:

         syntax-case
         with-syntax
         syntax-rules

Syntax objects:

A syntax object is a graph whose nodes are Scheme pairs or vectors and whose leaves are constants or identifiers. The following expressions evaluate to syntax objects:

  '()
  1
  #f
  '(1 2 3)
  (cons (syntax x) (vector 1 2 3 (syntax y)))
  (syntax (let ((x 1)) x))
  (quasisyntax (let ((x 1)) ,(syntax x)))

Symbols may not appear in syntax objects:

  '(let ((x 1)) x)  ==> not a syntax object

syntax: (define-syntax keyword exp) (define-syntax (keyword . formals) exp1 exp ...)

Exp is expanded and evaluated in the current top-level syntactic environment, and must evaluate to a procedure of type syntax-object -> syntax-object, also called a transformer. The top-level syntactic environment is extended by binding the identifier keyword to the resulting transformer.

The second variant is equivalent to

  (define-syntax keyword 
     (let ((transformer (lambda (dummy . formals) exp1 exp ...)))
       (lambda (form)
         (apply transformer form))))

syntax: (let-syntax ((keyword exp) ...) exp* ...) (letrec-syntax ((keyword exp) ...) exp* ...)

We generalize R5RS, section (4.3.1), by allowing each expression exp to evaluate to an arbitrary transformer procedure.

We also impose the requirement that let[rec]-syntax behave as a splicing form rather than introducing a new local scope. For example:

 (let ((x 1))
    (let-syntax ((foo (syntax-rules ())))
      (define x 2))
    x)               ==> 2

procedure: (identifier? obj)

Returns #t if obj is an identifier, #f otherwise. The identifier type is disjoint from other Scheme primitive types described in R5RS, section (3.2).

procedure: (bound-identifier=? obj1 obj2) (free-identifier=? obj1 obj2) (literal-identifier=? obj1 obj2)

Identifiers are free-identifier=? if they would refer to the same lexical or toplevel binding if inserted as free identifiers in the result of the macro expansion. For this purpose, all identifiers that are not lexically bound are considered implicitly bound at the toplevel.

Identifiers are literal-identifier=? if they are free-identifier=? or if they both refer to toplevel bindings and have the same symbolic name. This primitive should be used to reliably identify literals (such as else in cond) even if they occur in a different module from the macro definition.

Identifiers are bound-identifier=? if a binding of one would capture references to the other in the scope of the binding.

Two identifiers with the same name are bound-identifier=? if both were present in the same toplevel expression in the original program text. Two identifiers will also be bound-identifier=? if they were produced from existing bound-identifier=? identifiers during a single evaluation of the same syntax or quasisyntax form, with the understanding that the evaluation of any nested, unquoted syntax or quasisyntax forms counts as part of the evaluation of an enclosing quasisyntax. In addition, datum->syntax-object may create identifiers that are bound-identifier=? to previously introduced identifiers.

These procedures return #f if either argument is not an identifier.

  (free-identifier=?  (syntax x) (syntax x))      ==> #t
  (bound-identifier=? (syntax x) (syntax x))      ==> #f

  (let ((y (syntax (x . x))))
    (bound-identifier=? (car y) 
                        (cdr y)))                 ==> #t

  (quasisyntax ,(bound-identifier=? (syntax x) 
                                    (syntax x)))  ==> #t

  (let ((x 1))
    (let-syntax ((m (lambda (form)
                      (quasisyntax
                       (let ((x 2))
                         (let-syntax ((n (lambda (form)
                                           (free-identifier=? (cadr form) 
                                                              (syntax x)))))
                           (n ,(cadr form))))))))
      (m x)))  ==> #f

syntax: (syntax datum)

Creates a new syntax object from datum, which must be a syntax object embedded in the input form, as follows: Constants contained in datum are unaffected, while identifiers are replaced by fresh identifiers that are different from all previously existing identifiers in the sense of bound-identifier=?. Two of the resulting identifiers will be bound-identifier=? only if they replaced existing bound-identifier=? identifiers in datum during a single evaluation of the syntax form.

These fresh identifiers remain free-identifier=? to the original identifiers. This means that a fresh identifier will denote the same binding as the original identifier in datum unless macro expansion places an occurrence of it in a binding position.

The core syntax form decribed here has no notion of pattern variable insertion, but is effectively rebound in the scope of syntax-case clauses to implement that feature (see below).

Examples:

  (bound-identifier=? (syntax x) (syntax x))   ==> #f

  (let ((y (syntax (x . x))))
    (bound-identifier=? (car y) 
                        (cdr y)))              ==> #t


  (syntax-object->datum (syntax (x ...)))      ==> (x ...)   

  (define (generate-temporaries list)
    (map (lambda (ignore) (syntax temp))
         list))

Note that syntax does not unify identifiers previously distinct in the sense of bound-identifier=? occurring in datum even if they have the same symbolic name:

  (let ((x 1))
    (let-syntax
        ((foo (lambda (form)
                (quasisyntax
                 (let-syntax
                     ((bar (lambda (_) 
                             (syntax (let ((x 2)) ,(cadr form))))))
                    (bar))))))
      (foo x)))                 ==> 1

syntax: (quasisyntax template)

Constructs a new syntax object from template, parts of which may be unquoted using unquote or unquote-splicing. If no unquoted subexpressions appear at the same nesting level as the outermost quasisyntax, the result of evaluating (quasisyntax template) is equivalent to the result of evaluating (syntax template). However, if unquoted expressions do appear, they are evaluated and inserted or spliced into the resulting structure according to the rules described for quasiquote in R5RS (4.2.6).

To make nested unquote-splicing behave in a useful way, the R5RS-compatible extension to quasiquote in appendix B of the paper [10] is required mutatis mutandis for quasisyntax.

Identifiers introduced when evaluating the quasisyntax form are different from all previously existing identifiers in the sense of bound-identifier=?. Two of the resulting identifiers will be bound-identifier=? only if they replaced existing bound-identifier=? identifiers in template during a single evaluation of the quasisyntax form, with the understanding that the evaluation of any nested, unquoted syntax or quasisyntax forms counts as part of the evaluation of the enclosing quasisyntax.

The core quasisyntax form decribed here has no notion of pattern variable insertion, but is effectively rebound in the scope of syntax-case clauses to implement that feature (see below).

Examples:

  (bound-identifier=? (quasisyntax x)
                      (quasisyntax x))                   ==> #f

  (quasisyntax ,(bound-identifier=? (quasisyntax x) 
                                    (syntax x)))         ==> #t

  (let-syntax ((f (lambda (form) (syntax (syntax x)))))
    (quasisyntax ,(bound-identifier=? (f) (f))))         ==> #f

  (let-syntax ((m (lambda (form)
                    (let ((x (cadr form)))
                      (quasisyntax 
                       (let-syntax ((n (lambda (_)
                                         (quasisyntax 
                                          (let ((,(syntax ,x) 4)) ,(syntax ,x))))))
                         (n)))))))
    (m z))  ==> 4

Quasisyntax is often useful as a complement to syntax-case and with-syntax, since it allows macros to follow the structure of the generated code instead of inverting the order as with-syntax would require. For example,

  (define-syntax case   
    (lambda (x)
      (syntax-case x ()
        ((_ e c1 c2 ...)
         (quasisyntax
          (let ((t e))
            ,(let f ((c1 (syntax c1)) (cmore (syntax (c2 ...))))
               (if (null? cmore)
                   (syntax-case c1 (else)
                     ((else e1 e2 ...)    (syntax (begin e1 e2 ...)))
                     (((k ...) e1 e2 ...) (syntax (if (memv t '(k ...))
                                                      (begin e1 e2 ...)))))
                   (syntax-case c1 ()
                     (((k ...) e1 e2 ...)
                      (quasisyntax
                       (if (memv t '(k ...))
                           (begin e1 e2 ...)
                           ,(f (car cmore) (cdr cmore))))))))))))))

Notice that here we do not need to first introduce t separately, since embedded occurrences of syntax or quasisyntax are regarded as continuations of the outer quasisyntax for the purpose of bound-identifier=? equivalence.

syntax: (embedded-syntax datum)

Returns the existing syntax object datum in the input form. Unlike syntax, no new syntax object is constructed. This primitive is useful for defining certain kinds of macro-generating macros that have to compose pieces of code preserving bound-identifier=? equivalence where not all the pieces are passed via the whole chain of macro calls.

For example, in the following fragment, z is passed to the inner macro by two paths, one of them via form and then form*, and the other via a combination of form only. Using embedded-syntax, we can "tunnel" z unmodified to the inner macro as follows:

  (let ((z 2))
    (let-syntax ((m (lambda (form)
                      (quasisyntax
                       (let-syntax ((n (lambda (form*)
                                         (quasisyntax 
                                          (let ((,(cadr form*) 1))
                                            ,(embedded-syntax ,(cadr form)))))))
                         (n ,(cadr form)))))))
      (m z)))   ==> 1

Compare this with:

  (let ((z 2))
    (let-syntax ((m (lambda (form)
                      (quasisyntax
                       (let-syntax ((n (lambda (form*)
                                         (quasisyntax
                                          (let ((,(cadr form*) 1))
                                            ,(syntax ,(cadr form)))))))
                         (n ,(cadr form)))))))
      (m z)))   ==> 2

procedure: (datum->syntax-object template-identifier obj)

Transforms obj, which must be a graph with pairs or vectors as nodes and with symbols or constants as leaves, to a syntax object as follows: Constants in obj are unaffected, while symbols appearing in obj are replaced by identifiers that behave under bound-identifier=?, free-identifier=? and literal-identifier=? the same as an identifier with the same symbolic name would behave if it had occurred together with template-identifier in the same source toplevel expression or was produced during the same evaluation of the syntax or quasisyntax expression producing template-identifier.

If template-identifier is a fluid identifier, the symbols in obj will also be converted to fluid identifiers.

  (let-syntax ((m (lambda (_)
                    (let ((x (syntax x)))
                      (let ((x* (datum->syntax-object x 'x)))
                        (quasisyntax
                         (let ((,x 1)) ,x*)))))))
    (m))        ==> 1


  (let ((x 1))
    (let-syntax ((m (lambda (form)
                      (quasisyntax
                       (let ((x 2))
                         (let-syntax ((n (lambda (form)
                                           (datum->syntax-object (cadr form) 'x))))
                           (n ,(cadr form))))))))
      (m z)))   ==> 1

Datum->syntax-object provides a hygienic mechanism for inserting bindings that intentionally capture existing references. Since composing such macros is a subtle affair, with various incorrect examples appearing in the literature, we present a worked-out example, courtesy of [2]:

  (define-syntax if-it
    (lambda (x)
      (syntax-case x ()
        ((k e1 e2 e3)
         (with-syntax ((it (datum->syntax-object (syntax k) 'it)))
           (syntax (let ((it e1))
                     (if it e2 e3)))))))) 
  
  (define-syntax when-it
    (lambda (x)
      (syntax-case x ()
        ((k e1 e2)
         (with-syntax ((it* (datum->syntax-object (syntax k) 'it)))
           (syntax (if-it e1
                          (let ((it* it)) e2)
                          (if #f #f))))))))
  
  (define-syntax my-or
    (lambda (x)
      (syntax-case x ()
        ((k e1 e2)
         (syntax (if-it e1 it e2))))))

  (if-it 2 it 3)    ==> 2
  (when-it 42 it)   ==> 42
  (my-or 2 3)       ==> 2
  (my-or #f it)     ==> Error: undefined identifier: it
                                       
  
  (let ((it 1)) (if-it 42 it #f))    ==> 42
  (let ((it 1)) (when-it 42 it))     ==> 42
  (let ((it 1)) (my-or 42 it))       ==> 42
  (let ((it 1)) (my-or #f it))       ==> 1
  (let ((if-it 1)) (when-it 42 it))  ==> 42

Notice how my-or purposely does not expose it to the user. On the other hand, the definition of when-it explicitly reexports it to the use site, while preserving referential transparency in the last example.

procedure: (syntax-object->datum syntax-object)

Transforms a syntax object to a new graph with identifiers replaced by their symbolic names.

procedure: (make-fluid-identifier template-identifier symbol)

This procedure returns a fresh identifier that is free-identifier=? to (datum->syntax-object template-identifier symbol). The new identifier is not bound-identifier=? to any existing identifiers. If the identifier is inserted as a bound identifier in a binding form, the binding will capture any identifiers in the scope of the binding that are free-identifier=? to (datum->syntax-object template-identifier symbol).

This primitive is useful for implementing expand-time fluid binding forms. The following example illustrates how one may use it to implement fluid-let-syntax from [6, 7]:

  (define-syntax fluid-let-syntax
    (lambda (form)
      (syntax-case form ()
        ((k ((i e) ...) e1 e2 ...) 
         (with-syntax (((fi ...) 
                        (map (lambda (i)
                               (make-fluid-identifier i
                                                      (syntax-object->datum i)))
                             (syntax (i ...)))))
           (syntax 
            (let-syntax ((fi e) ...) e1 e2 ...)))))))
          
  
  (let ((f (lambda (x) (+ x 1))))
    (let-syntax ((g (syntax-rules ()
                      ((_ x) (f x)))))
      (let-syntax ((f (syntax-rules ()
                        ((_ x) x))))
        (g 1))))   ==> 2

  
  (let ((f (lambda (x) (+ x 1))))
    (let-syntax ((g (syntax-rules ()
                      ((_ x) (f x)))))
      (fluid-let-syntax ((f (syntax-rules ()
                              ((_ x) x))))
        (g 1))))   ==> 1

This primitive also provides an alternative mechanism for inserting bindings that intentionally capture existing references. Consider:

  (define-syntax if-it
    (lambda (x)
      (syntax-case x ()
        ((k e1 e2 e3)
         (with-syntax ((it (make-fluid-identifier (syntax here) 'it)))
           (syntax (let ((it e1))
                     (if it e2 e3)))))))) 
  
  (define-syntax when-it
    (lambda (x)
      (syntax-case x ()
        ((k e1 e2)
         (syntax (if-it e1 e2 (if #f #f)))))))
  
  (define-syntax my-or
    (lambda (x)
      (syntax-case x ()
        ((k e1 e2)
         (syntax (let ((thunk (lambda () e2)))
                   (if-it e1 it (thunk))))))))

  (if-it 2 it 3)     ==> 2
  (when-it 42 it)    ==> 42
  (my-or 2 3)        ==> 2
  (my-or #f it)      ==> undefined identifier: it
                                       
  
  (let ((it 1)) (if-it 42 it #f))     ==> 1
  (let ((it 1)) (when-it 42 it))      ==> 1
  (let ((it 1)) (my-or 42 it))        ==> 42
  (let ((it 1)) (my-or #f it))        ==> 1
  (let ((if-it 1)) (when-it 42 it))   ==> 42

Although when-it here is simpler than the above solution using datum->syntax-object, since captures do not have to be explicitly propagated, it now takes a little more work to prevent propagation of captures, as the my-or macro shows. Also, in two cases the answer is different, with explicit bindings here taking precedence over implicit bindings. This behaviour is the same as that of the MzScheme solution suggested in [13], but can be changed by modifying the first argument to make-fluid-identifier.

procedure: (expand syntax-object)

Expands the syntax object fully to obtain a core Scheme expression.

  (expand (syntax (let ((x 1)) x)))  ==> ((lambda (@x5872) @x5872) 1)

procedure: (syntax-debug syntax-object)

Converts its argument to a human-readable format.

  (syntax-debug (syntax (let ((x 1)) y)))   ==> (let ((x#top 1)) y#top)

procedure: (syntax-error obj ...)

Invokes a syntax error. The objects obj ... are displayed, available source-object correlation information is displayed or provided to debugging tools, and the expander is stopped.

library syntax: (syntax-case exp (literal ...) clause ...) clause := (pattern output-expression) (pattern fender output-expression)

The syntax-case form can be written as a macro in terms of the primitives specified above.

Identifiers in a pattern that are not bound-identifier=? to any of the identifiers (literal ...) are called pattern variables. These belong to the same namespace as ordinary variables and can shadow, or be shadowed by, bindings of the latter. Each pattern is identical to a syntax-rules pattern (R5RS 4.3.2), and is matched against the input expression exp, which must evaluate to a syntax object, according to the rules of R5RS, section (4.3.2), except that the first position in a pattern is not ignored. When an identifier in a pattern is bound-identifier=? to an identifier in (literal ...), it will be matched against identifiers in the input using literal-identifier=?. If a pattern is matched but a fender expression is present and evaluates to #f, evaluation proceeds to the next clause.

In the fender and output-expression of each clause, the (syntax template) and (quasisyntax template) forms are effectively rebound so that pattern variables in pattern, and visible pattern variables in nesting syntax-case forms, will be replaced in template by the subforms they matched in the input. For this purpose, the template in (syntax template) is treated identically to a syntax-rules template (R5RS 4.3.2). Subtemplates of quasisyntax templates that do not contain unquoted expressions are treated identically to syntax templates.

The rules for bound-identifier=? equivalence of fresh identifiers replacing identifiers in templates that do not refer to pattern variables remain as specified in the sections describing the primitives syntax and quasisyntax above.

An ellipsis in a template that is not preceded by an identifier is not interpreted as an ellipsis literal. This allows the following idiom for generating macros containing ellipses:

  (let-syntax ((m (lambda (form)
                    (syntax-case form ()
                      ((_ x ...)
                       (with-syntax ((::: (syntax ...)))
                         (syntax
                          (let-syntax ((n (lambda (form)
                                            (syntax-case form ()
                                              ((_ x ... :::)
                                               (syntax `(x ... :::)))))))
                            (n a b c d)))))))))
      (m u v))  
                ==> (a b c d)

library syntax: (with-syntax template)

As in [6, 7], with-syntax expands to an instance of syntax-case:

  (define-syntax with-syntax
    (lambda (x)
      (syntax-case x ()
        ((_ ((p e0) ...) e1 e2 ...)
         (syntax (syntax-case (list e0 ...) ()
                   ((p ...) (begin e1 e2 ...))))))))

library syntax: (syntax-rules template)

See R5RS, section (4.3.2). Definable in terms of syntax-case as [6. 7]:

(define-syntax syntax-rules
  (lambda (x)
    (syntax-case x ()
      ((_ (i ...) ((keyword . pattern) template) ...)
       (syntax (lambda (x)
                 (syntax-case x (i ...)
                   ((dummy . pattern) (syntax template))
                   ...)))))))

Implementation

The implementation uses the forms and procedures specified in R5RS. It does not require R5RS macros or any other existing macro system. In addition, it uses gensym with an optional string prefix argument, and an interaction-environment, no-argument variant of eval. Portability hooks are provided for Schemes that lack either of these.

It has been successfully run on at least Chez, Chicken, Gambit and MzScheme.

The implementation was strongly influenced by the explicit renaming system [8, 11].

We use a fast imperative hygiene algorithm that is eager and linear in expression size.

Source-object correlation

The specification requires compound syntax objects to be represented as ordinary Scheme lists or vectors. This means that we cannot store source location information for these in the syntax object itself.

Given this representation, a method to track source information was worked out by Dybvig and Hieb [2]: The expander simply maintains a record of the source information for each list and each (occurrence of each) identifier in some external data structure, e.g., a hash table. This would require an extra wrapper for each identifier occurrence to give it its own identity.

Since this requires reader support, which is not portable, it is not implemented in the reference implementation. Still, the reference implementation does record intermediate expansion steps all the way back to the original source expression and displays these whenever a syntax error is invoked.

During the draft period, the reference implementation will be available here.

Acknowledgements

Special thanks to Kent Dybvig and Felix Winkelmann for numerous helpful discussions.

References

[1] André van Tonder - Portable macros and modules

    http://www.het.brown.edu/people/andre/macros/index.htm

[2] R. Kent Dybvig - Private communication.

[3] Marcin 'Qrczak' Kowalczyk - Message on comp.lang.scheme:

    http://groups-beta.google.com/group/comp.lang.scheme/msg/b7075e4ca751dbdb

[4] Ben Rudiak-Gould - Message on comp.lang.scheme:

    http://groups-beta.google.com/group/comp.lang.scheme/msg/180c7627853c288e

[5] Matthew Flatt - Composable and Compilable Macros You Want it When?

[6] R. Kent Dybvig - Schez Scheme user's guide:

    http://www.scheme.com/csug/

[7] Robert Hieb, R. Kent Dybvig and Carl Bruggeman
    - Syntactic Abstraction in Scheme.

    R. Kent Dybvig - Writing hygienic macros in syntax-case

    http://library.readscheme.org/page3.html

[8] William D. Clinger - Hygienic macros through explicit renaming.

    http://library.readscheme.org/page3.html

[9] Eugene E. Kohlbecker, Daniel P. Friedman, Matthias Felleisen and Bruce F. Duba
    - Hygienic macro expansion

    http://library.readscheme.org/page3.html

[10] Alan Bawden - Quasiquotation in Lisp 

     http://citeseer.ist.psu.edu/bawden99quasiquotation.html

[11] Richard Kelsey and Jonathan Rees - The Scheme 48 implementation

     http://s48.org/

[12] Robert Hieb, R. Kent Dybvig - A compatible low-level macro facility

     Revised(4) Report on the Algorithmic Language Scheme (appendix)
     
[13] Matthew Flatt 
     - Introducing an Identifier into the Lexical Context of a Macro Call
    
     http://list.cs.brown.edu/pipermail/plt-scheme/2004-October/006891.html

Copyright

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Author: André van Tonder

Editor: Francisco Solsona