242: The CFG Language

by Marc Nieper-Wißkirchen

Status

This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-242@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Abstract

This SRFI defines a language to describe control-flow graphs (CFGs) suitable for formulating iterative and recursive algorithms. Using the notion of a CFG term, this language can be seamlessly embedded in the Scheme language. Complex CFG terms can be composed from simple CFG terms.

Table of contents

Rationale

In the Scheme language, procedures (and thus jump labels, according to lambda-the-ultimate philosophy) are first-class entities. In a certain sense, this makes the control-flow graph of a Scheme program a dynamic and not static entity.

The CFG language described in this SRFI, on the other hand, describes a static control-flow graph. For the applications where this is sufficient, this has the advantage that the control-flow graph can be readily statically reasoned about, allowing one, in particular, to define variable scope in terms of it.

The CFG language itself is useful for describing iterative and recursive algorithms more clearly than it is possible in the Scheme language through tail calls and (multiple) return values. Its raison d'être, however, is that more specialized languages like that of a loop facility can be easily built on top of it.

The origin of the CFG language described in this SRFI is Olin Shiver's paper The Anatomy of a Loop: A story of scope and control. In his paper, he makes two important points: The main iterative control construct in Scheme is a tail call. While a tail call as a goto that passes arguments is a pretty powerful construct, it is also as low-level as a goto. His first point is that this implies that it is not the right tool to write down iterative algorithms in a high-level fashion. This fact has stimulated the search for loop facilities allowing one to express iterative algorithms more abstractly and more composably. His second point is that just as the scoping of variables is well-defined in the Scheme language, a well-defined model of (loop) variable scoping is needed for loop facilities as well. For this, he formulates a new scoping rule, namely that binders dominate references.

While not all surface syntax has been adopted from Olin Shiver's paper, the core of the language described here subsumes his conception. The addition of a facility to handle not only iterative but also recursive algorithms is a new invention in this SRFI. This addresses a third perceived shortcoming of the Scheme language. Recursive algorithms in Scheme are based on the fact that Scheme procedures return values, possibly multiple ones. However, as soon as more than one value needs to be returned and each recursion step only needs to modify one of them, a position-only identification of return values becomes unclear and leads to repetition of code. Instead, the CFG language in this SRFI gives names to intermediate results and allows parallel processing of them.

Thanks to Scheme's expressive macro system, the CFG language can be seamlessly implemented in the standard Scheme language, fully respecting Scheme semantics.

The author plans to submit a SRFI describing an extensible loop facility built on the CFG language defined in this SRFI in the future.

Examples

The following expression evaluates to an (iterative) procedure that takes a list of integers and returns two values, the number of even and the number of odd values in the list.

(lambda (n*)
  (cfg
      (labels [(f (execute
                    (lambda (next done)
                      (if (null? n*)
                          (done)
                          (next (car n*) (cdr n*))))
                    [(n n*)
                     (execute
                       (lambda (even odd)
                         (if (odd? n)
                             (odd (+ o 1))
                             (even (+ e 1))))
                       [(e) (call f)]
                       [(o) (call f)])]
                    [()
                     (finally (e o) (values e o) (halt))]))]
        (execute (lambda (start) (start n* 0 0)) [(n* e o) (call f)]))
    (values e o)))

The following expression evaluates to a (recursive) procedure that takes a list of integers and returns two values, the sublist of even values and the sublist of odd values.

(lambda (n*)
  (cfg
      (labels [(f (execute
                    (lambda (next done)
                      (if (null? n*)
                          (done)
                          (next (car n*) (cdr n*))))
                    [(n n*)
                     (execute
                       (lambda (even odd)
                         (if (odd? n)
                             (odd)
                             (even)))
                       [()
                        (finally (e*) (cons n e*) (call f))]
                       [()
                        (finally (o*) (cons n o*) (call f))])]
                    [()
                     (finally (e* o*) (values '() '()) (halt))]))]
        (execute (lambda (start) (start n*)) [(n*) (call f)]))
    (values e* o*)))

Using the cfg form

A cfg expression lets control flow along the edges of a control-flow graph (CFG) defined by the cfg expression, potentially binding so-called loop variables, until a halt CFG block is reached. Then, control flows backwards along the edges previously taken, potentially binding so-called return variables. Finally a Scheme expression is evaluated in an environment extended by these bindings, and its values are returned.

The general form of a cfg expression is

(cfg cfg term result expression)

where the cfg term describes the CFG and result expression is an arbitrary Scheme expression.

The simplest CFG is just one consisting of a halt CFG block, whose syntax simply is:

(halt)

For example, we have

(cfg (halt) (+ 1 2))3

as the control flow along this CFG, which has no edges, immediately stops, leading on to the evaluation of the return expression.

A finally CFG block is used to bind return variables. The general syntax of a finally block is:

(finally formals expression cfg term)

When control enters the CFG fragment it describes, control is passed to the CFG fragment described by the cfg term first. When control finally flows backwards, the (Scheme) expression is evaluated to yield values that are bound as return variables to the formals. For example, we have:

(cfg
    (finally (x . y) (values 1 2 3)
      (halt))
  (list x y)(1 (2 3))

A more complicated example is the following:

(let ([x 1])
  (cfg
      (finally (y) (+ x 2)
        (finally (x) (+ x 1)
          (halt)))
    (list x y)))(2 4)

Here, control flows first from top to bottom and then backwards. The finally CFG blocks bind the return variables during the backward flow. The expression (+ x 1) sees no binding of a return variable, only the outer binding of x. On the other hand, the expression (+ x 2) sees the binding of x as a return variable by the inner finally CFG block.

While finally CFG blocks are used to perform actions during the backward control flow, execute CFG blocks perform actions during the initial forward flow through the CFG. The syntax of an execute CFG block is given by

(execute procedure expression [formals cfg term] …)

where procedure expression is a Scheme expression that must evaluate to a procedure accepting as many arguments as there are cfg terms in the execute CFG term. When the forward control flow through the CFG reaches an execute CFG block, this procedure is called and should tail-call one of its arguments. Control flow then proceeds with the CFG block described by the corresponding cfg term:

(let ([x 1] [y 2])
  (cfg
      (finally (y) (+ x 3)
        (execute
            (lambda (e)
              (set! x y)
              (e))
          [() (halt)]))
    (list x y)))(2 5)

In this example, the forward control flow passes through the finally CFG block before it reaches the execute block. The procedure expression is evaluated in the environment where x is bound to 1 and y is bound to 2. The tail call to e lets the control flow continue with the halt CFG block, from where it flows backward. The backward control flow passes through the execute CFG block before the (+ x 3) is evaluated and the return variable y is bound to the result 6.

Execute CFG terms are used to create CFGs with branches. For this, more than one cfg term has to be provided. Consider the following example:

(let ([x 1] [y 2])
  (cfg
      (execute
          (lambda (e1 e2)
            (if (odd? y) (e1) (e2)))
        [() (halt)]
        [() (finally (x) 3 (halt))])
    (+ x 10)11

Here, the control flow exiting the execute CFG block can proceed along two possible edges, e1 and e2, depending on the parity of y in this example. It stands out that the value of the example expression is not 13, as one might have expected, but 11. In other words, the binding of x that is visible in (+ x 10) is the let binding of x and not the binding as a return variable introduced in the finally CFG block. To understand this, we have to introduce the rule by with which the scope of return variables is determined. A CFG block (and, likewise, the result expression) are in the scope of a return variable if this CFG block post-dominates definitions of this return variable, that is if all possible control flow paths starting at the CFG block and ending at a halt CFG block pass through a definition of the return variable, which happens along the entry edge of a finally CFG block. In the example above, this is not the case. There is a (statically possible) control-flow path from the entry block to the first halt CFG block that does not pass through a finally CFG block defining the return variable x. Compare with the following two examples:

(let ([x 1] [y 2])
  (cfg
      (execute
          (lambda (e1 e2)
            (if (odd? y) (e1) (e2)))
        [() (finally (x) #f (halt))]
        [() (finally (x) 3 (halt))])
    (+ x 10)13
(let ([x 1] [y 2])
  (cfg
      (finally (x) 3
        (execute
            (lambda (e1 e2)
              (if (odd? y) (e1) (e2)))
          [() (halt))]
          [() (halt))]))
    (+ x 10)13

So far, we haven't talked about the formals that appear in a execute CFG term. Using these, we can bind a different kind of variables, namely so-called loop variables. One can think of loop variables as being defined during forward control flow and return variables as being defined during backward flow. The value to which a loop variable is bound when control flow leaves along an edge of an execute CFG Term is passed as an argument to the tail-called procedure that corresponds to the edge:

(let ([x 1])
  (cfg
      (execute
          (lambda (e)
            (e (+ x 2)))
        [(x) (finally (y) (+ x 3) (halt))])
    y)6

In (+ x 2), the variable x is let-bound to 1, and in (+ x 3) the binding as a loop variable to (+ 1 2) is visible.

While the scoping rule of return variables is based on the post-dominance relation, the scoping rule for loop variables is based on the dominance relation: A CFG block is in the scope of a loop variable if definitions of this loop variable dominate the CFG block, that is if all possible control flow paths starting at the entry of the whole CFG and ending at the CFG block pass through a definition of the loop variable, which happens along an edge exiting an execute CFG block.

In the following example, the definitions of the loop variable x do not dominate the expression (+ x 3):

(let ([x 1] [y 2])
  (cfg
      (execute
          (lambda (e1 e2)
            (if (odd? y) (e1 5) (e2)))
        [(x) (finally (y) (+ x 2) (halt))]
        [() (finally (y) (+ x 3) (halt))])
    y))4

The expressions in the finally CFG terms are evaluated with loop and return variables in scope. If a loop and a return variable have the same name and both are in scope, the return variable shadows the loop variable. In the following example, x is bound to 1 in (+ x 1), bound to 2 in (+ x 3), bound to 5 in (+ x 2) and bound to 7 in (+ x 4):

(let ([x 1])
  (cfg
      (execute
          (lambda (e)
            (e (+ x 1)))
        [(x) (finally (x) (+ x 2)
               (finally (x) (+ x 3)
                 (halt)))])
    (+ x 4)11

Another CFG term that can be used to define loop variables is bind. Its syntax is:

(bind ([formals expression] ...) cfg term)

When the forward control flow reaches a bind CFG term, the expressions are evaluated in the same environment in which the procedure expression of an execute CFG term would be evaluated. Then, all formals are bound in parallel, and the loop variable bindings become visible in the cfg term, with which the forward control flow proceeds. The bind CFG term is similar to the let-values Scheme expression, which binds multiple Scheme variables in parallel.

(let ([x 1] [y 2])
  (cfg
      (bind ([(x) y] [(y) x])
        (finally (x y) (values x y) (halt)))
    (list x y)))(2 1)

In order to construct more interesting examples, we need to be able to introduce loops in control-flow graphs. This is possible with the use of labels CFG terms, which have the following syntax:

(labels ([label cfg term] …) body cfg term)

where each label is an identifier. A labels CFG term binds the labels to the cfg terms, and this binding is visible within the cfg terms and the body cfg term, much like the semantics of the letrec expression for Scheme code. When (forward) control flow enters a labels CFG fragment, the control flow is passed to the CFG block described by the body cfg term.

Control can flow to the CFG block described by a cfg term using call CFG terms with the corresponding label. The syntax of a call CFG term is:

(call label)

Labels occupy a different namespace than variables (of any kind) or keywords. Labels are lexically scoped.

(let ([x 1])
  (cfg
      (labels ([x (finally (x) x (halt))])
        (call x))
    x))1

In the following example (from Olin Shiver's paper), the finally CFG block is dominated by definitions of the loop variable y. When the control flows through the label la, the loop variable y is bound before the call to la; otherwise, when the control flows through the label lb, the loop variable y is bound before the call to lj:

(let ([x 1]
      [f (lambda (y) (set! x y))]
      [g (lambda (y) (set! x (- y 1)))])
  (cfg
      (labels ([lj (finally y (+ y 10) (halt))]
               [la (execute (lambda (e1)
                              (f y) (e1))
                     [() (call lj)])]
               [lb (execute (lambda (e1)
                              (g x) (e1 (* x x)))
                     [(y) (call lj)])])
        (execute (lambda (e1 e2)
                   (if (odd? x)
                       (e1 (+ x 1))
                       (e2)))
          [(y) (call la)]
          [() (call lb)]))
    (list x y)))(2 12)

Label* is another binding construct for CFG labels; it is related to labels as let* is related to letrec. In particular, the binding of a label by label* is not visible within the CFG term bound by the label. The syntax of label* does not differ from the labels syntax:

(label* ([label cfg term] …) body cfg term)

The semantics of label* can be explained through the following rule: Replace all occurrences of (call label) in body cfg term by cfg term when the binding of the corresponding label is not (lexically) shadowed by another label binding.

(cfg
    (label* ([l (finally (x) 42 (halt))]
             [l (call l)])
      (call l))
  x)42

The final CFG term that needs to be discussed is permute. Its general syntax is:

(permute ([label cfg term] …) body cfg term)

When the forward control flow enters a CFG block described by a permute CFG term, the CFG blocks described by the cfg terms together with their labels are dynamically permuted, and control then passes to the the first CFG block. Within the lexical scope of this CFG term, the corresponding label is bound so that when it is called, control proceeds to the then second cfg term and so on until the then final cfg term is reached whose accompanying label is bound so that when called, control finally proceeds to the body cfg term term.

(cfg
    (permute ([p (bind ([(x) 10])
                   (call p))]
              [p (bind ([(y) 20])
                   (call p))])
      (finally (z) (list x y) (halt)))
  z)(10 20)

So far, the permute CFG term just looks like an inside-out label* form up to the permutation that is involved. The permute CFG term's purpose comes from some non-determinism it introduces. In the above example, the sequencing of the two bind CFG terms is not determined, i.e. whether the forward control flow first passes through the lexically first bind CFG term and then through the lexically second, or vice versa. In the above example, it does not matter. In general, the non-determinism shows up through the binding of loop and return variables. The definition of x by the first bind CFG term does not dominate the second bind CFG term (more precisely, the CFG block it describes), because a possible control flow (coming from a different sequencing) coming from the entry passes the second CFG block before the first. This is demonstrated in the following two examples:

(let ([x 1])
  (cfg
      (permute ([p (bind ([(x) 2])
                     (call p))]
                [p (bind ([(y) x])
                     (call p))])
        (finally (z) (list x y) (halt)))
    z))(2 1)
(let ([x 1])
  (cfg
      (permute ([p (finally ([y] x)
                     (call p))]
                [p (finally ([x] 2)
                     (call p))])
        (halt))
    (list x y)))(2 1)

The permutation property of the permute CFG terms also propagates to the bodies of label* and labels forms and through calls to labels bound by label*:

(let ([x 1] [y 10])
  (cfg
      (permute ([p (bind ([(x) 2])
                     (call p))])
        (labels ()
          (label* ([p (permute ([p (bind ([(z) (list x y)])
                                     (call p))])
                        (finally (x y z) (values x y z) (halt)))])
            (permute ([p (bind ([(y) x])
                           (call p))])
              (call p)))))
    (list x y z)))(2 1 (1 10))

The final concept we need to explain are macros for the cfg form. Like define-syntax is used to define Scheme macros, we use the define-cfg-syntax definition here, which is an ordinary Scheme definition and of the following form:

(define-cfg-syntax cfg keyword transformer expression)

(There is also a starred version define-cfg-syntax* with the same syntax, which does not change a previous binding of the identifier cfg keyword, but only adds CFG keyword semantics.)

When the expander processes a cfg form, it expands CFG macro uses, which look like Scheme macro uses except that a cfg keyword is used instead of a (Scheme) keyword. CFG macros are expanded by the usual Scheme macro expansion algorithm. An example is worth a thousand words:

(define-cfg-syntax loop
  (lambda (stx)
    (syntax-case stx ()
      [(_ n-expr lp-lbl loop-cfg-term body-cfg-term)
       (identifier? #'lp-lbl)
       #'(bind ([(n) n-expr])
           (labels ([lp-lbl
                     (execute
                         (lambda (loop done)
                           (if (zero? n)
                               (done)
                               (loop (- n 1))))
                       [(n) loop-cfg-term]
                       [() body-cfg-term])])
             (call lp-lbl)))])))

  (cfg
      (bind ([(n) 0])
        (loop 10 next
            (bind ([(n) (+ n 2)])
              (call next))
          (finally (n) n (halt))))
    n)20

The example above also demonstrates the usual hygiene provisions of (Scheme) macros; the variable n introduced in the macro is effectively renamed so that it doesn't shadow the variable n in the macro use. This hygiene is also the reason why the label next has to be given as an argument to the macro. One can actually get rid of this as the following example shows:

(define-cfg-label next)
(define-cfg-syntax loop
  (lambda (stx)
    (syntax-case stx ()
      [(_ n-expr loop-cfg-term body-cfg-term)
       #'(bind ([(n) n-expr])
           (labels ([next
                     (execute
                         (lambda (loop done)
                           (if (zero? n)
                               (done)
                               (loop (- n 1))))
                       [(n) loop-cfg-term]
                       [() body-cfg-term])])
             (call next)))])))

  (cfg
      (bind ([(n) 0])
        (loop 10
            (bind ([(n) (+ n 2)])
              (call next))
          (finally (n) n (halt))))
    n)20

Here, we used the define-cfg-label definition, whose syntax is simply:

(define-cfg-label label)

(The define-cfg-label syntax also comes in a starred version define-cfg-label*.) This definition binds label to a fresh label and label is replaced by this label in labels, label*, and call CFG terms where label is in scope.

Specification

The identifiers defined in this section are exported by the (srfi :242 cfg) and the (srfi :242) libraries in case of an R6RS system and by the (srfi 242) library in case of an R7RS system.

Remark: This section is a formal account of the syntax and semantics of the cfg form. It should not be misunderstood as a gentle introduction, which was given in the previous section.

CFGs

The following definitions are standard, but are included here for the sake of completeness.

A control-flow graph (CFG) is a (finite) set of CFG blocks together with a predecessor relation such that there is exactly one CFG block with no predecessor. This CFG block is the entry block of the cfg. A CFG block that is not the predecessor of any other block is an exit block. A CFG block is a successor of another one if the latter is a predecessor of the former. (In what follows, we usually describe CFGs by defining the successors of each block, which amounts to the same thing: the predecessor relation is basically the inverse one.)

The dominance relation of a CFG is the largest relation on the set of its CFG blocks such that the following holds: The set of strict dominators of the entry block is empty. The set of strict dominators of any other block is a subset of the set of dominators of each predecessor of the block. Here, a strict dominator of a block is a dominator that is not the block itself.

Note: From the definition it follows that each block dominates (but not strictly dominates) itself.

Remark: Because of the condition on the entry block, the dominance relation is not trivial and not the maximal relation on the set of CFG blocks.

The post-dominance relation of a CFG is the largest relation on the set of its CFG blocks such that the following holds: The set of strict post-dominators of an exit block is empty. The set of strict post-dominators of any other block is a subset of the set of post-dominators of each successor of the block. Here, a strict post-dominator of a block is a post-dominator that is not the block itself.

Preliminary notions

In order to describe the computation model of the CFG language, which is not the same computation model underlying Scheme (but can be expressed in the latter as the portable implementation shows), a couple of primitive notions have to be introduced, that is the grammatical context in which they are used. Their semantic meaning only follows from the context in which they are used below.

Loop variables are defined in CFG blocks. A loop variable's scope is the set of CFG blocks that are dominated by a CFG block in which the loop variable is defined.

Return variables are also defined in CFG blocks. A return variable's scope is the set of CFG blocks that strictly post-dominate a CFG block in which the return variable is introduced.

The formal description of the various CFG terms uses the notion of CFG locations, which is similarly to a location in the Scheme semantics, except that it can hold a CFG block instead of a Scheme value.

Moreover, a pending set in the sense of the formal semantics below is a set of pairs, each consisting of a CFG location and a CFG block.

The following action can be done with pending sets: when a pending set is materialized with a CFG block as its tail, the pairs of CFG locations and CFG blocks of the pending set are ordered non-deterministically. In the CFG location of each pair, the CFG block of the following pair is then stored. In the CFG location of the last pair, the tail CFG block is stored. For each CFG block in the list of pairs, its set of successors is the one-element set containing the CFG block of the following pair. For the CFG block in the last pair, its set of successors is the one-element set consisting of the tail CFG block. The CFG block in the first pair is then returned as the result of the materialization.

A CFG state is a (finite) set of bindings from identifiers to values.

A CFG term is a syntactic construct in the CFG language that can be evaluated in an environment and within a pending set. The result of the evaluation is a CFG. A block in this CFG can be called with a CFG state. The result of this call is again a CFG state. Evaluation of CFG terms happens during expand-time, the calling of CFG terms during runtime.

Entry format

Each entry is of one of two categories, “syntax” or “cfg syntax”. The first category is as in R6RS and R7RS, while an entry of the second category describes a syntactic construct in the CFG language.

CFG Expressions

(cfg cfg term result expression)

Syntax: Cfg term is an arbitrary CFG term. Result expression is an arbitrary Scheme expression.

Semantics: A cfg expression is evaluated by first evaluating the cfg term within an empty pending list, resulting in a CFG. Its entry block is then called with an empty CFG state, yielding a resulting CFG state. The environment of the cfg expression is then extended by binding the return variables in whose scope the result expression is to locations holding the values to which they are bound in the resulting CFG state. Finally, the result expression is evaluated in this extended environment and its values returned as the results of the cfg expression.

The simplest cfg expression just halts and returns some value(s). (The halt and the other cfg terms are described in the next subsection.) For example:

(cfg (halt) 'done)done

Using a finally cfg term, return variables can be bound that can be used in the result expression:

(let ([x 1])
  (cfg (finally (x y) (values (+ x 1) (+ x 2))
         (halt))
    (list x y)))(2 3)

Loop variables are bound through execute terms:

(let ([x 1])
  (cfg (execute (lambda (e)
                  (e (+ x 1)))
         [(x) (finally (res) x (halt))])
    res))2

Execute CFG terms can have more than one successor. During evaluation, one control path is chosen:

(let ([x 1])
  (cfg (execute (lambda (e1 e2)
                  (if (even? x) (e1) (e2 'odd)))
         [() (finally (res) 'even (halt))]
         [(a) (finally (res) a (halt))])
    res))odd

Due to loop variable scoping, the second finally cannot simply be hoisted because it would no longer be dominated by the introduction of the loop variable a:

(let ([a 'outer]
      [x 1])
  (cfg (finally (res) a
         (execute (lambda (e1 e2)
                    (if (even? x) (e1) (e2 'odd)))
           [() (finally (res) 'even (halt))]
           [(a) (halt)]))
      res))outer

And due to return-variable scoping, the first finally cannot simply be left out because the entry block would no longer post-dominate an introduction of the res return variable:

(let ([res 'outer]
      [x 1])
   (cfg (execute (lambda (e1 e2)
                   (if (even? x) (e1) (e2 'odd)))
          [() (halt)]
          [(a) (finally (res) a (halt))])
      res))outer

Loop variables are called loop variables because the main reason for the CFG language is that it allows writing loops. This can be done with labels CFG terms. Here, we also use the bind CFG term to bind loop variables easily:

(cfg (labels ([f (execute
                     (lambda (e1 e2)
                       (if (> x 6)
                           (e1)
                           (e2 (+ x 1) (* a x))))
                   [() (finally (res) a (halt))]
                   [(x a) (call f)])])
       (bind ([(x) 1] [(a) 1]) (call f)))
  res)720

Finally, permute CFG terms can be used to create sequences of CFG blocks so that loop variables and return variables introduced in these blocks do not have the other blocks of this sequence in scope (comparable to scoping rules of the let expression of Scheme):

(let ([x 'outer] [y 'outer])
  (cfg (label* ([c (permute ([p (finally (y) 'inner
                                  (bind ([(a) x]) (call p)))])
                     (finally (a) a (halt)))])
         (permute [(p (finally (b) y
                        (bind ([(x) 'inner]) (call p))))]
           (call c)))
    (list a b)))(outer outer)

(Label* CFG terms do not allow loops as labels CFG terms, but permute sequences carry forward across calls to labels introduced by label*.)

Primitive CFG terms

The following entries describe the primitive CFG terms. A label is an identifier. Labels do not occupy the same namespace as keywords and all kind of variables. That is, within the same scope, an identifier can be bound as a label and as a variable or keyword, and local bindings of either kind do not shadow other bindings of the other kind. There are two (disjoint) types of labels, static labels and dynamic labels. Static labels are bound to CFG terms, and dynamic labels are bound to CFG locations.

(execute procedure expression [formals cfg term] …)

When an execute CFG term is evaluated in an environment and within a pending set, the cfg terms are evaluated to CFG blocks in no particular order in the environment, each within an empty pending set. The pending set in which the execute CFG term is being evaluated is then materialized with a new CFG block as its tail and the materialization is returned.

When the CFG block is later called with a CFG state, the environment of the surrounding cfg expression is extended by binding each loop variable in whose scope the execute CFG term occurs to a fresh location holding the value to which it is bound in the CFG state. The procedure expression is then evaluated in this extended environment to yield a procedure value. This procedure is then tail-called with as many procedure arguments as there are cfg terms. The procedure should call one of the arguments exactly once and the call should be a tail call. In the continuation of this tail call, the CFG state is then extended by binding the corresponding formals to the return values of the call, and the CFG block resulting from the evaluation of the corresponding cfg term is then tail-called with the extended CFG state and the resulting CFG state is returned.

The CFG block has the CFG blocks resulting from evaluation of the cfg terms as successors.

The variables of each formals are introduced as loop variables in the corresponding CFG block.

(halt)

When a (halt) CFG term is evaluated, the pending set is materialized with a new CFG block as its tail, and the materialization is returned.

When the CFG block is called with a CFG state, it simply returns an empty CFG state.

The CFG block has no successors.

(labels ([dynamic label cfg term] …) body cfg term)

When a labels CFG term is evaluated in an environment within a pending set, the environment is extended by binding each label to a CFG location initially holding an invalid CFG block. The cfg terms and the body cfg term are then evaluated in an unspecified order in this extended environment. In this environment, the cfg terms are evaluated within an empty pending set and the body cfg term is evaluated within the pending set in which the labels CFG term is being evaluated. For each label, the CFG block resulting from evaluating the corresponding cfg term is then stored in the corresponding CFG location. Finally, the CFG block resulting from evaluation of the body cfg term is returned.

When the CFG block returned by the labels CFG term is later called with a CFG state, the CFG block resulting from evaluating the body cfg term is tail-called with the CFG state and the resulting CFG state is returned.

(label* ([static label cfg term]) body cfg term)

When a label* CFG term is evaluated in an environment within a pending set, the environment is extended by binding the label to the (unevaluated) cfg term. The body cfg term is then evaluated in this environment within the pending set. The result of this evaluation is returned.

When the CFG block returned by the labels CFG term is later called with a CFG state, the CFG block resulting from evaluating the body cfg term is called with the CFG state, and the resulting CFG state is returned.

It is a syntax or undefined violation if evaluating cfg term in the non-extended environment within the empty pending set would raise an exception of type &syntax or &undefined, respectively.

Note: The label* syntax will be extended below.

(call static label)

When the call CFG term is evaluated in an environment and within a pending set, the CFG term to which the static label is bound is evaluated in the same environment and within the same pending set, and the result of this evaluation is returned.

It is an undefined violation if the static label is not bound in the environment in which the call CFG term is evaluated.

(call dynamic label)

When the call CFG term is evaluated in an environment within a pending set, the location to which the dynamic label is bound is looked up and remembered. The pending set is then materialized with a new CFG block as its tail, and the materialization is returned.

When the CFG block returned by the call CFG term is called with a CFG state, the CFG block stored in the remembered location is tail-called with the CFG state, and the resulting CFG is returned.

The CFG block returned by the call CFG term has the CFG block eventually stored in the remembered location as its successor.

It is an undefined violation if the dynamic label is not bound in the environment in which the call CFG term is evaluated.

(finally cfg formals expression cfg term)

When a finally CFG term is evaluated within a pending set, the pending set is materialized with a new CFG block as its tail, and the materialization is returned.

When the CFG block is later called with a CFG state, the environment of the surrounding cfg expression is extended by binding each loop variable in whose scope the CFG block occurs to a fresh location holding the value to which it is bound in the CFG state. Then, the CFG block resulting from evaluation of the cfg term is called. The extended environment is then further extended by binding each return variable in whose scope the CFG block returned by the finally CFG term occurs to a fresh location holding the value to which it is bound in the returned CFG state. Then, the expression is evaluated in the doubly extended environment, the CFG state received from call the cfg term is extended by binding the formals to the values receiving from evaluating the expression, and the extended CFG state is then returned.

The CFG block returned by the finally CFG term has the CFG block resulting from the evaluation of the cfg term as its successor.

(permute dynamic label cfg term body cfg term)

When a permute CFG term is evaluated in an environment within a pending set, the environment is extended by binding the dynamic label to a fresh CFG location initially holding an invalid CFG block. The cfg term is then evaluated in this extended environment within an empty pending set to a CFG block. The pending set within which the permute CFG term is being evaluated is then extended by adjoining a pair consisting of the fresh location and the resulting CFG block. Finally, the body cfg term is evaluated in the original environment but within the extended pending list, and the resulting CFG block is returned.

Note: The permute syntax will be extended below.

Tail contexts

If a cfg expression is in tail context, the result expression is in tail context as well.

Non-determinism

Through permute CFG terms, non-trivial pending sets are generated. Through the non-determinism of materializing pending sets, the CFGs resulting from evaluating CFG terms are non-deterministic as well. For such non-deterministic CFGs (which can be viewed as a set of CFG blocks together with a set of possible predecessor relations on these CFG blocks), the scope of loop or return variable is defined as the intersection of the scopes of the loop or return variable over all possible materializations.

Derived CFG terms

The following entries describe CFG terms that can be converted into primitive CFG terms.

(label* ([static label1 cfg term1] …) body cfg term)

Effectively equivalent to body cfg term when there is no static label. Otherwise, effectively equivalent to (label* ([static label1 cfg term1]) (label* ([static label2 cfg term2] …) body cfg term)).

Note: This cfg syntax extends the primitive label* syntax to more than one static label.

(permute ([dynamic label1 cfg term1] …) body cfg term)

Effectively equivalent to body cfg term when there is no dynamic label. Otherwise, effectively equivalent to (permute ([dynamic label1 cfg term1]) (permute ([dynamic label2 cfg term2] …) body cfg term)).

Note: This cfg syntax extends the primitive permute syntax to more than one dynamic label.

(bind ([formals expression] ...) cfg term)

When an bind CFG term is evaluated in an environment and within a pending set, the cfg term is evaluated to CFG block in the environment within an empty pending set. The pending set in which the execute CFG term is being evaluated is then materialized with a new CFG block as its tail, and the materialization is returned.

When the CFG block is later called with a CFG state, the environment of the surrounding cfg expression is extended by binding each loop variable in whose scope the bind CFG term occurs to a fresh location holding the value to which it is bound in the CFG state. The expressions are then evaluated in an unspecified order in this extended environment to yield values for each expression. The CFG state is then extended by binding each formals as loop variables to the return values of the corresponding evaluation, and the CFG block resulting from the evaluation of cfg term is then tail-called with the extended CFG state, and the resulting CFG state is returned.

It is a syntax violation if the loop variables in all formals are not pairwise different.

CFG syntax and label definitions

The Scheme syntax described in this section are definitions that may appear anywhere other definitions may appear.

Keyword bindings established by these definitions are visible throughout the body in which they appear, except where shadowed by other bindings, and nowhere else, just like variable bindings established by define. All bindings established by a set of definitions are visible within the definitions themselves.

(cfg keyword datum …)
(cfg keyword datum … . datum)
cfg keyword

At the start of the evaluation of a cfg expression, these CFG macro uses are expanded by the syntax expander into core CFG terms just as Scheme macro uses are expanded into core forms. In particular, a CFG transformer is like a Scheme transformer.

(define-cfg-syntax cfg keyword transformer expression)

Binds cfg keyword to the value transformer expression, which must evaluate, at macro-expansion time, to a CFG transformer.

(define-cfg-syntax* cfg keyword transformer expression)

As define-cfg-syntax except that cfg keyword must be bound. Define-cfg-syntax* does not change the meaning of keyword outside cfg terms.

(define-cfg-label identifier)

Binds the identifier to a fresh label.

Whenever the identifier appears as a static label or dynamic label, it is replaced by the label.

(define-cfg-label* identifier)

As define-cfg-label except that identifier must be bound. Define-cfg-label* does not change the meaning of identifier outside cfg terms.

The following example defines a CFG keyword that can be used like bind to unconditionally bind loop variables:

(define-cfg-syntax simple-bind
  (lambda (stx)
    (syntax-case stx ()
      [(_ ([id init] ...) cfg)
       (for-all identifier? #'(id ...))
       #'(execute (lambda (e)
                    (e init ...))
           [(id ...) cfg])])))

(cfg (simple-bind ([x 1] [y 2])
       (finally (res) (+ x y) (halt)))
  res)3

A probably more useful CFG macro is the following one:

(define-cfg-syntax return
  (lambda (stx)
    (syntax-case stx ()
      [(_ return-var ...)
       (for-all identifier? #'(return-var ...))
       #'(finally (return-var ...) (values return-var ...) (halt))])))

(cfg (simple-bind ([x 1]) (return x))
  x)1

The label definition feature is demonstrated in the following example:

(define-syntax permuting
  (lambda (stx)
    (syntax-case stx ()
      [(_ cfg-term ... result-expr)
       #'(cfg (permute ([p cfg-term] ...)
		(finally (res) result-expr (halt)))
	   res)])))

(permuting (simple-bind ([x 99]) (call p)) x)99

Implementation

The sample implementation is a portable R6RS implementation written from scratch using SRFI 213 The sample implementation in the git repo is configured for Chez Scheme.

Git repo for the sample implementation.

Acknowledgements

This SRFI would not exist if there hadn't been Olin Shiver's paper The Anatomy of a Loop. In fact, almost all of the mental effort needed for this SRFI was already provided by him. This does not imply that he does or does not endorse this SRFI.

It was Jens Axel Søgaard who reminded me of Olin Shiver's paper, which I had once read but then forgotten about.

During the draft period, Wolfgang Corcoran-Mathe read the specification thoroughly and found a number of small bugs and inconsistencies, which could then be fixed.

© 2022 Marc Nieper-Wißkirchen.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Editor: Arthur A. Gleckler