Title

Generators

Author

Shiro Kawai, John Cowan, Thomas Gilray

This SRFI is currently in ``draft'' status. To see an explanation of each status that a SRFI can hold, see here. To provide input on this SRFI, please mail to <srfi minus 121 at srfi dot schemers dot org>. See instructions here to subscribe to the list. You can access previous messages via the archive of the mailing list.

Abstract

This proposal defines utility procedures that create, transform, and consume generators. A generator is simply a procedure with no arguments that works as a source of a series of values. Every time it is called, it yields a value. Generators may be finite or infinite; all finite generators returns an end-of-file object to indicate that it is exhausted. For example, read-char is a generator that generates characters from the current input port.

Issues

None at present.

Rationale

The generators of this proposal don't belong to a disjoint type; they are just procedures that conform to a calling convention, so you can construct a generator with lambda; the constructors of this proposal are provided for convenience. Any procedure that can be called with no arguments can serve as a generator.

Generators are very lightweight and are useful for implementing simple on-demand calculations. However, it's important to keep in mind that calling a generator is a side-effecting construct; you can't safely backtrack, for example. Persistent lazy sequences based on generators will be the subject of a future SRFI.

Using an end-of-file object to indicate that there is no more input makes it impossible to include such an object in the stream of generated values. However, it is compatible with the existing design of input ports, and it makes for more compact code than returning a user-specified termination object (as in Common Lisp), throwing an exception (as in Python), or returning multiple values. (Note that some generators are infinite in length, and never return an end-of-file object.)

These procedures are drawn from the Gauche core and the Gauche modules gauche.generator and data.random, with some renaming to make them more systematic, and with some additions from the Python library itertools. Consequently, Shiro Kawai, the author of Gauche and its specifications, is listed as first author of this SRFI. John Cowan served as editor and shepherd. Thomas Gilray provided the sample implementation and a valuable critique of the proposal.

Specification

Generators can be divided into two classes, finite and infinite. Both kinds of generators can be invoked an indefinite number of times. After a finite generator has generated all its values, it will return an end-of-file object for all subsequent calls.

Generator constructors

The result of a generator constructor is just a procedure, so printing it doesn't show much. In the examples in this section we use generator->list to convert the generator to a list.

These procedures have names ending with generator.

make-generator arg …

The simplest finite generator. Generates each of its arguments in turn. When no arguments are provided, it returns a null generator that generates no values.

make-repeating-generator arg …

Returns an infinite generator that repeats the given arguments forever.

 
(generator->list (make-repeating-generator 1 2 3) 10)
  ⇒ (1 2 3 1 2 3 1 2 3 1)

Note that the above example limits the length of the converted list to 10; otherwise generator->list won't return.

make-iota-generator [ count [ start [ step ] ] ]

Creates a generator of a series of count numbers, starting from start (0 by default) and increased by step (1 by default). If no arguments are provided, the generator is infinite.

 
(generator->list (make-iota-generator 10 3 2))
  ⇒ (3 5 7 9 11 13 15 17 19 21)

If both start and step are exact, the generator yields exact numbers; otherwise it yields inexact numbers. It is an error unless count is an exact integer.

 
(generator->list (make-iota-generator +inf.0 1/2 1/3) 6)
  ⇒ (1/2 5/6 7/6 3/2 11/6 13/6)
(generator->list (make-iota-generator +inf.0 1.0 2.0) 5)
  ⇒ (1.0 3.0 5.0 7.0 9.0)
make-range-generator start end [ step ]

Creates a finite generator of a sequence of numbers. The sequence begins with start, increases by step (default 1), and continues while the number is less than end.

 
(generator->list (make-range-generator 3 8))
  ⇒ (3 4 5 6 7)
make-tabulation-generator [ k ] proc

Creates a generator of a series of k numbers (by default, an infinite number), namely (proc 0), (proc 1), (proc 2), ....

make-coroutine-generator proc

Creates a generator from a coroutine.

The proc argument is a procedure that takes one argument, yield. When called, make-coroutine-generator immediately returns a generator g. When g is called, proc runs until it calls yield. Calling yield causes the execution of proc to be suspended, and g returns the value passed to yield.

Whether this generator is finite or infinite depends on the behavior of proc. If proc returns, it is the end of the sequence — g returns an end-of-file object from then on. The return value of proc is ignored.

The following code creates a generator that produces a series 0, 1, and 2 (effectively the same as (make-iota-generator 3) and binds it to g.

 
(define g
  (generate
   (lambda (yield) (let loop ((i 0))
               (when (< i 3) (yield i) (loop (+ i 1)))))))

(generator->list g) ⇒ (0 1 2)
list->generator lis
list->circular-generator lis
vector->generator vec [ start [ end ] ]
reverse-vector->generator vec [ start [ end ] ]
string->generator str [ start [ end ] ]
bytevector->generator str [ start [ end ] ]

These procedures return generators that yield each item in the given argument. Mutating the underlying object will affect the results of the generator.

 
(generator->list (list->generator '(1 2 3 4 5)))
  ⇒ (1 2 3 4 5)
(generator->list (list->circular-generator '(1 2 3)) 5)
  ⇒ (1 2 3 1 2)
(generator->list (vector->generator '#(1 2 3 4 5)))
  ⇒ (1 2 3 4 5)
(generator->list (reverse-vector->generator '#(1 2 3 4 5)))
  ⇒ (5 4 3 2 1)
(generator->list (string->generator "abcde"))
  ⇒ (#\a #\b #\c #\d #\e)

By default the generator is exhausted once all items are retrieved; the optional start and end arguments can limit the range the generator walks across.

For list->circular-generator, the elements of the list are generated repeatedly, giving an infinite generator.

For reverse-vector->generator, the first value is the item right before the end element, and the last value is the start element. For all the others, the first value the generator yields is the start-th element, and it ends right before the end-th element.

 
(generator->list (make-vector-generator '#(a b c d e) 2))
  ⇒ (c d e)
(generator->list (make-vector-generator '#(a b c d e) 2 4))
  ⇒ (c d)
(generator->list (make-reverse-vector-generator '#(a b c d e) 2))
  ⇒ (e d c b)
(generator->list (make-reverse-vector-generator '#(a b c d e) 2 4))
  ⇒ (d c)
(generator->list (make-reverse-vector-generator '#(a b c d e) #f 2))
  ⇒ (b a)
make-bits-generator n

This procedure takes an exact integer considered as a twos-complement value and treats it as a sequence of boolean values (0 for false and 1 for true), returning bits from the least significant bit first. Note that the infinite sequence of high-order 0 or 1 bits are not returned, so this is a finite generator. In particular, (make-bits-generator 0) and (make-bits-generator -1) are both null generators.

 
(generator->list (bits->generator #b10110))
 ⇒ (#f #t #t #f #t)
make-port-generator input-port reader
make-port-sexp-generator input-port
make-port-line-generator input-port
make-port-char-generator input-port
make-port-byte-generator input-port

Returns generators that read the appropriate type of object from the given port using reader, read, read-line, read-char, or read-u8 respectively. Whether these are finite or infinite generators depends on what input-port is connected to: a file will create a finite generator, whereas a device or socket may create an infinite generator.

make-for-each-generator for-each obj

A generic generator that converts any collection obj to a generator that walks through the obj using the for-each procedure appropriate for obj. This must be a procedure that when called as (for-each proc obj) calls proc on each element of obj. Examples of such procedures are for-each, string-for-each, and vector-for-each from R7RS. The value returned by for-each is ignored. collection is finite, which would typically be the case.

make-unfold-generator stop? mapper successor seed

A generator constructor similar to unfold.

The stop? predicate takes a seed value and determines whether to stop. The mapper procedure calculates a value to be returned by the generator from a seed value. The successor procedure calculates the next seed value from the current seed value.

For each call of the resulting generator, stop? is called with the current seed value. If it returns true, then the generator returns an end-of-file object. Otherwise, it applies mapper to the current seed value to get the value to return, and uses successor to update the seed value.

This generator is finite unless stop? never returns true.

 
(generator->list (make-unfold-generator
                      (lambda s (> s 5))
                      (lambda s (* s 2))
                      (lambda s (+ s 1))
                      0))
  ⇒ '(0 2 4 6 8 10)

Generator operations

The following procedures accept generators and return a new generator. In general, the result will be a finite generator if the arguments are.

The names of these procedures are prefixed with g.

gcons* item … gen

Returns a generator that adds items in front of gen. Once the items have been consumed, the generator is guaranteed to tail-call gen.

 
(generator->list (gcons* 'a 'b (make-iota-generator 2)))
 ⇒ (a b 0 1)
gappend gen …

Returns a generator that yields the items from the first given generator, and once it is exhausted, use the second generator, and so on.

 
(generator->list (gappend (make-iota-generator 3) (make-iota-generator 2)))
 ⇒ (0 1 2 0 1)

(generator->list (gappend))
 ⇒ ()
ginterleave gen …

Returns a generator that first yields an item from the first source generator, then from the second generator, and so on, returning to the first generator when the last generator has yielded an item. When any source generator is exhausted, the result generator is exhausted.`

 
(generator->list (ginterleave (make-iota-generator 3) (make-iota-generator 2)))
 ⇒ (0 0 1 1)

(generator->list (gappend))
 ⇒ ()
gconcatenate gen

The gen argument is a generator of generators. Returns a generator that yields all the results from the first generator, then all the results from the second one, then the third, etc.

It is similar to (apply gappend (generator->list gen)), except that gconcatenate can work even if gen generates an infinite number of generators. Calls to the returned generator are guaranteed to tail-call one of the generators provided by gen until they are exhausted.

 
(generator->list
  (gconcatenate
    (make-element-generator (make-iota-generator 3) (make-iota-generator 2))))
 ⇒ (0 1 2 0 1)
gmerge comparator gen1 gen2
gunion comparator gen1 gen2
gintersection comparator gen1 gen2

Creates a generator that yields elements out of input generators, in strictly increasing order as determined by a SRFI 114 comparator. The results differ as follows: gmerge returns all the elements; gunion returns all the elements that are distinct in the sense of the comparator; gintersection returns the elements that appear in all of the generators. Elements generated by the result are drawn from the first generator in which they appear.

Each input generator must yield appropriately ordered and distinct elements by itself; otherwise the result won't be ordered.

If only one generator is given, it is just returned. In that case, comparator is ignored.

 
(generator->list (gmerge < '(1 3 8) '(5) '(2 4)))
  ⇒ '(1 2 3 4 5 8)
gmap proc gen gen2

Returns a generator that yields a value returned by the application of proc to the values returned by the generator arguments. The returned generator terminates when any of the generator arguments are exhausted.

Note: This differs from generator-collect, which consumes all values at once and returns the results as a list, while gmap returns a generator without immediately consuming input.

gfold proc seed gen gen2

A generator analogue of fold for mapping with state. It yields a sequence of sub-folds over proc.

The proc argument is a procedure that takes as many arguments as the input generators plus one. It is called as (proc v1 v2 … seed), where v1, v2, … are the values yielded from the input generators, and seed is the current seed value. It must return two values, the yielding value and the next seed.

Note: This differs from generator-fold, which consumes all values while folding over them, while gfold returns a generator immediately without consuming input.

gfilter pred gen
gremove pred gen

Return generators that yield the items from the source generator, except those on which pred answers false or true respectively.

gfilter-map proc gen gen2

Works the same as (gfilter values (gmap proc gen gen2 …)), only slightly more efficiently.

gstate-filter proc seed gen

This allows stateful filtering of a series. The proc argument must be a procedure that takes a value v from the source generator and a seed value. It should return two values, a boolean flag and the next seed value. If the boolean is true, the generator yields v. Otherwise, the generator keeps calling proc, updating the seed value, until the flag is true or the source generator is exhausted.

The following example takes a generator of oscillating values and yields only values that are greater than their previous value.

 
(generator->list
 (gstate-filter (lambda (v s) (values (< s v) v)) 0
                (make-element-generator 1 2 3 2 1 0 1 2 3 2 1 0 1 2 3)))
 ⇒ (1 2 3 1 2 3 1 2 3)
gbuffer-filter proc seed gen

This procedure allows n-to-m mapping between elements in input and output — that is, it can use several input elements to generate one or more output elements.

The procedure proc receives the next input element and an accumulated seed value. It returns two values: A list of output values, and the next seed value. If you need to look at more input to generate output, you can return an empty list as the first value.

If the input reaches the end, the output ends when the output of the last call to proc is exhausted (the final seed value is discarded).

gtake gen k [ padding ]
gdrop gen k

The generator analogues of take and drop. Gtake returns a generator that yields (at most) the first k items of the source generator, while gdrop returns a generator that skips the first k items of the source generator.

These won't complain if the source generator is exhausted before generating k items. By default, the generator returned by gtake terminates when the source generator does, but if you provide the padding argument, then the returned generator will yield k items, using the padding value to provide sufficient additional values.

gtake-while pred gen
gdrop-while pred gen

The generator analogues of take-while and drop-while. The generator returned from gtake-while yields items from the source generator as long as pred returns true for each. The generator returned from gdrop-while first reads and discards values from the source generator while pred returns true for them, then starts yielding items returned by the source.

gpairs car-gen cdr-gen

Returns a generator that yields pairs whose car is the result of calling car-gen and whose cdr is the result of calling cdr-gen. The output ends when either generator is exhausted.

 
(define g
  (gpairs (make-generator 113 101 12 68 -55)
          (make-generator #t #f #t #f #f))

(generator->list g 10)
 ⇒ ((113 . #t) (101 . #f) (12 . #t) (68 . #f) (-55 . #f))
gtuple gen …

Returns a generator that yields lists whose i-th element is generated from the i-th argument.

 
(define g
  (gtuples (make-generator -43 53 -114)
           (make-generator #f #f #f)
           (make-generator #\8 #\1 #\i))

(generator->list g 3)
 ⇒ ((-43 #f #\8) (53 #f #\1) (-114 #f #\i))

This procedure is analogous to zip, but gzip would be a misleading name.

glists sizer item-gen [ padding ]
gvectors sizer item-gen [ padding ]
gstrings sizer item-gen [ padding ]

Creates a generator that generates lists, vectors or strings of values from item-gen, respectively. The size of each datum is determined by sizer, which can be either a non-negative integer or a generator of non-negative integers. The value of the sizer determines the length of the result data.

The padding argument controls how the end of input is handled, just like gtake. When padding is not provided, the last item from the output generator may not have k items if the input is too short to fill them, as shown in the above example. If padding is present and the input is too short to complete k items, padding is used to fill the rest.

 
(generator->list (glists 3 (make-iota-generator 7)))
  ⇒ ((0 1 2) (3 4 5) (6))
 
(generator->list (glists 3 (make-iota-generator 6) 'x))
  ⇒ ((0 1 2) (3 4 5))
(generator->list (glists 3 (make-iota-generator 7) 'x))
  ⇒ ((0 1 2) (3 4 5) (6 x x))
gdelete item gen [ = ]

Creates a generator that returns whatever gen returns, except for any items that are the same as item in the sense of =, which defaults to equal?.

 
(generator->list (gdelete 3 (make-iota-generator 6)))
  ⇒ (0 1 2 4 5)
gdelete-neighbor-dups gen [ pred ]

Creates a generator that returns whatever gen returns, except for any items that are equal to the preceding item in the sense of pred, which defaults to equal?.

 
(generator->list (gdelete-neighbor-dups (list->generator '(a a b c c c))))
  ⇒ (a b c)
gindex value-gen index-gen

Creates a generator that returns elements of value-gen specified by the indices (non-negative exact integers) generated by index-gen. It is an error if the indices are not strictly increasing.

 
(generator->list (gindex (list->generator '(a b c d e f))
                         (list->generator '(0 2 4))))
  ⇒ (a c e)
gselect value-gen truth-gen

Creates a generator that returns elements of value-gen that correspond to the values generated by index-gen. If the current value of truth-gen is true, the current value of value-gen is generated, but otherwise not.

 
(generator->list (gselect (list->generator '(a b c d e f))
                          (list->generator '(#t #f #f #t #t #f))))
  ⇒ (a d e)
gnth-value gen n

Creates a generator that returns every nth element of gen

 
(generator->list (gnth-value (list->generator '(a b c d e f)) 2))
  ⇒ (a c e)
ggroup-by gen proc pred

Creates a generator that returns lists made from the elements of gen that return the same result (in the sense of pred) when applied pairwise and consecutively to them. Proc is applied to each value before pred is called. The car of each list is the result returned by proc, and the cdr is a list of the corresponding generated values in order.

 
(generator->list (ggroup-by (list->generator '(0 1 -1 2 2 0)) abs =))
  ⇒ ((0 0) (1 1 -1) (2 2 2) (0 0))

Consuming generated values

Unless otherwise noted, these procedures consume all the values of the generator passed to them. They have names prefixed with generator-.

generator->list generator [ k ]

Reads items from generator and returns a list of them. By default, it reads until the generator is exhausted. If an optional argument k is given, it must be a non-negative integer, and the list ends when either k items are consumed, or the generator is exhausted.

Be careful not to pass an infinite generator to this procedure without specifying k, or this procedure will not return.

generator->reverse-list generator [ k ]

Reads items from generator and returns a list of them in reverse order. By default, this reads until the generator is exhausted. If an optional argument k is given, it must be a non-negative integer, and the list ends when either k items are read, or the generator is exhausted.

Be careful not to pass an infinite generator to this procedure without specifying k, or this procedure will not return.

generator-fold proc seed gen1 gen2

Works like fold on the values generated by the generator arguments.

When one generator is given, for each value v generated by gen, proc is called as (proc v r), where r is the current accumulated result; the initial value of the accumulated result is seed, and the return value from proc becomes the next accumulated result. When gen returns an end-of-file object, the accumulated result at that time is returned from generator-fold.

When more than one generator is given, proc is invoked on the values returned by all the generator arguments followed by the current accumulated result. The iteration terminates when any of the generators returns an end-of-file object.

 
(with-input-from-string "a b c d e"
  (lambda () (generator-fold cons 'z read)))
  ⇒ (e d c b a . z)
generator-for-each proc gen gen2

A generator analogue of for-each that consumes generated values using side effects. Repeatedly applies proc on the values yielded by gen, gen2 … until any one of the generators yields an end-of-file object. The values returned from proc are discarded.

generator-collect proc gen gen2

A generator analogue of map. Repeatedly applies proc on the values yielded by gen, gen2 … until any one of the generators yields an end-of-file object. The values returned from proc are collected into a list and returned.

 
(with-input-from-string "a b c d e"
  ((lambda () (generator-map symbol->string read)))
  ⇒ ("a" "b" "c" "d" "e")

The same effects can be achieved by combining generator->list and gmap.

 
(generator->list (gmap proc gen gen2 …))
generator-nth gen n

Returns the nth item of the generator.

generator-last gen

Returns the last item of the generator.

generator-find pred gen

Returns the first item from the generator gen that satisfies the predicate pred, or #f if there is no such item before an end-of-file object is generated.

generator-length gen

Returns the number of items available from the generator gen.

generator-count pred gen

Returns the number of items available from the generator gen that satisfy the predicate pred.

generator-any pred gen

Returns #t if any of the values available from the generator gen satisfy the predicate pred. After such a value is found, the procedure returns without consuming the rest of the generator.

generator-every pred gen

Returns #f if any of the values available from the generator gen do not satisfy the predicate pred. After such a value is found, the procedure returns without consuming the rest of the generator.

generator=? pred gen ...

Returns #f if the corresponding values available from the generators gen are not equal in the sense of the predicate pred. After such a value is found, the procedure returns without consuming the rest of the generators. If all the values of all the generators have been consumed without finding any values that are not equal, generator=? returns #t.

generator-unfold gen unfold arg ...

Equivalent to (unfold eof-object? (lambda (x) x) gen (gen) args ...). The values of gen are unfolded into the collection that unfold creates. The combination of make-for-each-generator and generator-unfold makes it possible to convert any collection that has a for-each procedure into any collection that has an unfold constructor.

The signature of the unfold procedure is (unfold stop? mapper successor seed args ...). Note that vector-unfold and vector-unfold-right do not have this signature and cannot be used with this procedure.

Syntax

generator-let* (binding …) body body2 …

This captures a monadic pattern that frequently appears in generator-related code. It is similar in spirit to and-let* from SRFI 2, but returns as soon as an evaluated expression returns an end-of-file object.

The binding part can be either (var expr) or ( expr ). The actual definition will explain this syntax clearly.

 
(define-syntax generator-let*
  (syntax-rules ()
    ((_ () body body2 ...) (begin body body2 ...))
    ((_ ((var gen-expr) more-bindings ...) . body)
     (let ((var gen-expr))
       (if (eof-object? var)
         var
         (generator-let* (more-bindings ...) . body))))
    ((_ (( gen-expr ) more-bindings ...) . body)
     (let ((var gen-expr))
       (if (eof-object? var)
         var
         (generator-let* (more-bindings ...) . body))))))
do-generator (var gexpr) body …

Gexpr is an expression that yields a generator. It is evaluated once. The resulting generator is called repeatedly until it returns an end-of-file object. Every time the generator is called, the body expressions are evaluated in a scope where var is bound to the value returned from the generator.

This macro exists for performing side-effects. You can do the same thing with generator-for-each, but sometimes this macro makes the imperative code more readable:

 
(do-generator (line read-line)
  ;; do some side-effecting stuff with line
  )
greceive formals gexpr body …

Gexpr is an expression that yields a generator. It is evaluated once. Formals and body are as described in R7RS. Specifically, formals can have any of three forms:

In any case, the expressions in body are evaluated sequentially in the extended environment. The results of the last expression in the body are the values of greceive.

Implementation

The sample implementation contains the following files:

Certain procedures are not yet implemented.

Copyright

Copyright (C) Shiro Kawai, John Cowan, Thomas Gilray (2015). All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Editor: Mike Sperber
Last modified: Wed Jun 10 08:57:14 MST 2015