196: Range Objects

by John Cowan (text), Wolfgang Corcoran-Mathe (implementation)

Status

This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-196@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Abstract

Ranges are collections somewhat similar to vectors, except that they are immutable and have algorithmic representations instead of the uniform per-element data structure of vectors. The storage required is usually less than the size of the same collection stored in a vector and the time needed to reference a particular element is typically less for a range than for the same collection stored in a list. This SRFI defines a large subset of the sequence operations defined on lists, vectors, strings, and other collections. If necessary, a range can be converted to a list, vector, or string of its elements or a generator that will lazily produce each element in the range.

Rationale

One of the most common things to do in programming is to execute a block of code a fixed number of times, with access to the index used for the iteration. Indeed, it is so common that there is generally a standard syntax for providing it, generally involving the keyword for (or if not, as in Fortran and Lisp, the word do). It is also usual to be able to provide a lower bound, (generally defaulting to 0 or 1) as well as a step (generally defaulting to 1) which allows iterations through a sequence of odd numbers, or multiples of 10, or whatever.

Languages with higher order functions, however, provide a second approach to such loops: construct a sequence of numbers and apply a function to each element of the sequence. SRFI 1's iota and the standard for-each procedures make this easy: (for-each (lambda (i) ...) (iota 0 10)) will execute the expressions represented as ... with i bound to the numbers 0 through 9, as the generated list includes the lower bound and excludes the upper bound.

This approach is less feasible as the number of numbers grows. To iterate a million times involves constructing a list of a million numbers to be iterated over and immediately discarded as garbage. This is not something you want to do in the inner loop of your code. The range objects of this SRFI represent such sequences using (as a rule) a small fixed amount of storage. Using (range-for-each (lambda (i) ...) (numeric-range 0 1000000)) iterates a million times but with less space overhead than iota's list of ten elements. They can be thought of as compactly stored vectors.

In addition, there are other sequences besides integers from which a range can be drawn. In particular, inexact numbers can also specify ranges: (numeric-range 0.0 1.0 0.1) specifies the sequence 0.0, 0.1, ... 0.9, at least when inexact numbers are represented as IEEE 754 binary double floats (as is usually the case). Roundoff error is still possible when multiplying, but it is greatly reduced compared to accumulated error by repeated adding.

The rationale for string-range is to provide efficient random access to strings. There have been many attempts to ensure O(1) reference to string characters, such as string cursors, UTF-32 encoding, SRFI 135 texts, immutable strings, and so on. Because the range-ref procedure for a string created through string-range runs in O(1) time in the length of the string, a range created by string-range can efficiently access arbitrary characters of the range.

Specification

Ranges belong to a disjoint type.

Ranges provide certain running time guarantees. With each range, we associate two lengths of time, the average accessing time and the total accessing time. The total accessing time is the average accessing time multiplied by the length of the range. In the runtime guarantees below, the time spent in arguments designated by pred, equal, or proc is ignored.

Unless otherwise noted, procedures in this SRFI that return ranges allocate at most O(1) new locations (see R[57]RS section 3.4 for further information). Such ranges are known as compact ranges. Procedures marked as returning expanded ranges allocate at most O(n) locations, where n is the number of elements in the range.

This SRFI recommends, but does not require, that Scheme implementations which also provide SRFI 42 modify it so that the typed generator :range also accepts a single argument which is a range in the sense of this SRFI. This feature should be used with caution, as SRFI 42 loops expect that :range iterates only over exact rationals.

The following names are used for arguments to procedures:

obj: Any Scheme object.

range: A range object.

pred: A predicate that accepts zero or more arguments.

equal: A predicate that accepts two arguments and returns a boolean value. It is not necessarily an equivalence relation.

length: An exact positive integer.

proc: A procedure that accepts zero or more arguments and returns zero or more values.

list: A Scheme list.

vector: A Scheme vector.

string: A Scheme string.

It is an error (unless otherwise noted) if the procedures are passed arguments that do not have the type implied by the argument names.

Constructors

(range length indexer)

Returns a range whose length (number of elements) is length. The indexer procedure returns the nth element (where 0 ≤ n < length) of the range, given n. This procedure must run in O(1) time. The range returned is compact, although indexer may close over arbitrarily large data structures. The average accessing time of the resulting range is the average time needed to run indexer.

Examples:

(range->list (range 26 (lambda (n) (integer->char (+ 65 n)))))
  ⇒ (#\A #\B #\C #\D #\E … #\Z)

(range->list (range 10 (lambda (n) (expt 1/2 n))))
  ⇒ (1 1/2 1/4 … 1/512)

(numeric-range start end [step])

Returns a numeric range, a special case of a range specified by an inclusive lower bound start, an exclusive upper bound end, and a step value (default 1), all of which can be exact or inexact real numbers. This constructor produces the sequence

start, (+ start step), (+ start (* 2 step)), …, (+ start (* n step)),

where n is the greatest integer such that (+ start (* n step)) < end if step is positive, or such that (+ start (* n step)) > end if step is negative. It is is an error if an n satisfying this condition cannot be determined, or if step is numerically zero. This procedure must run in O(1) time. The average accessing time of the resulting range must be O(1).

Note that an effect of this definition is that the elements of a range over inexact numbers are enumerated by multiplying the index by the step value rather than by adding the step value to itself repeatedly. This reduces the likelihood of roundoff errors.

(range->list (numeric-range 0 1 1/10))
  ⇒ (0 1/10 1/5 3/10 2/5 1/2 3/5 7/10 4/5 9/10)
(range->list (numeric-range 5 -5 -3)) ⇒ (5 2 -1 -4)

(iota-range length [start [step]])

Returns an iota-numeric range, a special case of a range specified by a length (a non-negative exact integer) as well as an inclusive lower bound start (default 0) and a step value (default 1), both of which can be exact or inexact real numbers. This constructor produces the sequence

start, (+ start step), (+ start (* 2 step)), …, (+ start (* (- length 1) step)),

This procedure must run in O(1) time. The average accessing time of the resulting range must be O(1).

Note that an effect of this definition is that the elements of a range over inexact numbers are enumerated by multiplying the index by the step value rather than by adding the step value to itself repeatedly. This reduces the likelihood of roundoff errors.

(vector-range vector)

Returns a range whose elements are those of vector. The procedure must run in O(1) time. The average accessing time of the resulting range must be O(1). It is an error to mutate vector.

(range->list (vector-range #(1 3 5 7 9)))
  ⇒ (1 3 5 7 9)

(string-range string)

Returns a range whose elements are those of string. It is an error to mutate string. This procedure must run in O(n) time, where n is the length of string. The average accessing time of the resulting range must be O(1).

In a Scheme that guarantees O(1) random access to strings, range-ref on a range created by string-range can simply call string-ref, and the resulting range is compact. But if only O(n) access is available, this procedure may have to copy the string's characters into a vector, resulting in an expanded range.

(range->list (string-range "abcde"))
  ⇒ (#\a #\b #\c #\d #\e)

(range-append range ...)

Returns a range whose elements are the elements of the ranges in order. This procedure must run in O(n) + O(k) time, where n is the total number of elements in all the ranges and k is the number of ranges. The result is usually expanded but may be compact. The average accessing time of the resulting range is asymptotically bounded by maximum of the average accessing times of the ranges.

(range->list (range-append (numeric-range 0 3)
                           (numeric-range 3 6)))
  ⇒ (0 1 2 3 4 5)

Predicates

(range? obj)

Returns #t if obj is a range and #f otherwise. This procedure must run in O(1) time.

(range=? equal range1 range2 ...)

Returns #t if all the ranges are of the same length and if their corresponding values are the same in the sense of equal, and #f otherwise. The runtime of this procedure is O(s) + O(k), where s is the sum of the total accessing times of the ranges and k is the number of ranges.

(range=? = (numeric-range 10 30) (numeric-range 10 30)) ⇒ #t

(range=? = (numeric-range 5 10) (numeric-range 6 11)) ⇒ #f

(range=? eqv? (numeric-range 0 0) (range 0 (lambda (i) i))) ⇒ #t

Accessors

(range-length range)

Returns the length (number of elements) of range. This procedure must run in O(1) time.

(range-length (numeric-range 10 30)) ⇒ 20

(range-ref range n)

Returns the nth element of range. It is an error if n is less than 0 or greater than or equal to the length of range. The running time of this procedure must be asymptotically equal to the average accessing time of range.

(range-ref (numeric-range 10 30) 5) ⇒ 15
(range-ref (range 2 (lambda (i) (not (zero? i)))) 1) ⇒ #t

(range-first range)

Equivalent (in running time as well) to (range-ref range 0).

(range-first (numeric-range 10 30)) ⇒ 10

(range-last range)

Equivalent (in running time as well) to (range-ref range (- (range-length range) 1)).

(range-last (numeric-range 10 30)) ⇒ 29

Iteration

(range-split-at range index)

Returns two values: (range-take range index) and (range-drop range index). It is an error if index is not an exact integer between 0 and the length of range, both inclusive. This procedure must run in O(1) time.

(let-values (((ra rb) (range-split-at (numeric-range 10 20) 5)))
  (values (range->list ra) (range->list rb)))
    ⇒ (10 11 12 13 14)
      (15 16 17 18 19)

(subrange range start end)

Returns a range which contains the elements of range from index start, inclusive, through index end, exclusive. This procedure must run in O(1) time. The average accessing time of the resulting range is asymptotically bounded by the average accessing time of range.

(range->list (subrange (numeric-range 5 15) 5 8))
  ⇒ (10 11 12)

(range->list (subrange (string-range "abcdefghij") 2 8))
  ⇒ (#\c #\d #\e #\f #\g #\h)

(range-segment range length)

Returns a list of ranges representing the consecutive subranges of length length. The last range is allowed to be shorter than length. The procedure must run in O(k) time, where k is the number of ranges returned. The average accessing time of the ranges is asymptotically bounded by the average accessing time of range.

(map range->list (range-segment (numeric-range 0 12) 4))
  ⇒ ((0 1 2 3) (4 5 6 7) (8 9 10 11))

(map range->list (range-segment (numeric-range 0 2 1/3) 4))
  ⇒ ((0 1/3 2/3 1) (4/3 5/3))

(range-take range count)
(range-take-right range count)

Returns a range which contains the first/last count elements of range. The average accessing time of the resulting ranges is asymptotically bounded by the average accessing time of range.

(range->list (range-take (numeric-range 0 10) 5))
  ⇒ (0 1 2 3 4)

(range->list (range-take-right (numeric-range 0 10) 5))
  ⇒ (5 6 7 8 9)

(range-drop range count)
(range-drop-right range count)

Returns a range which contains all except the first/last count elements of range. These procedures must run in O(1) time. The average accessing time of the resulting ranges is asymptotically bounded by the average accessing time respectively of range.

(range->list (range-drop (numeric-range 0 10) 5))
  ⇒ (5 6 7 8 9)

(range->list (range-drop-right (numeric-range 0 10) 5))
  ⇒ (0 1 2 3 4)

(range-count pred range1 range2 ...)

Applies pred element-wise to the elements of ranges and returns the number of applications which returned true values. If more than one range is given and not all ranges have the same length, range-count terminates when the shortest range is exhausted. The runtime of this procedure is O(s) where s is the sum of the total accessing times of the ranges.

(range-count even? (numeric-range 0 10)) ⇒ 5

(range-count < (numeric-range 0 10 2) (numeric-range 5 15)) ⇒ 5

(range-any pred range1 range2 ...)

Applies pred element-wise to the elements of the ranges and returns true if pred returns true on any application. Specifically it returns the last value returned by pred. Otherwise, #f is returned. If more than one range is given and not all ranges have the same length, range-any terminates when the shortest range is exhausted. The runtime of this procedure is O(s) where s is the sum of the total accessing times of the ranges.

(range-any odd? (numeric-range 0 10)) ⇒ #t

(range-any odd? (numeric-range 0 10 2)) ⇒ #f

(range-any < (numeric-range 0 10 2) (numeric-range 5 15)) ⇒ #t

(range-every pred range)

Applies pred element-wise to the elements of the ranges and returns true if pred returns true on every application. Specifically it returns the last value returned by pred, or #t if pred was never invoked. Otherwise, #f is returned. If more than one range is given and not all ranges have the same length, range-every terminates when the shortest range is exhausted. The runtime of this procedure is O(s) + O(k), where s is the sum of the total accessing times of the ranges and k is the number of ranges.

(range-every integer? (numeric-range 0 10)) ⇒ #t

(range-every odd? (numeric-range 0 10)) ⇒ #f

(range-every < (numeric-range 0 10 2) (numeric-range 5 15)) ⇒ #f

(range-map proc range1 range2 ...)
(range-map->list proc range1 range2 ...)
(range-map->vector proc range1 range2 ...)

Applies proc element-wise to the elements of the ranges and returns a range/list/vector of the results, in order. If more than one range is given and not all ranges have the same length, these procedures terminate when the shortest range is exhausted. The dynamic order in which proc is actually applied to the elements is unspecified. The runtime of these procedures is O(s) where s is the sum of the total accessing times of the ranges. The range-map procedure eagerly computes its result and returns an expanded range. Its average accessing time is O(1).

(range->list (range-map square (numeric-range 5 10))) ⇒ (25 36 49 64 81)

(range->list (range-map + (numeric-range 0 5) (numeric-range 5 10)))
  ⇒ (5 7 9 11 13)

(range-map->list square (numeric-range 5 10)) ⇒ (25 36 49 64 81)

(range-map->vector square (numeric-range 5 10)) ⇒ #(25 36 49 64 81)

(range-for-each proc range1 range2 ...)

Applies proc element-wise to the elements of the ranges in order. Returns an unspecified result. If more than one range is given and not all ranges have the same length, range-for-each terminates when the shortest range is exhausted. The runtime of this procedure is O(s) where s is the sum of the total accessing times of the ranges.

(let ((vec (make-vector 5)))
  (range-for-each (lambda (i) (vector-set! vec i (* i i)))
                  (numeric-range 0 5))
  vec) ⇒ #(0 1 4 9 16)

(range-filter-map proc range1 range2 ...)
(range-filter-map->list proc range1 range2 ...)

Applies proc element-wise to the elements of the ranges and returns a range/list of the true values returned by proc. If more than one range is given and not all ranges have the same length, these procedures terminate when the shortest range is exhausted. The dynamic order in which proc is actually applied to the elements is unspecified. The range-filter-map procedure eagerly computes its result and returns an expanded range. The runtime of these procedures is O(n) where n is the sum of the total accessing times of the ranges.

(range->list (range-filter-map (lambda (x) (and (even? x) (* x x)))
                               (numeric-range 0 10)))
  ⇒ (0 4 16 36 64)

(range-filter-map->list (lambda (x y)
                          (and (< x y) (* x y)))
                        (numeric-range 0 10 2)
                        (numeric-range 5 15))
  ⇒ (0 12 28 48 72)

(range-filter pred range)
(range-filter->list pred range)
(range-remove pred range)
(range-remove->list pred range)

Returns a range/list containing the elements of range that satisfy / do not satisfy pred. The runtime of these procedures is O(s) where s is the sum of the total accessing times of the ranges.

The range-filter and range-remove procedures eagerly compute their results and return expanded ranges. Their average accessing time is O(1).

(range->list (range-filter even? (numeric-range 0 10)))
  ⇒ (0 2 4 6 8)

(range-filter->list odd? (numeric-range 0 10))
  ⇒ (1 3 5 7 9)

(range->list (range-remove even? (numeric-range 0 10)))
  ⇒ (1 3 5 7 9)

(range-remove->list odd? (numeric-range 0 10))
  ⇒ (0 2 4 6 8)

(range-fold kons nil range1 range2 ...)
(range-fold-right kons nil range1 range2 ...)

Folds kons over the elements of ranges in order / reverse order. kons is applied as (kons state (range-ref range1 i) (range-ref range2 i)) where state is the result of the previous invocation and i is the current index. For the first invocation, nil is used as the first argument. Returns the result of the last invocation, or nil if there was no invocation. If more than one range is given and not all ranges have the same length, these procedures terminate when the shortest range is exhausted. The runtime of these procedures must be O(s) where s is the sum of the total accessing times of the ranges.

(range-fold (lambda (n _) (+ 1 n)) 0 (numeric-range 0 30))
  ⇒ 30

(range-fold + 0 (numeric-range 0 100) (numeric-range 50 70))
  ⇒ 1380

(range-fold-right xcons '() (numeric-range 0 10))
  ⇒ (0 1 2 3 4 5 6 7 8 9)

(range-fold-right (lambda (lis x y) (cons (+ x y) lis))
                  '()
                  (numeric-range 3 6)
                  (numeric-range 5 12))
  ⇒ (8 10 12)

Searching

(range-index pred range1 range2... )
(range-index-right pred range1 range2... )

Applies pred element-wise to the elements of ranges and returns the index of the first/last element at which pred returns true. Otherwise, returns #f. If more than one range is given and not all ranges have the same length, range-index terminates when the shortest range is exhausted. It is an error if the ranges passed to range-index-right do not all have the same lengths. The runtime of these procedures must be O(s) where s is the sum of the total accessing times of the ranges.

(range-index (lambda (x) (> x 10)) (numeric-range 5 15)) ⇒ 6

(range-index odd? (numeric-range 0 10 2)) ⇒ #f

(range-index = (numeric-range 0 12 2) (numeric-range 5 15)) ⇒ 5

(range-index-right odd? (numeric-range 0 5)) ⇒ 3

(range-take-while pred range)
(range-take-while-right pred range)

Returns a range containing the leading/trailing elements of range that satisfy pred up to the first/last one that does not. The runtime of these procedures is asymptotically bounded by the total accessing time of the range. The average accessing time of the resulting range is O(1).

(range->list (range-take-while (lambda (x) (< x 10))
                               (numeric-range 5 15)))
  ⇒ (5 6 7 8 9)

(range->list (range-take-while-right (lambda (x) (> x 10))
                                     (numeric-range 5 15)))
  ⇒ (11 12 13 14)

(range-drop-while pred range)
(range-drop-while-right pred range)

Returns a range that omits leading/trailing elements of range that satisfy pred until the first/last one that does not. The runtime of these procedures is asymptotically bounded by the total accessing time of the range. The average accessing time of the resulting range is O(1).

(range->list (range-drop-while (lambda (x) (< x 10))
                               (numeric-range 5 15)))
  ⇒ (10 11 12 13 14)

(range->list (range-drop-while-right (lambda (x) (> x 10))
                                     (numeric-range 5 15)))
  ⇒ (5 6 7 8 9 10)

Conversion

(range->list range)
(range->vector range)
(range->string range)

Returns a list/vector/string containing the elements of range in order. It is an error to modify the result of range->vector or of range->string. In the case of range->string, it is an error if any element of range is not a character. The running times of these procedures is O(s) where s is the total accessing time for range.

(range->list (numeric-range 0 0)) ⇒ ()

(range->vector (range 2 (lambda (i) (not (zero? i))))) ⇒ #(#f #t)

(range->string (range 5 (lambda (i) (integer->char (+ 65 i)))))
  ⇒ "ABCDE"

(vector->range vector)

Returns an expanded range whose elements are those of vector. Note that, unlike vector-range, it is not an error to mutate vector; future mutations of vector are guaranteed not to affect the range returned by vector->range. This procedure must run in O(n) where n is the length of vector. Otherwise, this procedure is equivalent to vector-range.

(range->list (vector->range #(1 3 5 7 9))) ⇒ (1 3 5 7 9)

(range->generator range)

Returns a SRFI 158 generator that generates the elements of range in order. This procedure must run in O(1) time, and the running time of each call of the generator is asymptotically bounded by the average accessing time of range.

(generator->list (range->generator (numeric-range 0 10)))
  ⇒ (0 1 2 3 4 5 6 7 8 9)

Implementation

The sample implementation is in the repository of this SRFI and in this .tgz file. An R7RS library file and a separate file containing the actual implementation are provided, along with a test file that works with SRFI 78, but is self-contained if SRFI 78 does not exist. The implementation uses SRFI 1 and can take advantage of SRFI 145 (assume) if it is present.

Acknowledgements

Without Python's range object, this SRFI would not exist. Thanks also to the contributors on the SRFI 196 mailing list.

Special thanks to Marc Nieper-Wi├čkirchen, who provided extensive feedback and invaluable insights during the development of this SRFI.

Copyright

© 2020 John Cowan, Wolfgang Corcoran-Mathe.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Editor: Arthur A. Gleckler