by Daphne Preston-Kendal
This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-223@nospamsrfi.schemers.org
. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.
Generalized procedures for binary search of vector-like data structures are provided which can be applied to any sequence type, including ones defined by the user, together with applications of these procedures for Scheme’s built-in vectors.
While SRFIs 43 and 133 provide a vector-binary-search
procedure, in neither SRFI is the option provided to control whether the index of the leftmost or rightmost matching element of the vector is returned. They also use a C-style cmp
procedure which is compatible neither with the less-than comparison predicate used by SRFI 132 for sorting lists and vectors (such as might be used to make one ready for binary search), nor with SRFI 128 comparator ordering predicates. Finally, vector-binary-search
, as the name implies, can only be used on Scheme vectors. Use on other sequence types, such as the homogeneous numeric vectors of SRFI 160, is not foreseen.
As binary search is notoriously tricky to implement correctly (especially in light of its apparent simplicity), a correct implementation which is generalizable to any sequence type (including one that a programmer might define themselves) is a useful building block for fast search applications.
Programmers should be aware that it rarely makes sense to use binary search on any sequence type which does not provide O(1) (or ‘effectively O(1)’) access to its members. Scheme lists are not suitable for binary searching, as list-ref
is an O(n) operation.
In the following procedure specifications, the following variable names are used as procedure arguments having a particular meaning:
bytevector-u8-ref
as ref when a is a vector.)
#t
if the first is strictly less than the second according to the sort order of the sequence a, or otherwise #f
. It is the programmer’s responsibility to ensure that less? can handle comparisons between the given val and any value present in the sequence a; that the order of items in a given sequence a actually corresponds to the behaviour of the given less?; and that it is irreflexive, antisymmetric, and transitive, yielding a total ordering of all values in its domain.
The following procedures must work when used, with suitable ref, lo, and hi arguments, on sequence types which define negative indexes. (This does not refer to the potential use of negative indexes to refer to items in a sequence counting from the last item rather than the first, but rather to sequences where negative indexes refer to unique positions in the sequence.)
(bisect-left
a val ref less? lo hi)
⇒ idx
(bisect-right
a val ref less? lo hi)
⇒ idx
Defining both *-bisect-left
and *-bisect-right
procedures for a given type, and handling the optionality of lo and hi correctly, can make code very repetitive. To ease this repetition, the following convenience procedure is provided.
(bisection
ref)
⇒
1 (left-proc
a val less? lo hi)
(right-proc
a val less? lo hi)
(bisection
ref lo-hi-proc)
⇒
1
(left-proc
a val less?)
(left-proc
a val less? lo hi)
(right-proc
a val less?)
(right-proc
a val less? lo hi)
bisect-left
and bisect-right
respectively with the given ref procedure, comparable to vector-bisect-left
and vector-bisect-right
for vectors. If the lo-hi-proc argument is given, it should be a procedure which takes one argument, a sequence a, and returns two values, the default lo and hi values for that sequence (usually 0 and the length of the sequence). If no lo-hi-proc procedure is given, the lo and hi arguments to the returned procedures are mandatory.
(vector-bisect-left
a val less?)
⇒ idx
(vector-bisect-left
a val less? lo hi)
⇒ idx
bisect-left
.
(vector-bisect-right
a val less?)
⇒ idx
(vector-bisect-right
a val less? lo hi)
⇒ idx
bisect-right
.
All procedures provided by this SRFI are in the library called (srfi 223)
, mutatis mutandis.
Implementations and future RnRS specifications should, if they adopt this SRFI, specify analogous *-bisect-left
and *-bisect-right
procedures for all sequence types they provide which offer O(1) or ‘effectively O(1)’ access time, and in which sorted data may reasonably be expected to appear, but they should do under another library name. The library (srfi 223)
must contain only the procedures directly specified here. They may also choose to divide the core, generalized procedures of this library (bisect-left
, bisect-right
, and bisection
) into one library (such as (xyz bisect)
) and make each application of those procedures to specific sequence types their own sublibrary (such as (xyz bisect vector)
, (xyz bisect flexvector)
, etc.)
Bisect procedures which operate on Scheme bytevectors can be defined concisely with the following code.
(define-values (bytevector-bisect-left bytevector-bisect-right)
(bisection bytevector-u8-ref
(lambda (bv) (values 0 (bytevector-length bv)))))
(vector-bisect-left #(1 2 2 3 5) 2 <)
| ⇒ 1
|
(vector-bisect-right #(1 2 2 3 5) 2 <)
| ⇒ 3
|
(vector-bisect-left #(1 2 2 3 5) 4 <)
| ⇒ 4
|
(vector-bisect-left #(1 2 2 3 5) 4 < 0 1)
| ⇒ 1
|
Or with the example bytevector bisection defined above, one could change the procedures here to their corresponding bytevector versions, and the vectors also to bytevectors, and get the same results.
The following code shows how to define a procedure vector-bisect-index
which returns the smallest index of val in the vector a, or #f
if val is not actually in a, according to the given SRFI 128 comparator cmpr.
(define (vector-bisect-index a val cmpr)
(let ((idx (vector-bisect-left a val (comparator-ordering-predicate cmpr))))
(and (< idx (vector-length a))
(=? cmpr val (vector-ref a idx))
idx)))
The sample implementation in the Git repository for this SRFI should be correct and work on all compliant R7RS small implementations, although programmers working on Schemes with no automatic promotion of the results of +
to bignums and very large sequences (typically more than a gibi-entry or so in size, depending on fixnum range) should be aware of the possibility of integer overflow. See Joshua Bloch, ‘Extra, Extra — Read All About It: Nearly All Binary Searches and Mergesorts are Broken’. Supporting Scheme systems without automatic bignum promotion is not a goal of the sample implementation.
Thanks to:
John Cowan, for suggesting a more rigorous specification of the less? parameter, using language from SRFI 128, which in turn inspired an altogether clearer description of what the programmer’s responsibilities are in terms of providing correct values to the procedures in this library;
Marc Nieper-Wißkirchen, for suggesting using SRFI 128 comparators instead of bare comparison predicates, and Daniel Itaborai, for requesting higher-level procedures, which suggestions were not adopted but together inspired the above example of implementing vector-bisect-index
;
Arthur Gleckler, for editing this SRFI and many others, and suggesting a cleaner implementation of vector-bisect-index
;
and especially to
Sundarshan S. Chawathe, who by attempting to create an independent implementation of this library uncovered ambiguities and problems in the specifications of bisect-left
and bisect-right
in earlier drafts of this SRFI.
The sample implementation is essentially a translation of the Python standard library implementation of binary search in bisect.py
into Scheme.
Any remaining errors in specification or sample implementation are the author’s and the author’s alone.
© 2021 Daphne Preston-Kendal (text and sample implementation in Scheme).
© 1992–2019 Python Software Foundation (guideline algorithm implementation in Python).
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.