by Daphne PrestonKendal
This SRFI is currently in draft status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi223@nospamsrfi.schemers.org
. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.
Generalized procedures for binary search of vectorlike data structures are provided which can be applied to any sequence type, including ones defined by the user, together with applications of these procedures for Scheme’s builtin vector and bytevector types.
No issues at present.
While SRFIs 43 and 133 provide a vectorbinarysearch
procedure, in neither SRFI is the option provided to control whether the index of the leftmost or rightmost matching element of the vector is returned. They also use a Cstyle cmp
procedure which is incompatible neither with the lessthan comparison predicate used by SRFI 132 for sorting lists and vectors (such as might be used to make one ready for binary search), nor with SRFI 128 comparator ordering predicates. Finally, vectorbinarysearch
, as the name implies, can only be used on Scheme vectors. Use on other sequence types, such as the homogenous numeric vectors of SRFI 160, is not foreseen.
As binary search is notoriously tricky to implement correctly (especially in light of its apparent simplicity), a correct implementation which is generalizable to any sequence type (including one that a programmer might define themselves) is a useful building block for fast search applications.
Programmers should be aware that it rarely makes sense to use binary search on any sequence type which does not provide O(1) (or ‘effectively O(1)’) access to its members. Scheme lists are not suitable for binary searching, as listref
is an O(n) operation.
In the following procedure specifications, the following variable names are used as procedure arguments having a particular meaning:
bytevectoru8ref
as ref when a is a vector.)
#t
if the first is strictly less than the second according to the sort order of the sequence a, or otherwise #f
. It is the programmer’s responsibility to ensure that less? can handle comparisons between the given val and any value present in the sequence a; that the order of items in given sequence a actually corresponds to the behaviour of the given less?; and that it is irreflexive, antisymmetric, and transitive, yielding a total ordering of all values in its domain.
The following procedures must work when used, with suitable ref, lo, and hi arguments, on sequence types which define negative indexes. (This does not refer to the potential use of negative indexes to refer to items in a sequence counting from the last item rather than the first, but rather sequences where negative indexes refer to unique positions in the sequence.)
(bisectleft
a val ref less? lo hi)
⇒ idx
(bisectright
a val ref less? lo hi)
⇒ idx
Defining both *bisectleft
and *bisectright
procedures for a given type, and handling the optionality of lo and hi correctly, can make code very repetitive. To ease this repetition, the following convenience procedure is provided.
(bisection
ref)
⇒
1 (leftproc
a val less? lo hi)
(rightproc
a val less? lo hi)
(bisection
ref lohiproc)
⇒
1
(leftproc
a val less?)
(leftproc
a val less? lo hi)
(rightproc
a val less?)
(rightproc
a val less? lo hi)
bisectleft
and bisectright
respectively with the given ref procedure, comparable to vectorbisectleft
and vectorbisectright
for vectors. If the lohiproc argument is given, it should be a procedure which takes one argument, a sequence a, and returns two values, the default lo and hi values for that sequence (usually 0 and the length of the sequence). If no lohiproc procedure is given, the lo and hi arguments to the returned procedures are mandatory.
(vectorbisectleft
a val less?)
⇒ idx
(vectorbisectleft
a val less? lo hi)
⇒ idx
bisectleft
.
(vectorbisectright
a val less?)
⇒ idx
(vectorbisectright
a val less? lo hi)
⇒ idx
bisectright
.
All procedures provided by this SRFI are in the library called (srfi 223)
, mutatis mutandis.
Implementations and future RnRS specifications should, if they adopt this SRFI, specify analogous *bisectleft
and *bisectright
procedures for all sequence types they provide which offer O(1) or ‘effectively O(1)’ access time, and in which sorted data may reasonably be expected to appear; but they should do under another library name, and the library (srfi 223)
must contain only the procedures directly specified here. They may also choose to divide the core, generalized procedures of this library (bisectleft
, bisectright
, and bisection
) into one library (such as (xyz bisect)
) and make each application of those procedures to specific sequence types their own sublibrary (such as (xyz bisect vector)
, (xyz bisect flexvector)
, etc.)
Bisect procedures which operate on Scheme bytevectors can be defined concisely with the following code.
(definevalues (bytevectorbisectleft bytevectorbisectright)
(bisection bytevectoru8ref
(lambda (bv) (values 0 (bytevectorlength bv)))))
(vectorbisectleft #(1 2 2 3 5) 2 <)
 ⇒ 1

(vectorbisectright #(1 2 2 3 5) 2 <)
 ⇒ 3

(vectorbisectleft #(1 2 2 3 5) 4 <)
 ⇒ 4

(vectorbisectleft #(1 2 2 3 5) 4 < 0 1)
 ⇒ 1

Or with the example bytevector bisection defined above, one could change the procedures here to their corresponding bytevector versions, and the vectors also to bytevectors, and get the same results.
The following code shows how to define a procedure vectorbisectindex
which returns the smallest index of val in the vector a, or #f
if val is not actually in a, according to the given SRFI 128 comparator cmpr.
(define (vectorbisectindex a val cmpr)
(let ((idx (vectorbisectleft a val (comparatororderingpredicate cmpr))))
(if (and (< idx (vectorlength a))
(=? cmpr val (vectorref a idx)))
idx
#f)))
The sample implementation in the Git repository for this SRFI should be correct and work on all compliant R7RS small implementations, although programmers working on Schemes with no automatic promotion of the results of +
to bignums and very large sequences (typically more than a gibientry or so in size, depending on fixnum range) should be aware of the possibility of integer overflow. See Joshua Bloch, ‘Extra, Extra — Read All About It: Nearly All Binary Searches and Mergesorts are Broken’. Supporting Scheme systems without automatic bignum promotion is not a goal of the sample implementation.
The sample implementation is essentially a translation of the Python standard library implementation of binary search in bisect.py
into Scheme.
© 2021 Daphne PrestonKendal (text and sample implementation in Scheme).
© 1992–2019 Python Software Foundation (guideline algorithm implementation in Python).
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.