[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

default ordering of vectors

Dear list,

The issue of how to define the default ordering for vectors has not
yet been settled. I would like to pick it up again.

--- Issue ---

The question is what the default ordering of vectors is supposed to be
for this SRFI. There are two candidates: 'lexicographic ordering' (LEX)
and 'first by length then lexicographically' (LENGTH-LEX).

The choice affects VECTOR-COMPARE, DEFAULT-COMPARE, and the
way orders are interpreted on an abstract level. Moreover, there are
implications for future SRFIs (e.g. the drafts 66 and 74) that define vector-
like data structures and could provide a default ordering consistent
with this SRFI. A similar choice (LEX or SIZE-LEX) is to be made for
the default ordering of comparison-based set- and dictionary-like data

--- Summary of the discussion so far ---

The issue was brought up by Per Bothner who wrote in

> The default compare for vectors is unusual, and more critically it is
> incompatible with the default compare for lists and strings.
> The latter are both lexicographic compare.  I couldn't find a
> rationale for this difference.
> Note that this difference makes it difficult to have a unified
> "sequence hierarchy" as in Common Lisp.  The problem is that
> ordering is defined in terms of *implementation* types rather
> than semantics.
> I suggest using lexicographic compare for all "sequence" types.

Mike Sperber added in
that he found the distinction between list-compare and
vector-compare confusing.

The discussion continued around whether lists, strings, and vectors
should be interpreted as subtypes of an abstract sequence type or
as essentially independent types. The authors of this SRFI adopted
the latter point of view with the idea that the basic operations of vector
are constant-time SIZE and REF suggesting the LENGTH-LEX order
and LEX.

Moreover, uniform vectors (in the sense of several SRFI) and strings
are cited to derive that vectors should be ordered LEX. Per Bothner
wrote in http://srfi.schemers.org/srfi-67/mail-archive/msg00038.html

> Now consider strings.  Is a string a vector?  If you have uniform
> vectors, then it seems very strange to not view a string as a uniform
> vector.  It follows that the ordering for strings should be the same
> as for uniform vectors.
> Strings use lexicographic order.
> Hence uniform vectors should use lexicographic order.
> Hence vectors should use lexicographic order.

Jens Axel Søgaard replies with the orderings of polynomials,
ordered first by degree and then lexicographically, and points
to the potentially higher efficiency of LENGTH-LEX over LEX
for types with constant-time SIZE operation. In
he writes

> Suppose now that we have the following list:
>  (list #10000(0 ... 0 1)
>        #10001(0 ... 0 2)
>        #10002(0 ... 0 3)
>        #10003(0 ... 0 4))
> In this scenerio the order used in the srfi is better
> efficiencywise than the lexicographical order -- and not
> only by a constant factor.
> It is also possible to find a list where the srfi-ordering
> is slower than the lexicograpic, but the slow down
> will be bounded by a small constant factor. E.g.
>  (list #10000(0 ... 0 1)
>         #10000(0 ... 0 2)
>         #10000(0 ... 0 3)
>         #10000(0 ... 0 4))

Donovan Kolbly expressed his concerns about the cognitive load
of remembering that vectors and strings are ordered differently in
case vectors are not ordered LEX because he sometimes
converts strings into vectors of chars and back.

Mike Sperber disputes the arguments that the LENGTH-LEX ordering
of strings is more efficient and "natural" by its relation to the basic
operations on vectors (i.e. constant time SIZE and REF).

--- analysis ---

Looking at the discussion again, I see only three major arguments:

(1) Conceptually vectors and lists are just sequences, and these are
conventionally ordered LEX by default.

(2) LENGTH-LEX is more natural (and efficient) for sequences that
support a constant-time SIZE operation.

(3) Conceptually strings are "vectors of chars" and strings are conventionally
ordered LEX by default, so vectors should be ordered LEX as well in order
to reduce confusion.

From my point of view, (2) is the most fundamental, in a mathematical
sense. By providing different orders for vectors and lists it is also
possible to have the two most important liftings of total orders to
sequences readily available.

Argument (1) is important as well, primarily because apparently many
people are not comfortable with the existence of two default orderings
for sequences. This may indeed lead to surprises in case the ordering
matters and the user does not care what is specified in the SRFI. I do
not share the expectations, however, that this will be a serious problem.

For me, argument (3) is weak. The conventional ordering of strings is
mostly historic because it lends itself well to cellulose-based
implementations of dictionary data structures for sequences over
small-alphabets (e.g. western languages). This is by no means
fundamental in the mathematical sense. Japanese dictionaries,
for example, often order characters first by the number of strokes
and then by other means; for strings variant of LENGTH-LEX can
be found as well. So to summarize, I am fully convinced that strings
should be ordered LEX by default, but that does not imply vectors and
any other sequence container type should also be ordered LEX.

Given my interpretation of the situation, I would settle for the following:
* Lists are ordered LEX by default.
* Vectors are ordered LENGTH-LEX by default.
* Strings are ordered LEX with case-sensitive character comparison by default.
* The default ordering for lists and vectors donate their name also as
prototype orderings (order "-as-list" or "-as-vector") for other container types
that can be interpreted as sequences.
* Other vector-like data types should define all orderings that are most
natural for them. These need not necessarily be consistent with vectors,
lists, or strings. In case several default orderings are natural, these can
be named differently by appending "-as-<<ordering>>". In the cases of
SRFI 66 and 74 where the ordering is of minor importance, I would
recommend to use LENGTH-LEX to be consistent with this SRFI.

This happens to be the current state of the SRFI specification.