[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

*To*: srfi-67@xxxxxxxxxxxxxxxxx*Subject*: default ordering of vectors*From*: Sebastian Egner <sebastian.egner@xxxxxxxxxxx>*Date*: Fri, 22 Jul 2005 17:32:30 +0200*Delivered-to*: srfi-67@xxxxxxxxxxxxxxxxx

Dear list,

The issue of how to define the default ordering for vectors has not

yet been settled. I would like to pick it up again.

--- Issue ---

The question is what the default ordering of vectors is supposed to be

for this SRFI. There are two candidates: 'lexicographic ordering' (LEX)

and 'first by length then lexicographically' (LENGTH-LEX).

The choice affects VECTOR-COMPARE, DEFAULT-COMPARE, and the

way orders are interpreted on an abstract level. Moreover, there are

implications for future SRFIs (e.g. the drafts 66 and 74) that define vector-

like data structures and could provide a default ordering consistent

with this SRFI. A similar choice (LEX or SIZE-LEX) is to be made for

the default ordering of comparison-based set- and dictionary-like data

structures.

--- Summary of the discussion so far ---

The issue was brought up by Per Bothner who wrote in

http://srfi.schemers.org/srfi-67/mail-archive/msg00031.html

> The default compare for vectors is unusual, and more critically it is

> incompatible with the default compare for lists and strings.

> The latter are both lexicographic compare. I couldn't find a

> rationale for this difference.

>

> Note that this difference makes it difficult to have a unified

> "sequence hierarchy" as in Common Lisp. The problem is that

> ordering is defined in terms of *implementation* types rather

> than semantics.

>

> I suggest using lexicographic compare for all "sequence" types.

Mike Sperber added in

http://srfi.schemers.org/srfi-67/mail-archive/msg00031.html

that he found the distinction between list-compare and

vector-compare confusing.

The discussion continued around whether lists, strings, and vectors

should be interpreted as subtypes of an abstract sequence type or

as essentially independent types. The authors of this SRFI adopted

the latter point of view with the idea that the basic operations of vector

are constant-time SIZE and REF suggesting the LENGTH-LEX order

and LEX.

Moreover, uniform vectors (in the sense of several SRFI) and strings

are cited to derive that vectors should be ordered LEX. Per Bothner

wrote in http://srfi.schemers.org/srfi-67/mail-archive/msg00038.html

> Now consider strings. Is a string a vector? If you have uniform

> vectors, then it seems very strange to not view a string as a uniform

> vector. It follows that the ordering for strings should be the same

> as for uniform vectors.

>

> Strings use lexicographic order.

> Hence uniform vectors should use lexicographic order.

> Hence vectors should use lexicographic order.

Jens Axel Søgaard replies with the orderings of polynomials,

ordered first by degree and then lexicographically, and points

to the potentially higher efficiency of LENGTH-LEX over LEX

for types with constant-time SIZE operation. In

http://srfi.schemers.org/srfi-67/mail-archive/msg00041.html

he writes

> Suppose now that we have the following list:

>

> (list #10000(0 ... 0 1)

> #10001(0 ... 0 2)

> #10002(0 ... 0 3)

> #10003(0 ... 0 4))

>

> In this scenerio the order used in the srfi is better

> efficiencywise than the lexicographical order -- and not

> only by a constant factor.

>

> It is also possible to find a list where the srfi-ordering

> is slower than the lexicograpic, but the slow down

> will be bounded by a small constant factor. E.g.

>

> (list #10000(0 ... 0 1)

> #10000(0 ... 0 2)

> #10000(0 ... 0 3)

> #10000(0 ... 0 4))

Donovan Kolbly expressed his concerns about the cognitive load

of remembering that vectors and strings are ordered differently in

case vectors are not ordered LEX because he sometimes

converts strings into vectors of chars and back.

Mike Sperber disputes the arguments that the LENGTH-LEX ordering

of strings is more efficient and "natural" by its relation to the basic

operations on vectors (i.e. constant time SIZE and REF).

--- analysis ---

Looking at the discussion again, I see only three major arguments:

(1) Conceptually vectors and lists are just sequences, and these are

conventionally ordered LEX by default.

(2) LENGTH-LEX is more natural (and efficient) for sequences that

support a constant-time SIZE operation.

(3) Conceptually strings are "vectors of chars" and strings are conventionally

ordered LEX by default, so vectors should be ordered LEX as well in order

to reduce confusion.

From my point of view, (2) is the most fundamental, in a mathematical

sense. By providing different orders for vectors and lists it is also

possible to have the two most important liftings of total orders to

sequences readily available.

Argument (1) is important as well, primarily because apparently many

people are not comfortable with the existence of two default orderings

for sequences. This may indeed lead to surprises in case the ordering

matters and the user does not care what is specified in the SRFI. I do

not share the expectations, however, that this will be a serious problem.

For me, argument (3) is weak. The conventional ordering of strings is

mostly historic because it lends itself well to cellulose-based

implementations of dictionary data structures for sequences over

small-alphabets (e.g. western languages). This is by no means

fundamental in the mathematical sense. Japanese dictionaries,

for example, often order characters first by the number of strokes

and then by other means; for strings variant of LENGTH-LEX can

be found as well. So to summarize, I am fully convinced that strings

should be ordered LEX by default, but that does not imply vectors and

any other sequence container type should also be ordered LEX.

Given my interpretation of the situation, I would settle for the following:

* Lists are ordered LEX by default.

* Vectors are ordered LENGTH-LEX by default.

* Strings are ordered LEX with case-sensitive character comparison by default.

* The default ordering for lists and vectors donate their name also as

prototype orderings (order "-as-list" or "-as-vector") for other container types

that can be interpreted as sequences.

* Other vector-like data types should define all orderings that are most

natural for them. These need not necessarily be consistent with vectors,

lists, or strings. In case several default orderings are natural, these can

be named differently by appending "-as-<<ordering>>". In the cases of

SRFI 66 and 74 where the ordering is of minor importance, I would

recommend to use LENGTH-LEX to be consistent with this SRFI.

This happens to be the current state of the SRFI specification.

Sebastian.

**Follow-Ups**:**Re: default ordering of vectors***From:*Per Bothner

**Re: default ordering of vectors***From:*Panu A Kalliokoski

- Prev by Date:
**Re: On optional arguments** - Next by Date:
**Re: default ordering of vectors** - Previous by thread:
**Re: On optional arguments** - Next by thread:
**Re: default ordering of vectors** - Index(es):