[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

*To*: srfi-67@xxxxxxxxxxxxxxxxx*Subject*: Re: vector compare*From*: Sebastian Egner <sebastian.egner@xxxxxxxxxxx>*Date*: Thu, 9 Jun 2005 12:41:09 +0200*Delivered-to*: srfi-67@xxxxxxxxxxxxxxxxx*In-reply-to*: <87m0iD.A.KhH.de_pCB@rotkohl>

> when using this "natural order"? What prior art is there for

> using this order?

Built-in lexicographical orders on vectors have hurt me more than once

for ordering actual 'vectors', i.e. elements of vector spaces (abelian group

with a field acting on it) of different dimension. By using the lex order you

effectively fix an embedding of the lower-dimensional vector spaces into

the higher---and that is often the wrong thing to do.

Concerning prior art: Are you really asking me why anybody anywhere would

like to order sequences by length and then lexicographically, instead just

lexicographically, and where you can find published references to programs

that do that?

Actually, I do not intend to settle this question by digging up 'prior art' and

looking where the bigger pile is. Let us focus on which problem we would

like to solve, and how the solutions fit with the structures that are present in

Scheme. The advantage of ordering LIST and VECTOR the same is that

it is easier to remember. The advantage of ordering them differently is that

you get the choice---based on the basic operations.

> > both be interpreted as mere implementations of some abstract SEQUENCE

> > data type,

>

> I'm saying that some Scheme implementations (at least Kawa) and at

> least one SRFI (44) does this. I'm also pointing out that Common Lisp

> does this. I.e. this is not a far-fetched concept.

SRFI 44 defines a lot and implements nothing. I think SRFI 44 represents a good deal of

good work. Yet it is still to be seen that this is really the way data structures should be

organized in Scheme. Comparing containers, for example, is a problem: SRFI 44 defines

nothing for that but its design excludes several possible approaches already.

Concerning Kawa and Common Lisp, the problem is the same: The object system brings

structure to the world of data types---but it also imposes a certain point of view.

This point of is certainly plausible, but it is not the only one, and its benefits are overrated.

My point of view is that the user should be enabled to construct exactly the data structures

required for an application from prefab building blocks---instead of finding a fully organized

world already. Initially the organized world is nice, but sooner or later you hit brick walls,

and the more complex it gets the more that hurts.

Now, if you have sequence data that is naturally ordered lexicographically than you can

use LISTs or VECTORs ordered as lists to represent the data. If you have sequences

that are naturally ordered by length and then lexicographically you can use VECTORs

or LISTs ordered as vectors. And in both cases it is likely that the concrete default

representation has something to do with the application already. (Which is the result of

the 'natural ordering based on the available primitive operations' thing.)

> It is true that if an ordering is defined for sequence,

> then it would be bad to define inconsistent orderings for different

> sub-types of sequence. I'm making a weaker claim: if you define an

> operation F on type A and on type B that are both sub-types of a common

> super-type S then good object-oriented design suggests you should

> define the operation F to have common properties (axioms) for both

> A and B.

And this what this SRFI is doing: Vector, List, Number, String, Symbol, Char, etc. are all

subtypes of the overall Scheme object type---and they all provide a compare procedure

that satisfies certain axioms. Furthermore the orderings of number types are required

to be compatible with the tower of numeric data types in R5RS, even though the tower

does not really define a subtype hierarchy in the strong sense, because that makes

sense mathematically. Finally, default-compare refines pair-compare. And that is as far

as the subtype thing goes in this SRFI. In particular, LIST and VECTOR are *not* seen

as two concrete realizations of a common container supertype.

Now your point of view is that Vector and List should be interpreted as subtypes

of a Sequence type---because SRFI 44, Kawa, and Common Lisp do that way

and it is "not a far-fetched concept." (Quite natural even, I would say.)

But I do not find this concept back in the definition of Scheme as it is now,

and there are alternative approaches that give you more choice.

Don't get me wrong, I am in fact a fan of the Sequence approach. I even experimented

with my own Sequence data type as an add-on for Scheme, but not as an abstract

type (or interface) to unify LIST and VECTOR. The sequences I experimented with

were self-optimizing (amortized) data structures that eventually adapt to their use

case (in particular LIFO, FIFO, or random-access) and are reasonably efficient

(i.e. O(n log n) with small constants) over an extremely large scale of size, including

external memory algorithms. In the end, the implementation went out of hand for

a hobby project and I had to stop. Now these sequences do not fit SRFI 44, nor

Kawa, nor Common LISP and they coexist with LIST and VECTOR.

> follow that on operation F must be defined in terms of sequences.

> However, it's usually a good idea.

>

> Moreover, some schemes provide "uniform vectors", and there are SRFIs

> that define them. What is the natural order for a uniform vector?

> By your argument, and I think "common sense" (admittedly unreliable)

> it should be the same order as for normal vectors. Certainly if you

> believe uniform vectors "inherit" from an "abstract vector" type.

>

> Now consider strings. Is a string a vector? If you have uniform

> vectors, then it seems very strange to not view a string as a uniform

> vector. It follows that the ordering for strings should be the same

> as for uniform vectors.

>

> Strings use lexicographic order.

> Hence uniform vectors should use lexicographic order.

> Hence vectors should use lexicographic order.

My whole point is that the order to be defined for a data type does not follow

from the data type or its supertypes but is part of what the data type means.

Now if the uniform vectors represent elements of vector spaces than you might

want a different ordering than if they represent strings.

Btw., strings are one the most exceptional things around. The lex. ordering is

historic and not at all 'God-given'. Japanese dictionaries contain several

alternative orderings: By stroke number, by radical (prototypes of parts of the

actual characters, most are also characters in themselves), or by pronounciation.

"The natural order" of strings is a matter of taste. In Germany the treatment of

Umlauts has changed during the past 30 years: Today ä is inserted where

'ae' would be. But I have lots of dictionaries at home that insert ä where

'a' would be. Using Unicode scalar values ä has a high scalar value

so it would be inserted after 'z'. Ordering strings by length>lex can also make

sense in some applications.

So I have difficulties following you argumentation above.

My point of view is:

Yes, strings use lex. order because that makes sense as a general default.

No, that does not imply that uniform vectors should also use lex. order.

They can better provide one or more orders that make sense for

their intended applications. And no, that still does not imply that the built-in

VECTOR type should be ordered in any particular way for the sake of

uniformity.

Historically, "uniform vectors" do not allow you to specify the ordering you

want to use---but they should. Lukily, in the majority of the cases it is sufficient

to define any ordering at all. If u8vector or anything provides a default order,

I would make it length>lex because that is the most natural and efficient for

this type, unless you tell me that you always use them in hashing in which

case I would make the default order by hashcode, then length>lex. ;-)

Sebastian.

- Prev by Date:
**Re: Optional arguments at the beginning** - Next by Date:
**Re: vector compare** - Previous by thread:
**Re: vector compare** - Next by thread:
**Re: vector compare** - Index(es):