
Re: constant-time access to variable-width encodings




On Wed, 13 Jul 2005, Per Bothner wrote:

>bear wrote:

>> Aaaand, this is yet another problem that goes away if you embrace
>> glyph=character instead of codepoint=character.

> huh?  A glyph depends on a specific font.  No way can we define Scheme
> characters in terms of glyphs.
>
> Do you mean a (canonicalized) composite (combining) sequence?

Yes, that's what I mean.
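To make "composite sequence" concrete, here is a minimal Python sketch that splits a string into NFC-normalized base-plus-combining-mark sequences. Python and the name combining_sequences are illustrative choices, not anything proposed in this thread. It assumes a mark is any code point with a nonzero canonical combining class, which is a simplification of full Unicode grapheme segmentation (UAX #29).

```python
import unicodedata

def combining_sequences(s: str) -> list[str]:
    """Split s into base-plus-combining-mark sequences after NFC normalization.

    Simplification (assumption): a code point with a nonzero canonical
    combining class attaches to the preceding sequence. This is not full
    UAX #29 grapheme cluster segmentation.
    """
    s = unicodedata.normalize("NFC", s)
    seqs: list[str] = []
    for ch in s:
        if seqs and unicodedata.combining(ch) != 0:
            seqs[-1] += ch   # combining mark: extend the current "character"
        else:
            seqs.append(ch)  # base character: start a new one
    return seqs
```

For example, "e" plus U+0301 composes under NFC to the single code point U+00E9, while "q" plus U+0301 has no precomposed form and stays a two-code-point "character".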

> One
> problem is you can't practically map one of those to a fixed-length
> integer value, so we have to give up char->integer and integer->char.

Point taken. Alternatively, we could allow them to accept and return
bignums, or limit their ranges. I think bignums in these routines are
a much smaller sacrifice of consistency than the others being
discussed here. Implementations that do not support bignums could
presumably report a violation of an implementation restriction.

> Also, if equal characters are to be eq? they would have to be interned,
> like strings.  Both of these changes are possible, but a rather radical
> (and unneeded) departure from current practice.

But strings aren't interned; they're just boxed.  You mean symbols,
don't you?  (Symbols have to guarantee eq?-ness; strings can be equal?
without being eq?.)  OTOH, multi-codepoint characters would have to be
boxed, and if you wanted to guarantee eq?-ness you'd have to maintain a
global character table, similar to the global symbol table, which I
think is what you mean by "interned."  That in turn implies that if you
ever wanted to garbage-collect characters, your garbage collector would
have to support weak (soft) pointers, and the global character table
would have to refer to the boxed characters through them, just as with
symbols.

I think it would be reasonable to stop guaranteeing eq? on
multi-codepoint characters; use eqv? (or, better yet, char=?) instead.
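Without a global table, the alternative looks like this hypothetical Python sketch: each BoxedChar is a fresh object, so identity (the eq? analogue, Python's `is`) is not guaranteed, while comparing NFC-normalized contents plays the role of char=?. BoxedChar and char_eq are illustrative names only.

```python
import unicodedata

class BoxedChar:
    """Boxed multi-code-point character with no global table (hypothetical)."""
    __slots__ = ("seq",)

    def __init__(self, seq: str) -> None:
        # Canonicalize on construction so equal characters have equal contents.
        self.seq = unicodedata.normalize("NFC", seq)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, BoxedChar):
            return NotImplemented
        return self.seq == other.seq

    def __hash__(self) -> int:
        # Consistent with __eq__, so boxes can still serve as dict keys.
        return hash(self.seq)

def char_eq(a: BoxedChar, b: BoxedChar) -> bool:
    """Analogue of char=?: compare canonicalized contents, not identity."""
    return a == b
```

The cost moves from the allocator and garbage collector to each comparison, which now inspects the code point sequence instead of a single pointer.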

				Bear