Re: Response to SRFI 75.
> >> Sixth, is there any way for a scheme implementation to support
> >> characters and strings in additional encodings different from
> >> unicode and not necessarily subsets of it, and remain compliant?
> > I don't think so, at least not in the way you envision. I don't think
> > that's necessary or even a good idea, either. This SRFI
> > hijacks the char and string datatypes and says that the abstractions
> > for accessing them deal in Unicode. Any representation that allows
> > you to do that---i.e. implement STRING-REF, CHAR->INTEGER,
> > INTEGER->CHAR and so on in a way compatible with the SRFI---is fine,
> > but I believe you're thinking about representations where that's not
> > the case.
> Hmmm. I'm still of the opinion that making the programming
> abstraction more closely match the end-user abstraction (ie,
> with glyph=character rather than codepoint=character) is just
> plain better, in many ways. It gives me the screaming willies
> that under Unicode, strings which to the eye look identical,
> can have different lengths, no codepoint at any particular
> index in common, and sort relative to each other such that
> there are an infinite number of unrelated strings that go
> between them. To me, it is the codepoint=character model that
> is introducing representation artifacts and the glyph=character
> model comes a lot closer to avoiding them.
> But we've been there, and I've talked about that, at length.
> People seem determined to do it this way, and people with
> other languages seem to be doing it mostly this way too. I'm
> convinced that requiring the "wrong" approach in a way that
> outlaws a better one is a wrong thing, but I'm realistic by
> now that nobody else is going to be convinced.
> Also, I'm not entirely happy about banning characters and
> character sets that aren't subsets of unicode. In the first
> place there are a lot of characters that aren't in Unicode
> and are likely never to be - ask a Chinese person to write
> his own address without using one and you'll begin to see
> the problem. And in the second place, traditionally the
> characters have been used to describe a lot of non-character
> entities - and while some of these come through in control codes,
> others, including the very useful keystroke-description codes
> from, e.g., MIT Scheme, simply don't.
You may be right and wrong at the same time. Right because Unicode is
probably not the last word on working with non-ASCII thingies. Wrong
because for now Unicode is the only serious effort in this direction
that made it as far as a de facto standard. Apart from that, trying to
improve on the simple-minded model ("char = scalar value, string =
vector of char, upcase etc. are char -> char") might hurt more than it
helps.
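To make the trade-off concrete, here is a minimal sketch in Python rather
than Scheme. It is not anything SRFI 75 specifies; Python's str type simply
happens to follow the same scalar-value model. The helper names
(scalar_length, same_after_nfc) are made up for illustration. It shows both
the artifact complained about above (two strings that look identical but
differ in length) and a case where upcase cannot be char -> char.

```python
import unicodedata

# Two spellings of the same visible text: precomposed vs. decomposed.
COMPOSED = "\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
DECOMPOSED = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT


def scalar_length(s: str) -> int:
    """Length under the char = scalar value model (hypothetical helper)."""
    return len(s)


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Identical to the eye, but different lengths and unequal as raw sequences.
lengths = (scalar_length(COMPOSED), scalar_length(DECOMPOSED))  # (1, 2)
equal_raw = COMPOSED == DECOMPOSED                              # False
equal_nfc = same_after_nfc(COMPOSED, DECOMPOSED)                # True

# Full case mapping is string -> string: one char can become two.
SHARP_S = "\u00df"               # U+00DF LATIN SMALL LETTER SHARP S
upcased = SHARP_S.upper()        # "SS"
```

Normalizing before comparing removes the first artifact without leaving
the scalar-value model. The second one is the price of keeping upcase
char -> char: a mapping like the one for U+00DF has to be left out or
approximated.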
Btw, I am happy that the R6RSers decided to do some SRFI rounds with
the stuff; some discussion in public is better than none at all.