
Re: Response to SRFI 75.

> >> Sixth, is there any way for a Scheme implementation to support
> >> characters and strings in additional encodings different from
> >> Unicode and not necessarily subsets of it, and remain compliant?
> >
> > I don't think so, at least not in the way you envision.  I don't think
> > that's necessary or even a good idea, either.  This SRFI effectively
> > hijacks the char and string datatypes and says that the abstractions
> > for accessing them deal in Unicode.  Any representation that allows
> > you to do that---i.e. implement STRING-REF, CHAR->INTEGER, and
> > INTEGER->CHAR and so on in a way compatible with the SRFI is fine,
> > but I believe you're thinking about representations where that's not
> > the case.
>
> Hmmm.  I'm still of the opinion that making the programming
> abstraction more closely match the end-user abstraction (i.e.,
> with glyph=character rather than codepoint=character) is just
> plain better, in many ways.  It gives me the screaming willies
> that under Unicode, strings which to the eye look identical,
> can have different lengths, no codepoint at any particular
> index in common, and sort relative to each other such that
> there are an infinite number of unrelated strings that go
> between them.  To me, it is the codepoint=character model that
> is introducing representation artifacts and the glyph=character
> model comes a lot closer to avoiding them.
>
> But we've been there, and I've talked about that, at length.
> People seem determined to do it this way, and people with
> other languages seem to be doing it mostly this way too. I'm
> convinced that requiring the "wrong" approach in a way that
> outlaws a better one is a wrong thing, but I'm realistic by
> now that nobody else is going to be convinced.
>
> Also, I'm not entirely happy about banning characters and
> character sets that aren't subsets of Unicode.  In the first
> place there are a lot of characters that aren't in Unicode
> and are likely never to be - ask a Chinese person to write
> his own address without using one and you'll begin to see
> the problem.  And in the second place, traditionally the
> characters have been used to describe a lot of non-character
> entities - and while some of these come through in control codes,
> others, including the very useful keystroke-description codes
> from, e.g., MIT Scheme, simply don't.
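
For readers who have not run into the artifact described in the quoted text, here is a
small sketch of it. It is written in Python rather than Scheme purely for convenience and
is not part of SRFI 75; the helper names codepoints and same_after_nfc are made up for
this illustration. It takes two spellings of "é" that render identically and compares
them under the codepoint=character model.

```python
import unicodedata

# Two spellings of "é" that look the same on screen.
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT


def codepoints(s):
    """Return the code points of s as a list of integers (illustrative helper)."""
    return [ord(c) for c in s]


def same_after_nfc(a, b):
    """Compare two strings after NFC normalization (illustrative helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(len(precomposed), codepoints(precomposed))
    print(len(decomposed), codepoints(decomposed))
    # Python compares strings code point by code point, so "f" (and every
    # string starting with "f") sorts between the two spellings.
    print(sorted([precomposed, "f", decomposed]) == [decomposed, "f", precomposed])
    print(same_after_nfc(precomposed, decomposed))
```

The two strings differ in length and share no code point at index 0, yet compare equal
once both are normalized. Whether normalization belongs in the string abstraction itself
is exactly the point under dispute, so this shows the symptom rather than settling it.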

You may be right and wrong at the same time. Right, because Unicode is probably
not the last word on working with non-ASCII thingies. Wrong, because for now Unicode
is the only serious effort in this direction that has made it as far as a de facto standard.

Apart from that, improving on the simple-minded model "char = scalar value, string =
vector of char, upcase etc. are char -> char" might hurt more than it helps.
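
To make concrete what the simple-minded model gives up, here is a rough sketch, again in
Python rather than Scheme. The names full_upcase and simple_upcase are hypothetical, and
simple_upcase only approximates a char -> char upcase by leaving a character alone
whenever its full uppercase form is not a single character. It is not the mapping any
particular Scheme implementation or the SRFI prescribes.

```python
def full_upcase(s):
    """String-level upcase; the result may be longer than the input."""
    return s.upper()


def simple_upcase(s):
    """Approximation of a char -> char upcase (hypothetical helper).

    Each character is mapped on its own; if its uppercase form is not exactly
    one character, the original character is kept unchanged.
    """
    out = []
    for c in s:
        u = c.upper()
        out.append(u if len(u) == 1 else c)
    return "".join(out)


if __name__ == "__main__":
    word = "stra\u00dfe"          # "straße"
    print(full_upcase(word))      # string -> string mapping
    print(simple_upcase(word))    # char -> char mapping keeps the length
```

The char -> char version always preserves string length, which keeps the model simple;
the string-level version gives the linguistically expected "STRASSE" at the cost of
changing the length. That trade-off is the kind of thing I mean.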


Btw, I am happy that the R6RSers decided to do some SRFI rounds with the stuff;
some discussion in public is better than none at all.