
Re: Response to SRFI 75.

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

> >> Sixth, is there any way for a scheme implementation to support
> >> characters and strings in additional encodings different from
> >> Unicode and not necessarily subsets of it, and remain compliant?
> >
> > I don't think so, at least not in the way you envision.  I don't think
> > that's necessary or even a good idea, either.  This SRFI effectively
> > hijacks the char and string datatypes and says that the abstractions
> > for accessing them deal in Unicode.  Any representation that allows
> > you to do that---i.e. implement STRING-REF, CHAR->INTEGER, and
> > INTEGER->CHAR and so on in a way compatible with the SRFI is fine,
> > but I believe you're thinking about representations where that's not
> > the case.
> Hmmm.  I'm still of the opinion that making the programming
> abstraction more closely match the end-user abstraction (i.e.,
> with glyph=character rather than codepoint=character) is just
> plain better, in many ways.  It gives me the screaming willies
> that under Unicode, strings which to the eye look identical,
> can have different lengths, no codepoint at any particular
> index in common, and sort relative to each other such that
> there are an infinite number of unrelated strings that go
> between them.  To me, it is the codepoint=character model that
> is introducing representation artifacts and the glyph=character
> model comes a lot closer to avoiding them.
> But we've been there, and I've talked about that, at length.
> People seem determined to do it this way, and people with
> other languages seem to be doing it mostly this way too. I'm
> convinced that requiring the "wrong" approach in a way that
> outlaws a better one is a wrong thing, but I'm realistic by
> now that nobody else is going to be convinced.
> Also, I'm not entirely happy about banning characters and
> character sets that aren't subsets of Unicode.  In the first
> place there are a lot of characters that aren't in Unicode
> and are likely never to be - ask a Chinese person to write
> his own address without using one and you'll begin to see
> the problem.  And in the second place, traditionally the
> characters have been used to describe a lot of non-character
> entities - and while some of these come through in control codes,
> others, including the very useful keystroke-description codes
> from, e.g., MIT Scheme, simply don't.
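The complaint above — visually identical strings with different lengths, no codepoint in common at any index, and unrelated strings sorting between them — is easy to reproduce with Unicode normalization forms. A small sketch (Python rather than Scheme, purely for illustration, since the point is about Unicode itself and not any particular Scheme API):

```python
import unicodedata

nfc = "\u00e9"      # "é" as one precomposed codepoint, U+00E9
nfd = "e\u0301"     # "é" as "e" plus combining acute accent, U+0301

# Identical to the eye, but under codepoint=character they differ:
print(len(nfc), len(nfd))   # 1 2 -- different lengths
print(nfc == nfd)           # False; no index holds the same codepoint
print(nfd < "f" < nfc)      # True: the unrelated "f" sorts between them

# Normalizing to a common form recovers glyph-level equality:
print(unicodedata.normalize("NFC", nfd) == nfc)   # True
```

Since there are infinitely many strings beginning with "e" followed by a codepoint below U+0301, infinitely many unrelated strings sort between the two spellings of the same glyph.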

You may be right and wrong at the same time. Right because Unicode is probably
not the last word on working with non-ASCII thingies. Wrong because, for now, Unicode
is the only serious effort in this direction that has made it as far as a de facto standard.

Apart from that, improving on the simple-minded model of "char = scalar value, string =
vector of chars, upcase etc. are char -> char" might hurt more than it helps.
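That said, even within Unicode the "upcase etc. are char -> char" part of the simple model breaks down, because some case mappings are one-to-many. A quick illustration (again in Python rather than Scheme, just to show the Unicode behavior):

```python
s = "stra\u00dfe"   # German "straße"; U+00DF has no single-codepoint uppercase

# Whole-string upcasing maps the one char to two ("ß" -> "SS"):
print(s.upper())                 # STRASSE
print(len(s), len(s.upper()))    # 6 7

# So a correct string upcase cannot be built by composing a
# char -> char upcase over the string's codepoints.
```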


Btw, I am happy that the R6RSers decided to do some SRFI rounds with the stuff;
some discussion in public is better than none at all.