
Re: Why are byte ports "ports" as such?

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 are here. Eventually, the entire history will be moved there, including any new messages.

John Cowan <cowan@xxxxxxxx> writes:

> bear scripsit:
>> I feel that Unicode default grapheme clusters more closely
>> map to what users call "characters" than code points do.  In
>> the interests of keeping the abstractions used by the programmer
>> as close as possible to the abstractions used by ordinary users,
>> I therefore support defining Scheme characters as DGCs.
> While I disagree with this position, it is entirely coherent and
> consistent, and I wouldn't weep if R6RS went with it.
>
> The main argument against it is that the "user" view of
> characters can cross DGC boundaries. From the code point level,
> other levels can be built up, including the DGC, syllable, word,
> sentence, and paragraph.
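The layering Cowan describes (code points at the bottom, grapheme clusters built on top) can be sketched in Python. This is a naive approximation of the UAX #29 default grapheme cluster rules, handling only combining marks; `simple_clusters` is a hypothetical helper, not part of any standard library.

```python
import unicodedata

def simple_clusters(s):
    """Group code points into approximate default grapheme clusters.

    Naive rule: a combining mark attaches to the preceding base
    code point.  The real UAX #29 rules cover many more cases
    (Hangul jamo, emoji ZWJ sequences, regional indicators, ...).
    """
    clusters = []
    for cp in s:
        if clusters and unicodedata.combining(cp):
            clusters[-1] += cp  # extend the previous cluster
        else:
            clusters.append(cp)  # start a new cluster
    return clusters

# "élève" written with combining accents: 7 code points,
# but 5 user-perceived characters.
s = "e\u0301le\u0300ve"
print(len(s))                    # counts code points: 7
print(len(simple_clusters(s)))   # counts clusters: 5
```

The gap between the two counts is exactly the ambiguity in the word "character" that the rest of this thread is about.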

I wouldn't mind if R6RS went with code points, either.

In either case, though, what I would really like to see is an end
to confusing terms such as "character". R6RS could very well use
code points as the smallest units of strings, but then it should
call them code points, not characters.

The only reason we're still talking about "characters" is
backwards compatibility. I can talk about Latin-1 "code points"
without confusion, but talking about Unicode "characters"
(meaning code points) is broken.

        -- Jorgen

((email . "forcer@xxxxxxxxx") (www . "http://www.forcix.cx/")
 (gpg   . "1024D/028AF63C")   (irc . "nick forcer on IRCnet"))