Re: the "Unicode Background" section
At Thu, 21 Jul 2005 15:45:34 -0700, Thomas Lord wrote:
> If CHARs are codepoints, more basic Unicode algorithms translate
> into Scheme cleanly.
I don't see what you mean. Can you provide an example?
> What is gained by forcing surrogates to be unrepresentable as CHAR?
Every string is representable in UTF-8, UTF-16, etc.
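As a rough illustration outside Scheme: Python's str type happens to allow lone surrogate code points, so it shows what is lost when they are permitted. The helper name is_encodable is hypothetical and exists only for this sketch.

```python
def is_encodable(s: str) -> bool:
    """Return True if s can be encoded in both UTF-8 and UTF-16."""
    try:
        s.encode("utf-8")
        s.encode("utf-16")
    except UnicodeEncodeError:
        # Raised when the string contains a lone surrogate code point.
        return False
    return True


scalar_only = "a\u00e9\U0001d11e"  # every element is a Unicode scalar value
lone_surrogate = "a\ud800"         # contains the surrogate code point U+D800

print(is_encodable(scalar_only))
print(is_encodable(lone_surrogate))
```

If characters are restricted to scalar values, a string like the second one cannot be constructed in the first place, so the encoding step never fails this way.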
> What kind of code will I wind up with if I want to iterate over
> a large range of CHAR values?
Two loops: one from 0 to #xD7FF, and one from #xE000 to #x10FFFF.
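A minimal sketch of those two loops, written in Python rather than Scheme so it can be run as-is. The generator name scalar_values is an assumption made for this example, not part of any proposal.

```python
def scalar_values():
    """Yield every Unicode scalar value, skipping the surrogate range."""
    yield from range(0x0000, 0xD800)    # 0 through #xD7FF
    yield from range(0xE000, 0x110000)  # #xE000 through #x10FFFF


# Count the values visited by the two loops combined.
count = sum(1 for _ in scalar_values())
print(count)  # 1112064
```

Wrapping both ranges in one iterator keeps the gap in a single place, so client code can still be written as one loop.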
> It's not as if by excluding surrogates we arrive at a CHAR definition
> that is significantly more "linguistic" than if we don't.
True, but we arrive at a definition that is more standards-friendly,
and that's part of the overall compromise.
FWIW: MzScheme originally supported a larger set of characters, mainly
because extra bits are available in my implementation. The resulting bad
experience convinced me to define characters in terms of scalar values,