
Re: the "Unicode Background" section

At Thu, 21 Jul 2005 15:45:34 -0700, Thomas Lord wrote:
> If CHARs are codepoints, more basic Unicode algorithms translate
> into Scheme cleanly.   

I don't see what you mean. Can you provide an example?

> What is gained by forcing surrogates to be unrepresentable as CHAR?

Every string is representable in UTF-8, UTF-16, etc.
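
As a rough illustration of that point, sketched in Python rather than
Scheme because Python's str type happens to admit lone surrogates: a
string made only of scalar values encodes in both UTF-8 and UTF-16,
while a string holding a lone surrogate encodes in neither. The helper
name `encodable` and the sample strings are made up for this example.

```python
def encodable(s: str, encoding: str) -> bool:
    """Return True if s can be encoded in the given encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# A string of scalar values only: ASCII, Latin-1 range, and a
# supplementary-plane character (U+1D11E).
scalar_only = "a\u00e9\U0001d11e"

# A string containing a lone high surrogate, which is a code point
# but not a scalar value.
lone_surrogate = "\ud800"

scalar_results = [encodable(scalar_only, enc) for enc in ("utf-8", "utf-16")]
surrogate_results = [encodable(lone_surrogate, enc) for enc in ("utf-8", "utf-16")]
```

Here `scalar_results` holds only true values and `surrogate_results`
only false ones, which is the guarantee you get for free when a CHAR
can never be a surrogate.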

> What kind of code will I wind up with if I want to iterate over
> a large range of CHAR values? 

Two loops: one from 0 to #xD7FF, and one from #xE000 to #x10FFFF.
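
A minimal sketch of the two-range iteration, again in Python purely for
illustration; the name `scalar_values` is hypothetical and not part of
any proposal:

```python
from itertools import chain


def scalar_values():
    """Yield every Unicode scalar value, skipping the surrogate gap."""
    # range() excludes its upper bound, so these cover
    # 0..#xD7FF and #xE000..#x10FFFF.
    return chain(range(0x0000, 0xD800), range(0xE000, 0x110000))


# Count the scalar values by walking both ranges.
count = sum(1 for _ in scalar_values())
```

The two ranges can be chained behind a single iterator, so code that
wants "all CHARs" still reads as one loop at the call site.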

> It's not as if by excluding surrogates we arrive at a CHAR definition
> that is significantly more "linguistic" than if we don't.

True, but we arrive at a definition that is more standards-friendly,
and that's part of the overall compromise.

FWIW: MzScheme originally supported a larger set of characters, mainly
because extra bits are available in my implementation. The resulting bad
experience convinced me to define characters in terms of scalar values,