
Re: the "Unicode Background" section

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.



At Thu, 21 Jul 2005 15:45:34 -0700, Thomas Lord wrote:
> If CHARs are codepoints, more basic Unicode algorithms translate
> into Scheme cleanly.   

I don't see what you mean. Can you provide an example?

> What is gained by forcing surrogates to be unrepresentable as CHAR?

Every string is representable in UTF-8, UTF-16, etc.
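
A minimal sketch of that point, in Python rather than Scheme, since Python
strings happen to admit lone surrogate code points. The helper names
`is_scalar_value` and `encodes_as` are hypothetical, introduced only for
illustration:

```python
def is_scalar_value(cp: int) -> bool:
    """True if cp is a Unicode scalar value (a code point that is not a surrogate)."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def encodes_as(s: str, encoding: str) -> bool:
    """True if s can be encoded with the given codec under strict error handling."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# A string built only from scalar values encodes in both forms.
ok = chr(0xD7FF) + chr(0xE000) + chr(0x10FFFF)
print(encodes_as(ok, "utf-8"), encodes_as(ok, "utf-16"))

# A string containing a lone surrogate has no strict UTF-8 or UTF-16 encoding.
bad = "a" + chr(0xD800)
print(encodes_as(bad, "utf-8"), encodes_as(bad, "utf-16"))
```

If characters are restricted to scalar values, the second case cannot be
constructed at all, so every string has an encoding.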

> What kind of code will I wind up with if I want to iterate over
> a large range of CHAR values? 

Two loops: one from 0 to #xD7FF, and one from #xE000 to #x10FFFF.
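
A rough sketch of the two-range iteration, again in Python rather than
Scheme. The generator name `scalar_values` is hypothetical. It chains the
two ranges so that callers can still write a single loop:

```python
from itertools import chain


def scalar_values():
    """Yield every Unicode scalar value in order, skipping the surrogate block."""
    # range() excludes its upper bound, so these cover 0..#xD7FF and #xE000..#x10FFFF.
    return chain(range(0x0000, 0xD800), range(0xE000, 0x110000))


count = 0
for cp in scalar_values():
    ch = chr(cp)  # always a valid, encodable character
    count += 1

print(count)  # 0x110000 code points minus 0x800 surrogates
```

Wrapping the pair of ranges in one helper keeps the split in a single
place, so the gap does not have to be repeated at every call site.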

> It's not as if by excluding surrogates we arrive at a CHAR definition
> that is significantly more "linguistic" than if we don't.

True, but we arrive at a definition that is more standards-friendly,
and that's part of the overall compromise.

FWIW: MzScheme originally supported a larger set of characters, mainly
because extra bits are available in my implementation. The resulting bad
experience convinced me to define characters in terms of scalar values,
instead.

Matthew