Re: character strings versus byte strings

This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 contain all messages, not just those from before July 7th, 2015.

On Mon, 22 Dec 2003, Tom Lord wrote:

>    > Many many many computer systems could get away with
>    > ignoring the locale-dependency of case-mapping, but now they can
>    > no longer plead ignorance.  (Though the problems are hardly
>    > obscure; even German causes problems.)
> (I think that, being a culturally unbiased person, you mean that
> German causes one _unique_ problem regarding case mapping.)

This is absolutely the case.  From the perspective of grapheme-
characters, and ignoring ligatures as a pure typesetting issue,
eszett is the ONLY character in all of Unicode that upcases into
a different number of characters.  I'm using an ugly kluge: either
put off changing the length of any string until a canonicalization
operation, or return the upcase as a single non-standard character
(yet another character which doesn't exist in Unicode).  But I'm
sorely tempted to simply declare all use of eszett, given its
unique status in the history of human writing, to be an error.
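
To make the length problem and the kluge concrete, here is a minimal
sketch in Python.  It is an illustration under stated assumptions, not
the implementation described above: the names upcase_fixed_length and
canonicalize are hypothetical, and the Private Use Area code point
U+E000 is an arbitrary stand-in for the "single non-standard
character".

```python
# Hypothetical placeholder for "uppercase eszett": a Private Use Area
# code point.  The message does not say which value was actually used.
UPPER_ESZETT_PLACEHOLDER = "\uE000"

ESZETT = "\u00DF"  # LATIN SMALL LETTER SHARP S


def upcase_fixed_length(s: str) -> str:
    """Upcase one character at a time without changing the length."""
    out = []
    for ch in s:
        up = ch.upper()
        if len(up) == 1:
            out.append(up)                        # ordinary 1-to-1 mapping
        elif ch == ESZETT:
            out.append(UPPER_ESZETT_PLACEHOLDER)  # defer the expansion
        else:
            out.append(ch)                        # leave other expanders alone
    return "".join(out)


def canonicalize(s: str) -> str:
    """Expand the placeholder; only here may the length change."""
    return s.replace(UPPER_ESZETT_PLACEHOLDER, "SS")


word = "stra" + ESZETT + "e"
deferred = upcase_fixed_length(word)   # same length as word
final = canonicalize(deferred)         # "STRASSE", one character longer
```

For comparison, Python's built-in str.upper() expands eszett
immediately, so the result is one character longer than the input.
The sketch keeps indices stable until canonicalize is called, at the
cost of carrying a character that means nothing outside the program.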