Re: character strings versus byte strings

This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 contain all messages, not just those from before July 7th, 2015.



Tom Lord <lord@xxxxxxx> writes:

>     > Many many many computer systems could get away with
>     > ignoring the locale-dependency of case-mapping, but now they can
>     > no longer plead ignorance.  (Though the problems are hardly
>     > obscure; even German causes problems.)
> 
> (I think that, being a culturally unbiased person, you mean that
> German causes one _unique_ problem regarding case mapping.)

The problem in German that I'm thinking of is the eszett problem, where
there is a lowercase letter (ß) whose uppercase is a two-letter combo (SS).
(And downcasing SS requires morphological understanding of the word
as well, because not all SS pairs should be downcased as an eszett,
IIUC.)

That's a way in which German causes problems for easy case mapping.
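
A minimal sketch of the eszett problem, assuming Python 3's built-in,
locale-independent str.upper() and str.lower() (the sample words are
chosen for illustration and are not from the original message):

```python
# Upcasing the eszett changes the string's length: one letter becomes two.
word = "straße"
upper = word.upper()          # "STRASSE"
back = upper.lower()          # "strasse", not "straße"

# The round trip is lossy: a simple downcase cannot know which SS was an eszett.
round_trip_ok = (back == word)

# Two distinct words collapse to the same uppercase form, so downcasing
# correctly would require knowing which word was meant.
same_upper = ("Maße".upper() == "Masse".upper())
```

Here round_trip_ok is False and same_upper is True, which is why a
character-by-character mapping cannot handle German correctly.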

The situation with the two Turkish I's is different, and more
symmetrical, and it would be wrong to characterize that as "Turkish
causing a problem".  But I think my characterization of the situation
with German stands.  That is, dealing with Turkish is no harder than
dealing with English--it's just hard to deal with both at once.
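
To illustrate the symmetry, here is a rough sketch in Python 3. The
helpers turkish_upper and turkish_lower are hypothetical names made up
for this example, not part of any library; they assume the Turkish
pairing of dotless ı with I and dotted i with İ, and fall back to the
default mapping for every other letter:

```python
def turkish_upper(s: str) -> str:
    # Turkish pairs: i -> İ (dotted), ı -> I (dotless); others use the default.
    return s.replace("i", "İ").replace("ı", "I").upper()


def turkish_lower(s: str) -> str:
    # Turkish pairs: İ -> i (dotted), I -> ı (dotless); others use the default.
    return s.replace("İ", "i").replace("I", "ı").lower()


# Each convention is a clean one-to-one mapping on its own ...
english = "I".lower()             # "i"
turkish = turkish_lower("I")      # "ı"

# ... but the same input gives different answers, so a single
# locale-independent function cannot serve both.
conflict = (english != turkish)
```

Within either convention the mapping round-trips; the difficulty only
shows up when one routine has to serve both.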

Dealing with German properly is hard all by itself.

Thomas