case mappings
- To: srfi-75@xxxxxxxxxxxxxxxxx
- Subject: case mappings
- From: Alex Shinn <alexshinn@xxxxxxxxx>
- Date: Wed, 13 Jul 2005 12:57:53 +0900
- Delivered-to: srfi-75@xxxxxxxxxxxxxxxxx
- Reply-to: Alex Shinn <alexshinn@xxxxxxxxx>
I agree with Bear that case-mappings are poorly defined on single
codepoints.
Michael Sperber wrote:
> I don't quite understand what you're saying: the locale-independent
> case mappings in UnicodeData.txt always map a single scalar value to a
> single scalar value. Sure it doesn't always do what your locale
> thinks (as you point out), but this case mapping doesn't require
> "multi-codepoint characters."
This isn't just a "locale-awareness" problem. True, UnicodeData.txt
lists only the 1-1 mappings, for simplicity, but SpecialCasing.txt
includes a large number of mappings that aren't 1-1 regardless of
locale. The Unicode concept of locale-independent case-mapping
includes these special cases. Without handling them, R6RS would be
using an incomplete case-mapping rule, one that is not usable in the
general sense. I don't think anyone wants 90% compatibility thrown
into the core language.
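To make the point concrete, here is a small sketch in Python, chosen only
because its built-in str.upper() applies the full locale-independent
Unicode mappings, SpecialCasing.txt included. The helper name
expands_on_upcase is made up for this illustration and is not part of
any proposal.

```python
# Sketch: str.upper() uses the full (not just 1-1) Unicode case mappings,
# so a single code point can upcase to several code points, with no
# locale involved.
examples = ["\u00df", "\ufb01", "\u0149"]  # sharp s, fi ligature, n with apostrophe


def expands_on_upcase(ch):
    """Return True if upcasing a single code point yields more than one.

    Hypothetical helper, named here for illustration only.
    """
    return len(ch.upper()) > 1


for ch in examples:
    upper = ch.upper()
    codes = " ".join("U+%04X" % ord(c) for c in upper)
    print("U+%04X upcases to %d code point(s): %s" % (ord(ch), len(upper), codes))
```

A procedure with the signature char -> char cannot return any of these
results, which is the sense in which the single-codepoint mapping is
incomplete.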
Because the proper definition is so complicated and slow to apply, yet
there are many uses of strict ASCII case mapping in computer languages
and protocols, I think it makes sense to define the core case-mapping
procedures as ASCII-specific. Full linguistic case-handling should be
provided by specialized library procedures that optionally accept a
locale and work only at the string level, since single-char
case-mappings are ill-defined.
char-title-case? would then no longer be needed.
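The split could look roughly like the following, again sketched in Python
rather than Scheme. All three procedure names are hypothetical, and the
locale parameter is only a placeholder: this sketch ignores it and falls
back on Python's locale-independent full mapping.

```python
def ascii_char_upcase(ch):
    """Core, ASCII-only mapping on a single character: always 1-1.

    Expects a one-character string; anything outside a-z is returned as-is.
    """
    if "a" <= ch <= "z":
        return chr(ord(ch) - 32)
    return ch


def ascii_string_upcase(s):
    """Core, ASCII-only mapping on a string; the length never changes."""
    return "".join(ascii_char_upcase(c) for c in s)


def full_string_upcase(s, locale=None):
    """Library-level linguistic mapping, defined on strings only.

    Assumption: `locale` is accepted but ignored in this sketch; a real
    library would consult it for the conditional mappings.
    """
    return s.upper()


word = "stra\u00dfe"
print(ascii_string_upcase(word))  # ASCII letters change, sharp s is untouched
print(full_string_upcase(word))   # full mapping; the result is longer
```

The ASCII procedures stay cheap and predictable for protocol-style uses,
while anything that can change the length of the text lives in the
string-level procedure.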
--
Alex