This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.
Michael Sperber scripsit:

> US-ASCII, ISO 8859-1, and UCS-2-based [...]
> subsets are all closed with respect to the case folding in
> UnicodeData.txt. I don't know offhand if that's also the case with
> full Unicode case folding.

It is not true of either simple or full case folding as specified in
CaseFolding.txt; in particular, the 8859-1 character MICRO SIGN
(0xB5, U+00B5) folds to a proper GREEK SMALL LETTER MU (U+03BC) as a
consequence of the compatibility equivalence between the two.

There are also encodings which are not closed even under lowercasing:
of the 123 encodings I have information for, 30 are not closed under
lowercasing, 54 are not closed under simple folding, and 60 are not
closed under full folding. (Details on request.)

Jorgen Schaefer scripsit:

> Luckily, case folding is specified in such a way that a normalized
> sequence of code points remains normalized if case-folded.

This is exactly backwards. Case folding does *not* preserve
normalization, but *does* work correctly even on unnormalized input.
For example, the sequence <0130> is in normalization form C, but folds
to <0069,0307>, which is not.

I do agree that normalization functions are a Good Thing, though not
necessarily for the Scheme core.

--
Overhead, without any fuss, the stars were going out.
        --Arthur C. Clarke, "The Nine Billion Names of God"
John Cowan <jcowan@xxxxxxxxxxxxxxxxx>
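
The closure question raised in the reply to Michael Sperber can be checked mechanically for single-byte encodings. The following is a minimal Python sketch, not the method used to obtain the counts quoted above: the helper name `unclosed` is hypothetical, it only handles single-byte codecs, and it relies on Python's `str.casefold`, which applies full case folding from whatever Unicode version the interpreter ships (the standard library has no simple-folding function, so that case is not covered).

```python
def unclosed(codec, transform):
    """List the characters of a single-byte codec whose transformed
    form can no longer be encoded in that same codec."""
    escaping = []
    for byte in range(256):
        try:
            ch = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            continue  # this byte value is unassigned in the codec
        try:
            transform(ch).encode(codec)
        except UnicodeEncodeError:
            escaping.append(ch)  # the result left the codec's repertoire
    return escaping


if __name__ == "__main__":
    for name in ("ascii", "latin-1"):
        for label, fn in (("lower", str.lower), ("casefold", str.casefold)):
            bad = unclosed(name, fn)
            print(name, label, [f"U+{ord(c):04X}" for c in bad])
```

Under these assumptions, MICRO SIGN (U+00B5) should show up in the ISO 8859-1 list for full folding, since it folds to U+03BC, while US-ASCII should come out closed under both operations.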