This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.
Okay, I have a few things about this SRFI that I want to point out.

First, I feel that SRFIs are not the proper forum for pre-publishing R6RS material. But this is a matter of taste, and if people disagree I'll shut up about it.

Second, I see no point in limiting the representation of Unicode characters to 2, 4, or 8 hexadecimal digits. With the 8-digit format, one would always have to pad with two constant zero digits which carry no information. Reading hexadecimal numbers of unfixed length is code that every implementor supporting hex numbers has already had to write, and since a trailing delimiter is required in the new syntax (a move I agree with, btw), the limited selection of fixed lengths avoids no confusion.

Third, I think that char-upcase, char-downcase, string-upcase, and string-downcase should be added to the list of functions that "may not produce the results an end-user would consider sensible with a particular locale," mainly to clarify what the document says elsewhere: that they implement the case mappings from UnicodeData.txt, rather than locale-dependent case mappings.

Fourth, in general there are still problems if you're sticking to the simplistic "codepoint equals character" model. In particular, some characters, especially accented characters, have uppercase and lowercase versions which are different numbers of codepoints. Thus, in the "codepoint equals character" model, one case is a character and the other case -- isn't. The other case, in fact, is something impossible to return from a routine whose return value is a "character." This introduces range confusion in both char-downcase and char-upcase, and this in turn (I believe) hoses your suggested implementations of char-ci=?, char-ci<?, char-ci>?, char-ci<=?, and char-ci>=?.
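
To make the range problem concrete, here is a small sketch in Python rather than Scheme, chosen only because its `str.upper` and `str.lower` apply Unicode's full case mappings. The helper name `single_codepoint_upcase` is hypothetical and is not part of the SRFI; it just shows what a routine restricted to returning one "character" is left with when the other case needs more than one codepoint.

```python
def single_codepoint_upcase(ch):
    """Return the uppercase form of ch if it fits in one codepoint, else None.

    Hypothetical helper for illustration only: it mimics a char-upcase that
    is only allowed to return a single codepoint.
    """
    upper = ch.upper()  # full Unicode case mapping, may yield several codepoints
    return upper if len(upper) == 1 else None


# An ordinary letter maps one-to-one.
plain = single_codepoint_upcase("a")

# U+00DF LATIN SMALL LETTER SHARP S uppercases to two codepoints.
sharp_s = single_codepoint_upcase("\u00df")

# U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE lowercases to two codepoints
# (i followed by a combining dot above), so the problem exists in both directions.
dotted_i_lower = "\u0130".lower()

print(repr(plain), repr(sharp_s), len(dotted_i_lower))
```

In the cases where the helper returns None, a single-codepoint char-upcase or char-downcase has no correct value to return, which is exactly where comparisons built on top of them become unreliable.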
You need to either remove the restriction and allow multi-codepoint characters, or embrace the restriction and explicitly state that the results of these functions are undefined in cases where the other-case form is in fact not a single codepoint.

Fifth, I think you need to add one predicate to the general set of character predicates defined by SRFI-14: char-unused?, which returns true for characters which are inside the valid range but which are not actually mapped to any character in Unicode.

Sixth, is there any way for a Scheme implementation to support characters and strings in additional encodings different from Unicode, and not necessarily subsets of it, and remain compliant? For example, several Schemes have character sets that more accurately describe keystrokes than characters, containing entities such as "META-J" and "SHIFT-F10" and similar that have no corresponding Unicode entities. For another example, there are several Asian scripts that Unicode is observed to make a hash of, representing the same character at several different codepoints, and people who work with these scripts prefer other encodings.

Bear