This page is part of the web mail archives of SRFI 14 from before July 7th, 2015. The new archives for SRFI 14 contain all messages, not just those from before July 7th, 2015.
As I prepare to conclude work on the SRFI-13 string library, I have
reworked the SRFI-14 character-set spec, principally to get it synced up
with the Unicode world.

Mike S will presumably have the new draft available at
    http://srfi.schemers.org/srfi-14/srfi-14.txt
(It is also available at
    ftp://ftp.ai.mit.edu/people/shivers/srfi/14/srfi-14.txt)

A summary of the changes appears below.

I have no further changes I wish to make to this library. If review does
not reveal any problems, we can put this to bed.
    -Olin

-------------------------------------------------------------------------------
- Added a function for hashing character sets.

- Uniformly extended the char-set constructor procedures to take an optional
  BASE-CS argument; in this case, the procedure adds the requested characters
  to the characters already in BASE-CS. This allows convenient incremental
  construction of heterogeneous character sets, e.g.
      (predicate->char-set vowel?
                           (list->char-set '(#\+ #\-)
                                           (string->char-set "13579")))
  or, more efficiently
      (predicate->char-set! vowel?
                            (list->char-set! '(#\+ #\-)
                                             (string->char-set "13579")))

- I removed the seventeen predicates
      char-lower-case?    char-upper-case?    char-title-case?
      char-letter?        char-digit?         char-letter+digit?
      char-graphic?       char-printing?      char-whitespace?
      char-iso-control?   char-punctuation?   char-symbol?
      char-hex-digit?     char-blank?         char-ascii?
      char-empty?         char-full?
  They belong in a *character* library, not a char-set library.

- I have made pervasive changes to the SRFI to bring it into alignment with
  Unicode concepts:
  - Changed the name ASCII-RANGE->CHAR-SET to the more modern
    UCS-RANGE->CHAR-SET, and provided a full specification in terms of
    UCS/Unicode.
  - Changed "alphabetic" and "numeric" to Unicode terms "letter" and "digit."
  - Split "symbols" out from "punctuation" characters, in conformance with
    Unicode.
  - Renamed CHAR-SET:CONTROL to CHAR-SET:ISO-CONTROL, to make clear that
    weirdo Unicode control codes are excluded. (This is in alignment with
    Java.)
  - Added CHAR-SET:TITLECASE to accompany CHAR-SET:LOWERCASE &
    CHAR-SET:UPPERCASE.

- Specified what the standard character sets are in Unicode, Latin-1 and
  ASCII implementations. These definitions are almost completely compatible
  with Java's. (The only real incompatibility is the definition of
  whitespace.) The ASCII/Latin-1/Unicode specs are compatible, so that code
  written using these sets has a good chance of being portable across
  implementations with different underlying character representations.

  Being compatible with Java is occasionally challenging, as the Java
  definitions are not internally consistent. There is discussion of the
  specifics where relevant.
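
The optional BASE-CS convention described in the summary above can be illustrated with a small stand-alone sketch. This is not the SRFI-14 reference implementation: the `sketch-` procedures are hypothetical stand-ins, a "char set" is modelled as a plain duplicate-free list of characters, the predicate scan is limited to the ASCII range (0 through 127) as a simplifying assumption, and `vowel?` is defined here only for the example. The code is written for an R7RS system.

```scheme
;; Sketch only: hypothetical stand-ins for the constructors named in the
;; message, with a "char set" modelled as a duplicate-free list of chars.
(import (scheme base) (scheme write))

;; Add C to the set CS unless it is already a member.
(define (adjoin c cs)
  (if (memv c cs) cs (cons c cs)))

;; The optional BASE-CS defaults to the empty set.
(define (base-of maybe-base)
  (if (pair? maybe-base) (car maybe-base) '()))

;; Add every character in the list CHARS to BASE-CS.
(define (sketch-list->char-set chars . maybe-base)
  (let loop ((chars chars) (cs (base-of maybe-base)))
    (if (null? chars)
        cs
        (loop (cdr chars) (adjoin (car chars) cs)))))

;; Add every character of the string S to BASE-CS.
(define (sketch-string->char-set s . maybe-base)
  (sketch-list->char-set (string->list s) (base-of maybe-base)))

;; Add every ASCII character satisfying PRED to BASE-CS.
;; (Assumption: only code points 0..127 are scanned.)
(define (sketch-predicate->char-set pred . maybe-base)
  (let loop ((i 0) (cs (base-of maybe-base)))
    (if (= i 128)
        cs
        (let ((c (integer->char i)))
          (loop (+ i 1) (if (pred c) (adjoin c cs) cs))))))

;; Hypothetical predicate: lowercase ASCII vowels only.
(define (vowel? c)
  (and (memv c '(#\a #\e #\i #\o #\u)) #t))

;; Incremental construction, mirroring the example in the message:
;; odd digits, then the two sign characters, then the vowels.
(define mixed
  (sketch-predicate->char-set vowel?
    (sketch-list->char-set '(#\+ #\-)
      (sketch-string->char-set "13579"))))

;; The result holds five digits, two signs and five vowels.
(display (length mixed))
(newline)
```

Each constructor treats its last argument as the set to extend, so nesting the calls builds the mixed set without separate union steps. The sketch does not model the `!` variants mentioned in the message; it always builds fresh list structure.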