(Hopefully) final changes to SRFI-14 (character sets)
As I prepare to conclude work on the SRFI-13 string library, I have reworked
the SRFI-14 character-set spec, principally to get it synced up with the
Unicode world. Mike S will presumably have the new draft available at
(It is also available at
A summary of the changes appears below. I have no further changes I wish
to make to this library. If review does not reveal any problems, we can
put this to bed.
- Added a function for hashing character sets.
- Uniformly extended the char-set constructor procedures to take an optional
BASE-CS argument; when it is supplied, the procedure adds the requested characters
to the characters already in BASE-CS. This allows convenient incremental
construction of heterogeneous character sets, e.g.
(list->char-set '(#\+ #\-)
or, more efficiently
(list->char-set! '(#\+ #\-)
- I removed the following predicates:
char-lower-case? char-upper-case? char-title-case?
char-letter? char-digit? char-letter+digit?
char-graphic? char-printing? char-whitespace?
char-iso-control? char-punctuation? char-symbol?
char-hex-digit? char-blank? char-ascii?
They belong in a *character* library, not a char-set library.
- I have made pervasive changes to the SRFI to bring it into alignment with Unicode:
- Changed the name ASCII-RANGE->CHAR-SET to the more modern
UCS-RANGE->CHAR-SET, and provided a full specification in terms
- Changed "alphabetic" and "numeric" to Unicode terms "letter" and "digit."
- Split "symbols" out from "punctuation" characters, in conformance with
- Renamed CHAR-SET:CONTROL to CHAR-SET:ISO-CONTROL, to make clear that
weirdo Unicode control codes are excluded. (This is in alignment with
- Added CHAR-SET:TITLECASE to accompany CHAR-SET:LOWERCASE & CHAR-SET:UPPERCASE.
- Specified what the standard character sets are in Unicode, Latin-1
and ASCII implementations. These definitions are almost completely
compatible with Java's. (The only real incompatibility is the definition
of whitespace.) The ASCII/Latin-1/Unicode specs are compatible, so
that code written using these sets has a good chance of being portable
across implementations with different underlying character representations.
Being compatible with Java is occasionally challenging, as the Java
definitions are not internally consistent. There is discussion of the
specifics where relevant.
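Several of these changes are easiest to see in code. The sketch below assumes an R7RS system that provides SRFI-14 as the library `(srfi 14)` (the import name varies by implementation) and uses the procedure names from the finalized SRFI-14 text. The names `signed-digits`, `numeric-token-chars`, `ascii-upper`, and the helper `digit?` are hypothetical, chosen only for illustration. It exercises the optional BASE-CS argument, the linear-update `!` variant, `char-set-hash`, `ucs-range->char-set`, and a membership test standing in for one of the removed character predicates.

```scheme
(import (scheme base) (scheme write) (srfi 14))

;; Optional BASE-CS: the listed characters are added to those in
;; CHAR-SET:DIGIT. LIST->CHAR-SET builds a fresh set, so the
;; standard set itself is left unchanged.
(define signed-digits
  (list->char-set '(#\+ #\-) char-set:digit))

;; The linear-update variant may reuse the storage of its BASE-CS
;; argument, so pass it a private copy, never a shared standard set.
(define signed-digits!
  (list->char-set! '(#\+ #\-) (char-set-copy char-set:digit)))

;; Incremental construction: each step extends the previous set.
;; (Hypothetical set of characters that may appear in a number.)
(define numeric-token-chars
  (string->char-set ".eE" signed-digits))

;; CHAR-SET-HASH: equal sets hash to equal values; a positive BOUND
;; restricts the result to the range [0, bound).
(define h1 (char-set-hash signed-digits 1024))
(define h2 (char-set-hash signed-digits! 1024))

;; One of the removed predicates, recast as a membership test.
;; DIGIT? is a hypothetical helper name.
(define (digit? c)
  (char-set-contains? char-set:digit c))

;; UCS-RANGE->CHAR-SET: inclusive lower bound, exclusive upper bound.
;; #x41..#x5A are the ASCII letters A..Z.
(define ascii-upper (ucs-range->char-set #x41 #x5B))

(display (char-set-contains? signed-digits #\+)) ; prints #t
(newline)
(display (digit? #\5))                           ; prints #t
(newline)
```

Copying `char-set:digit` before handing it to `list->char-set!` matters because the `!` procedures are allowed, though not required, to reuse the storage of their BASE-CS argument; mutating a shared standard set in place would change it for every other user.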