Title

Titlecase procedures

Author

John Cowan

Status

This SRFI is currently in final status. To provide input on this SRFI, please send email to srfi-129@nospamsrfi.schemers.org.

Received: 2015-11-30
60-day deadline: 2016-01-29
Draft #1 published: 2015-11-30
Draft #2 published: 2015-11-30
Draft #3 published: 2015-12-07 (reference implementation only)
Finalized: 2016-03-08

Abstract

This SRFI defines R7RS-style char-title-case?, char-titlecase, and string-titlecase procedures.

Issues

None at present.

Rationale

The Latin letters of the ASCII repertoire are divided into two groups, the uppercase letters A-Z and the lowercase letters a-z. In Unicode, matters are more complicated. For historical reasons, some Unicode characters represent two consecutive letters, the first uppercase and the second lowercase. These are known as titlecase letters, because they can be used to capitalize words, as in book titles. They can also appear at the beginning of a sentence. In all cases, it is possible to avoid titlecase letters by using two Unicode characters to represent the sequence.

There are four Latin titlecase letters, each with an uppercase and a lowercase counterpart. For example, the titlecase letter ǲ has the uppercase counterpart Ǳ and the lowercase counterpart ǳ. These may be replaced by the usually identical-looking two-character sequences Dz, DZ, and dz respectively. Similarly, there are 27 Greek titlecase letters, each of which has Greek ι displayed either as a diacritic under the capital letter or immediately following it. For example, ᾈ is a titlecase letter with ᾀ as its lowercase counterpart. There is no single-character uppercase equivalent; one must use the two-character sequence ἈΙ instead.

This SRFI defines Unicodely correct char-title-case?, char-titlecase, and string-titlecase procedures similar to those specified in R6RS. They correspond to the R7RS-small procedures char-upper-case?, char-lower-case?, char-upcase, char-downcase, string-upcase, and string-downcase. The titlecase versions didn't seem important enough to include in the small language, but are a useful building block for future SRFIs. The specification does not depend on the availability of full Unicode, however, and will work just as well with a partial or even purely ASCII repertoire.

As an example of why the R6RS definition of string-titlecase does not suffice, consider the string ﬂoo powDER, which begins with a ligature of the characters f and l. The Unicode way of titlecasing this string is to treat the ligature the same as the two-character sequence fl, in which case the result is Floo Powder. However, by the strict letter of R6RS, the ﬂ character must be passed to char-titlecase, which in this case will return its argument unchanged, and the result is ﬂoo Powder. What is more, if the ﬂ character is not even seen as a casing letter, then the result will be ﬂOo Powder. Existing Schemes exhibit all of these behaviors.

Specification

The procedures in this SRFI are in the (srfi 129) library (or (srfi :129) on R6RS), but the sample implementation currently places them in the (titlecase) library.
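Because the library name may differ between an installed (srfi 129) and the sample implementation's (titlecase), a small wrapper library can hide the difference. The following is only a sketch: the library name (example titlecase-compat) is hypothetical, and it assumes an R7RS implementation whose cond-expand supports the (library ...) feature requirement.

```scheme
;; Hypothetical compatibility library; the name (example titlecase-compat)
;; is an assumption made for this illustration.
(define-library (example titlecase-compat)
  (export char-title-case? char-titlecase string-titlecase)
  (cond-expand
    ;; Prefer the standard library name when it is available.
    ((library (srfi 129))
     (import (srfi 129)))
    ;; Otherwise fall back to the name used by the sample implementation.
    ((library (titlecase))
     (import (titlecase)))))
```

Client code can then import (example titlecase-compat) without caring which of the two names is actually present.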

(char-title-case? char)

Returns #t if char is a character belonging to the Unicode category Lt, and #f otherwise. (The same as the R6RS equivalent.)

(char-titlecase char)

Returns the titlecase equivalent of char, if that character exists in the implementation, and char otherwise. The titlecase equivalent of a character is typically not a titlecase character; for most characters it is the same as the uppercase equivalent or else the character itself. Note that language-sensitive mappings are not used. (The same as the R6RS equivalent.)
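As an informal illustration of the two character procedures, the following R7RS program uses the Latin letters Ǳ, ǲ, and ǳ discussed in the Rationale. It assumes an implementation that provides the (srfi 129) library and supports the full Unicode repertoire; neither is guaranteed, and the variable names are chosen only for this example.

```scheme
(import (scheme base) (scheme write) (srfi 129))

;; U+01F1 is the uppercase form, U+01F2 the titlecase form,
;; and U+01F3 the lowercase form of the same digraph.
(define upper-dz #\x01F1)
(define title-dz #\x01F2)
(define lower-dz #\x01F3)

;; Only the middle character belongs to category Lt.
(write (char-title-case? title-dz)) (newline)                  ; #t
(write (char-title-case? upper-dz)) (newline)                  ; #f
(write (char-title-case? #\A)) (newline)                       ; #f

;; Both the uppercase and the lowercase form map to the titlecase form.
(write (char=? (char-titlecase lower-dz) title-dz)) (newline)  ; #t
(write (char=? (char-titlecase upper-dz) title-dz)) (newline)  ; #t

;; For an ordinary letter the titlecase equivalent is the uppercase one,
;; and a character without case is returned unchanged.
(write (char-titlecase #\a)) (newline)                         ; #\A
(write (char-titlecase #\1)) (newline)                         ; #\1
```

On an implementation with a smaller repertoire, char-titlecase would return its argument wherever the titlecase equivalent does not exist, so the results for the digraph characters could differ.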

(string-titlecase string)

This procedure applies the Unicode full string lowercasing algorithm to its argument. However, any character preceded by a non-cased character, or which is the first character of string, is processed by a different algorithm. If such a character has a multi-character titlecase mapping specified by Unicode, and all the characters of the mapping are supported by the implementation, then it is replaced by that mapping. Otherwise, it is replaced by its single-character titlecase mapping as if by char-titlecase. The result of the application of these algorithms is returned.

In certain cases, the result differs in length from the argument. If the result is equal to the argument in the sense of string=?, the argument may be returned. Note that language-sensitive mappings are not used. (The R6RS version does not make use of multi-character mappings.)
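The rule above can be sketched for a purely ASCII repertoire, where there are no titlecase letters and no multi-character mappings, so titlecasing a character reduces to upcasing it. This is not the sample implementation; the names ascii-cased?, ascii-char-titlecase, and ascii-string-titlecase are invented for the illustration, and the code assumes its input contains only ASCII characters.

```scheme
(import (scheme base) (scheme char) (scheme write))

;; Assumption: in an ASCII-only repertoire, the cased characters are
;; exactly the letters A-Z and a-z.
(define (ascii-cased? c)
  (or (char<=? #\A c #\Z) (char<=? #\a c #\z)))

;; With no titlecase letters available, the titlecase equivalent of a
;; letter is its uppercase equivalent; other characters are unchanged.
(define (ascii-char-titlecase c)
  (if (ascii-cased? c) (char-upcase c) c))

;; Lowercase every character, except that the first character and any
;; character preceded by a non-cased character is titlecased instead.
(define (ascii-string-titlecase s)
  (let loop ((chars (string->list s))
             (prev-cased? #f)          ; #f at the start of the string
             (acc '()))
    (if (null? chars)
        (list->string (reverse acc))
        (let ((c (car chars)))
          (loop (cdr chars)
                (ascii-cased? c)
                (cons (if prev-cased?
                          (char-downcase c)
                          (ascii-char-titlecase c))
                      acc))))))

(display (ascii-string-titlecase "floo powDER"))   ; prints Floo Powder
(newline)
```

A full implementation would additionally consult the Unicode multi-character titlecase and lowercase mappings, which is why its result can differ in length from the argument; this ASCII sketch always preserves the length.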

Implementation

The sample implementation is found in the repository of this SRFI. It contains the following files.

titlecase-impl.scm - the procedures
titlemaps.scm - the Unicode mapping tables for the implementation
chicken-shim.scm - an adapter from the Chicken utf8 egg to R7RS
titlecase.sld - an R7RS library
titlecase.scm - a Chicken library
titlecase-test.scm - a test file using the Chicken test egg

Copyright

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Editor: Arthur A. Gleckler