
Re: revised w/nocase text, considering titlecase and cased

This page is part of the web mail archives of SRFI 115 from before July 7th, 2015. The new archives for SRFI 115 contain all messages, not just those from before July 7th, 2015.

Alex Shinn scripsit:

>   As a special case, the pre-defined named character sets
>   upper and lower (and their aliases upper-case and lower-case)
>   are defined to match all characters with the cased property (L&).
>   Note also that all other pre-defined named character sets are
>   equivalent to themselves under w/nocase.
>   Rationale: The differences between the case insensitive
>   lower and upper and the cased property are few and unlikely
>   to match user intention.  Moreover, unlike the algorithmically
>   mapped upper and lower char-sets, the cased property is
>   readily available in most Unicode implementations.

Looks good to me.
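Here is a minimal Python sketch of the quoted rule. It is an illustration only and does not come from SRFI 115 or its reference implementation. The function named_set_matches and its parameters are hypothetical. It assumes that the case-sensitive upper and lower sets can be approximated by the general categories Lu and Ll, and it models "cased (L&)" as Lu, Ll and Lt, as the quoted text says. The titlecase letter U+01C5 shows the effect: it matches neither set case-sensitively, but it matches both under w/nocase.

```python
import unicodedata

# Assumption: "cased (L&)" is modeled as the general categories Lu, Ll and
# Lt, as the quoted text states. Unicode's derived Cased property is somewhat
# broader and is not modeled here.
CASED_LETTER = {"Lu", "Ll", "Lt"}

# The aliases mentioned in the quoted text.
ALIASES = {"upper-case": "upper", "lower-case": "lower"}


def named_set_matches(name, ch, nocase=False):
    """Hypothetical matcher for the named sets upper and lower.

    Case-sensitively, upper is approximated by Lu and lower by Ll.
    Under w/nocase both names collapse to the same L& set.
    """
    name = ALIASES.get(name, name)
    if name not in ("upper", "lower"):
        raise ValueError("unknown named set: " + name)
    category = unicodedata.category(ch)
    if nocase:
        return category in CASED_LETTER
    return category == ("Lu" if name == "upper" else "Ll")


dz = "\u01c5"  # LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON (Lt)
print(named_set_matches("upper", "a"))                   # False
print(named_set_matches("upper", "a", nocase=True))      # True
print(named_set_matches("upper", dz))                    # False
print(named_set_matches("lower", dz))                    # False
print(named_set_matches("lower-case", dz, nocase=True))  # True
```

Because both names map to one fixed set under w/nocase, no per-character case mapping is needed. That simplicity is what the quoted rationale argues for.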

I think this language should also be added:

    Note that placing a sequence consisting of a base character
    and combining characters into a character string representing
    a character set will not do what the user probably expects;
    it will create a character set pattern containing the base
    character and the combining character(s) as alternatives.
    For the same reason, it is inadvisable to apply Unicode
    normalization to such strings.
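Python's re module can illustrate the pitfall in the proposed note, because its bracketed character classes also treat their contents one code point at a time. This is only an analogy: SRFI 115 is a Scheme interface, and the names below are made up for the example. The sketch puts a decomposed "é" (base letter plus U+0301) into a character class, then shows that NFC normalization silently changes which characters the set contains.

```python
import re
import unicodedata

# "é" written as a base letter plus a combining acute accent (NFD form).
decomposed = unicodedata.normalize("NFD", "\u00e9")  # "e" + U+0301

# The two-code-point string becomes a set of two alternatives.
char_set = re.compile("[" + decomposed + "]")


def matches_whole(pattern, text):
    """True if the character-set pattern matches the entire text."""
    return pattern.fullmatch(text) is not None


print(len(decomposed))                       # 2
print(matches_whole(char_set, "e"))          # True: bare base letter
print(matches_whole(char_set, "\u0301"))     # True: lone combining mark
print(matches_whole(char_set, decomposed))   # False: the pair is two code points

# Normalizing the set's string changes its members from {e, U+0301} to {é}.
composed = unicodedata.normalize("NFC", decomposed)
normalized_set = re.compile("[" + composed + "]")
print(len(composed))                         # 1
print(matches_whole(normalized_set, "e"))    # False: "e" is no longer a member
```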

> And the only realistic alternative I can see is making this
> special case optional, so that either behavior is correct.

Too much flexibility, I think.

John Cowan          http://www.ccil.org/~cowan        cowan@xxxxxxxx
A witness cannot give evidence of his age unless he can remember being born.
                --Judge Blagden