new draft of SRFI 115

Pending reflection on srfi.schemers.org is
available at:


Most issues are resolved.  The one thing people
should pay attention to is how case folding works
in conjunction with subtractive char-set operations
(~, -, &).

I'll refine the text before the final draft, but what it
says is that in a w/nocase context, case mapping
happens at the leaf level.  In a cset-sre you map
all literal characters to both their upper and lower-
case equivalents, and similarly for all named csets
(e.g. upper and lower both map to alpha).

The alternative is to perform all cset operations
normally and only case map the final result.  Usually
these are the same thing, except when a leaf does
not already contain all cases of a character and is
being used subtractively as noted above.
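
To make the difference concrete, here is a rough sketch of the two
strategies using plain Python sets.  It is only a model, not SRFI 115
code: the universe is restricted to ASCII letters, and the helper
names (fold, complement) are made up for illustration.

```python
import string

# Assumption for this sketch: the universe is just the ASCII letters.
UNIVERSE = frozenset(string.ascii_letters)

def fold(cset):
    """Return cset plus the upper- and lower-case form of each member."""
    return frozenset(c.upper() for c in cset) | frozenset(c.lower() for c in cset)

def complement(cset):
    """Complement of cset within the small UNIVERSE above."""
    return UNIVERSE - cset

leaf = frozenset("AaB")

# Leaf-level mapping: case map the leaf first, then complement.
leaf_level = complement(fold(leaf))

# Result-level mapping: complement first, then case map the result.
result_level = fold(complement(leaf))

print("b" in leaf_level)    # False
print("b" in result_level)  # True
```

With leaf-level mapping the leaf becomes {A, a, B, b} before the
complement, so neither "b" nor "B" survives.  With result-level
mapping "b" is still in the complement, and folding then brings "B"
back in as well.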

The rationale is:

1) It's what PCRE does.  "b" does not match /[^AaB]/i,
because the [AaB] class first becomes [AaBb], _then_
we take the complement.  If you were to take the
complement first, then you'd have [...bC-Zc-z...], and
case mapping this would match both "b" and "B".

2) It's needed when allowing nested w/case and w/nocase
within a cset-sre.

3) It allows mapping large named char sets statically,
avoiding expensive case mapping on large Unicode