
Re: TR29 word boundary use cases



On Fri, Dec 13, 2013 at 11:51 AM, John Cowan <cowan@xxxxxxxxxxxxxxxx> wrote:
> Alex Shinn scripsit:
>
> > This is the \w specified in TR18, and Perl complies with it, so I
> > think we should use it.  We should also provide an SRE name for just
> > this char-set.  We can make it long and say `word-constituent' since
> > the `word' uses will be more common.
>
> Okay, in that case we have to downgrade the reference to Level 2, and say
> something like "Level 2 except for word boundaries, which are at Level 1."
> I'm good with that.
>
> (Are we really at Level 2?  I haven't checked, but I'm doubtful.)

We're at Level 2 for grapheme clusters.

We deliberately don't do normalization or Level 2
case folding, and we are now backing off on word
boundaries.
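For illustration, here is a minimal sketch in Python (not Scheme, and not part of any SRE proposal) of the TR18 \w set discussed above, i.e. [\p{alpha}\p{gc=Mark}\p{digit}\p{gc=Connector_Punctuation}\p{Join_Control}], together with the simpler Level 1 style of word boundary we are falling back to: a boundary wherever word-ness changes, with nonspacing marks kept attached to their base. The names `is_word_constituent` and `simple_word_boundaries` are hypothetical. It assumes Alphabetic can be approximated from general categories, since Python's unicodedata does not expose Other_Alphabetic, so a few code points such as the circled letters are missed.

```python
import unicodedata

# Join_Control: ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER (gc=Cf).
JOIN_CONTROL = {"\u200c", "\u200d"}

# Assumption: \p{Alphabetic} approximated as letters plus letter numbers.
# Other_Alphabetic is not available from unicodedata; most of it is marks
# (covered below), but e.g. circled letters (gc=So) are not matched here.
ALPHA_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}


def is_word_constituent(ch: str) -> bool:
    r"""Approximate TR18 \w for a single code point."""
    cat = unicodedata.category(ch)
    return (cat in ALPHA_CATEGORIES
            or cat.startswith("M")   # Mn, Mc, Me: all marks
            or cat == "Nd"           # \p{digit} is gc=Decimal_Number
            or cat == "Pc"           # connector punctuation, e.g. "_"
            or ch in JOIN_CONTROL)


def simple_word_boundaries(s: str) -> list[int]:
    """Level 1 style boundaries: indices where word-ness changes.

    Nonspacing marks (gc=Mn) are never split from their base and are
    otherwise ignored when locating boundaries.
    """
    boundaries = []
    prev_is_word = False  # start of text behaves like a non-word context
    for i, ch in enumerate(s):
        if unicodedata.category(ch) == "Mn":
            continue  # stays with the preceding base character
        cur = is_word_constituent(ch)
        if cur != prev_is_word:
            boundaries.append(i)
        prev_is_word = cur
    if prev_is_word:
        boundaries.append(len(s))  # end of text closes a trailing word
    return boundaries


print(simple_word_boundaries("hello, world"))  # [0, 5, 7, 12]
```

Note that "e\u0301 x" yields [0, 2, 3, 4]: the combining acute accent at index 1 does not produce a boundary. Full TR29 word boundaries (Level 2) additionally handle things like apostrophes inside words, numbers with separators, and scripts without spaces, which this sketch does not attempt.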

The remaining parts of Level 2 concern property
support, which the SRE design allows us to
provide as separate libraries.

-- 
Alex