Re: TR29 word boundary use cases

This page is part of the web mail archives of SRFI 115 from before July 7th, 2015. The new archives for SRFI 115 contain all messages, not just those from before July 7th, 2015.



On Fri, Dec 13, 2013 at 11:51 AM, John Cowan <cowan@xxxxxxxxxxxxxxxx> wrote:
> Alex Shinn scripsit:
>
> > This is the \w specified in TR18, and Perl complies with it, so I
> > think we should use it.  We should also provide an SRE name for just
> > this char-set.  We can make it long and say `word-constituent' since
> > the `word' uses will be more common.
>
> Okay, in that case we have to downgrade the reference to Level 2, and say
> something like "Level 2 except for word boundaries, which are at Level 1."
> I'm good with that.
>
> (Are we really at Level 2?  I haven't checked, but I'm doubtful.)

We're at level 2 for grapheme clusters.

We deliberately don't do normalization or level 2
case folding, and are now backing off from the
TR29 word boundaries.

The remaining parts of level 2 are the property
support, which the SRE design allows us to
provide as separate libraries.

-- 
Alex
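
For readers following the thread, the difference between a \w character
set and TR29 word boundaries can be illustrated with a small sketch. It is
written in Python rather than SRE syntax, and it is only an approximation:
TR18 Annex C defines \w in terms of the Alphabetic property, which Python's
unicodedata module does not expose, so general categories stand in for it
here (an assumption that misses a few Other_Alphabetic symbols). The names
is_word_constituent and word_constituent_runs are hypothetical and are not
part of SRFI 115.

```python
import unicodedata

# Join_Control consists of exactly these two code points.
JOIN_CONTROL = {"\u200c", "\u200d"}  # ZWNJ, ZWJ


def is_word_constituent(ch: str) -> bool:
    """Approximate the TR18 Annex C `word` property for one character.

    Assumption: letters (L*) and letter numbers (Nl) stand in for the
    Alphabetic property, which unicodedata does not provide.
    """
    cat = unicodedata.category(ch)
    return (
        cat[0] in "LM"                  # letters and combining marks
        or cat in ("Nl", "Nd", "Pc")    # letter numbers, digits, connectors
        or ch in JOIN_CONTROL
    )


def word_constituent_runs(text: str) -> list:
    """Split text into maximal runs of word-constituent characters."""
    runs, current = [], []
    for ch in text:
        if is_word_constituent(ch):
            current.append(ch)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs


print(word_constituent_runs("can't stop"))
```

With this character-set approach the apostrophe ends a run, so "can't"
comes out as two pieces, whereas the TR29 word boundary rules would keep
it together as one word. That gap is roughly what "Level 2 except for
word boundaries" refers to.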