Re: Should SRFI-115 character sets match extended grapheme clusters?

This page is part of the web mail archives of SRFI 115 from before July 7th, 2015. The new archives for SRFI 115 contain all messages, not just those from before July 7th, 2015.



On Mon, May 12, 2014 at 12:06 PM, John Cowan <cowan@xxxxxxxxxxxxxxxx> wrote:
> Alex Shinn scripsit:
>
> > Normalization was in the early issues and dismissed because of lack
> > of implementation support and unclear costs in new implementations.
> > I think good recommended practice for now is to just normalize both
> > inputs and patterns separately.
>
> Okay, I can live with that.  But normalizing an SRE is not a matter of
> normalizing the strings in the SRE: indeed, that will break it.

This is just recommended practice.  If all of your string
literals within the SRE are in NFC, and the input strings
are all in NFC, then you minimize the cases that
fail to match because of a normalization difference.
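
As a rough illustration of that practice, here is a minimal sketch in
Python rather than Scheme.  Python's `re` module only stands in for an
SRE engine here, and the `nfc` helper is a hypothetical name, not part
of SRFI 115.

```python
import re
import unicodedata

def nfc(s: str) -> str:
    """Return s in Unicode Normalization Form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", s)

# The same word written two ways: precomposed U+00E9, and "e" + U+0301.
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

# A pattern whose string literal happens to use the precomposed form.
pattern = "caf\u00e9"

# Without normalization the decomposed input fails to match.
raw_match = re.fullmatch(pattern, decomposed) is not None

# Normalizing the pattern literal and the input separately makes them agree.
nfc_match = re.fullmatch(nfc(pattern), nfc(decomposed)) is not None

print(raw_match, nfc_match)  # False True
```

This only covers plain string literals.  It says nothing about character
sets inside the pattern.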

> So at the very least I think a normalize-sre procedure must be provided that
> takes an SRE and does the nitty-gritty of selectively expanding charsets
> into disjunctions of sequences.

Sure, but as the details and implementation don't exist
yet, I'm leaving this for future work.
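
To make the charset problem concrete, here is a small Python sketch of
what such an expansion might look like.  It is not a design for
normalize-sre: `expand_charset` and `charset_to_alternation` are
hypothetical names, a Python set stands in for an SRE charset, and a
regex alternation stands in for an `(or ...)` of sequences.  The example
relies on U+0958 being a composition exclusion, so its NFC form is the
two code points U+0915 U+093C.

```python
import re
import unicodedata

def expand_charset(chars, form="NFC"):
    """Normalize each charset member on its own (hypothetical helper).

    Returns a sorted list of strings.  A member whose normalized form is
    longer than one code point can no longer live inside a charset.
    """
    return sorted({unicodedata.normalize(form, c) for c in chars})

def charset_to_alternation(chars, form="NFC"):
    """Build a regex alternation for the normalized charset."""
    seqs = expand_charset(chars, form)
    # Longest first, so a multi-code-point sequence is tried before
    # any shorter alternative.
    seqs.sort(key=len, reverse=True)
    return "(?:" + "|".join(re.escape(s) for s in seqs) + ")"

charset = {"a", "\u0958"}

# Naive approach: normalize the charset's text and keep it as a charset.
# The single member U+0958 turns into two separate members.
naive = set(unicodedata.normalize("NFC", "".join(sorted(charset))))
naive_re = re.compile("[" + "".join(re.escape(c) for c in naive) + "]")

# Expansion approach: U+0958 becomes a two-code-point sequence.
alternation = re.compile(charset_to_alternation(charset))

bare_ka = "\u0915"                               # not in the original charset
qa_nfc = unicodedata.normalize("NFC", "\u0958")  # U+0915 U+093C

print(naive_re.fullmatch(bare_ka) is not None)     # True: a false positive
print(alternation.fullmatch(bare_ka) is not None)  # False
print(alternation.fullmatch(qa_nfc) is not None)   # True
```

Complemented charsets, ranges, and case folding are all ignored here.
Those are presumably where most of the nitty-gritty lies.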

-- 
Alex