Re: Should SRFI-115 character sets match extended grapheme clusters?

This page is part of the web mail archives of SRFI 115 from before July 7th, 2015. The new archives for SRFI 115 contain all messages, not just those from before July 7th, 2015.

To: John Cowan <cowan@xxxxxxxxxxxxxxxx>
Subject: Re: Should SRFI-115 character sets match extended grapheme clusters?
From: Alex Shinn <alexshinn@xxxxxxxxx>
Date: Mon, 12 May 2014 12:38:09 +0900
Cc: Mark H Weaver <mhw@xxxxxxxxxx>, SRFI-115 discussion list <srfi-115@xxxxxxxxxxxxxxxxx>
Delivered-to: srfi-115@xxxxxxxxxxxxxxxxx
In-reply-to: <20140512030648.GO17946@mercury.ccil.org>
References: <87bnv4ifwu.fsf@yeeloong.lan> <20140511180833.GD17946@mercury.ccil.org> <87wqdsgkhz.fsf@yeeloong.lan> <20140511213925.GG17946@mercury.ccil.org> <CAMMPzYOVvRDjzLE_r15dSxti8jnBQDLZOo1_CM4CPP+GX7xGpA@mail.gmail.com> <20140512030648.GO17946@mercury.ccil.org>

On Mon, May 12, 2014 at 12:06 PM, John Cowan <cowan@xxxxxxxxxxxxxxxx> wrote:

> Alex Shinn scripsit:
>
> > Normalization was in the early issues and dismissed because of lack
> > of implementation support and unclear costs in new implementations.
> > I think good recommended practice for now is to just normalize both
> > inputs and patterns separately.
>
> Okay, I can live with that. But normalizing an SRE is not a matter of
> normalizing the strings in the SRE: indeed, that will break it.

This is just recommended practice. If all of your string literals
within the SRE are in NFC, and the input strings are all in NFC,
then you minimize the cases that fail to match because of a
normalization difference.
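
As a rough illustration of the recommended practice above, here is a minimal sketch in Python rather than Scheme. It is not SRFI 115 code: the helper name `nfc` and the sample strings are assumptions made for the example, and Python's `re` module stands in for an SRE matcher. It shows a literal in NFC failing against the same text in NFD, and matching once both sides are normalized separately.

```python
import re
import unicodedata


def nfc(s: str) -> str:
    """Hypothetical helper: return s in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", s)


# The same visible text in two encodings (sample data for illustration).
pattern_literal = "caf\u00e9"   # precomposed e-acute, already NFC
input_text = "cafe\u0301"       # "e" followed by U+0301 COMBINING ACUTE ACCENT

# Without normalization the literal does not match the decomposed input.
raw_match = re.fullmatch(re.escape(pattern_literal), input_text)

# Normalizing the pattern literal and the input separately makes them agree.
normalized_match = re.fullmatch(re.escape(nfc(pattern_literal)), nfc(input_text))
```

Here `raw_match` is `None`, while `normalized_match` is a match object. This only covers string literals; character sets are a separate problem.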

> So at the very least I think a normalize-sre procedure must be provided that
> takes an SRE and does the nitty-gritty of selectively expanding charsets
> into disjunctions of sequences.

Sure, but as the details and implementation don't exist
yet I'm leaving this for future work.
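
To make the charset issue concrete, the following is a hypothetical sketch in Python, not a proposed normalize-sre and not part of SRFI 115. The name `expand_charset` and the choice of NFD as the target form are assumptions for illustration. It shows why normalizing the string inside a charset breaks it (the decomposed string becomes a set of unrelated code points) and how each member can instead be expanded into an alternative that is either a single code point or a sequence.

```python
import re
import unicodedata


def expand_charset(chars: str, form: str = "NFD") -> str:
    """Hypothetical helper: build a regex matching any member of `chars`
    as it appears in text normalized to `form`.

    Members whose normalized form is one code point stay in a character
    class; members that normalize to several code points become literal
    sequences in an alternation.
    """
    if not chars:
        raise ValueError("charset must not be empty")
    singles, sequences = [], []
    for ch in chars:
        norm = unicodedata.normalize(form, ch)
        (singles if len(norm) == 1 else sequences).append(re.escape(norm))
    # Sequences come first so the longer alternatives are tried first.
    alternatives = list(sequences)
    if singles:
        alternatives.append("[" + "".join(singles) + "]")
    return "(?:" + "|".join(alternatives) + ")"


charset = "\u00e9\u00e0x"  # e-acute, a-grave, x (sample data)

# Naive approach: normalize the charset's string and use it as a class.
# The class now contains e, U+0301, a, U+0300 and x as separate members.
naive = "[" + unicodedata.normalize("NFD", charset) + "]"

# Selective expansion: a disjunction of sequences plus a residual class.
expanded = expand_charset(charset)
```

Against NFD input, `naive` cannot match the two-code-point sequence for e-acute and wrongly accepts a bare combining accent, while `expanded` matches each decomposed member and nothing else from that set. A real procedure would also have to handle ranges, complements, and case folding, which this sketch ignores.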

Alex
