
Re: Should SRFI-115 character sets match extended grapheme clusters?

This page is part of the web mail archives of SRFI 115 from before July 7th, 2015. The new archives for SRFI 115 contain all messages, not just those from before July 7th, 2015.

To: John Cowan <cowan@xxxxxxxxxxxxxxxx>
Subject: Re: Should SRFI-115 character sets match extended grapheme clusters?
From: Mark H Weaver <mhw@xxxxxxxxxx>
Date: Sun, 11 May 2014 15:53:28 -0400
Cc: SRFI-115 discussion list <srfi-115@xxxxxxxxxxxxxxxxx>
Delivered-to: srfi-115@xxxxxxxxxxxxxxxxx
In-reply-to: <20140511180833.GD17946@mercury.ccil.org> (John Cowan's message of "Sun, 11 May 2014 14:08:33 -0400")
References: <87bnv4ifwu.fsf@yeeloong.lan> <20140511180833.GD17946@mercury.ccil.org>
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

John Cowan <cowan@xxxxxxxxxxxxxxxx> writes:

> Mark H Weaver scripsit:
>
>> It occurs to me that users of languages that make heavy use of combining
>> marks will likely find the behavior of "character sets" to be quite
>> unintuitive if they operate on code points.  
>
> The way around that is normalization of the input, I think.

Normalization is an important part of the solution, but on its own it
does not help with characters that have no precomposed form.  Figure 5
of TR15 gives some examples where NFC produces more than one code point
per user-perceived character.
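As a rough illustration of that point, here is a small sketch. It uses Python's standard `unicodedata` module only as a stand-in; it is not SRFI-115 or R6RS API, and `nfc_code_points` is a hypothetical helper. It normalizes d and q, each followed by U+0323 COMBINING DOT BELOW and U+0307 COMBINING DOT ABOVE, to NFC and lists the resulting code points:

```python
import unicodedata


def nfc_code_points(s):
    """Hypothetical helper: return the code points of NFC(s) as 'U+XXXX' labels."""
    return ["U+%04X" % ord(c) for c in unicodedata.normalize("NFC", s)]


# d + U+0323 composes to U+1E0D (d with dot below), but no precomposed
# character absorbs the remaining U+0307, so it stays separate.
print(nfc_code_points("d\u0323\u0307"))  # ['U+1E0D', 'U+0307']

# q has no precomposed form with either mark, so NFC leaves all three.
print(nfc_code_points("q\u0323\u0307"))  # ['U+0071', 'U+0323', 'U+0307']
```

Either way, one perceived character still spans two or three code points after NFC.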

The question then becomes: do we want ("ḍ̇q̣̇") to mean (or "ḍ̇" "q̣̇"),
or should it mean (or "ḍ" "\x0307;" "q" "\x0323;" "\x0307;")?  In other
words, is the string split into elements by extended grapheme cluster or
by code point?

There's also the question of whether (regexp-extract '(~ ("-")) "q̣̇")
should return ("q̣̇") or ("q" "\x0323;" "\x0307;").
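To make the two element models concrete, the sketch below contrasts them in Python. Everything here is hypothetical illustration, not SRFI-115 API. `split_clusters` is a deliberate simplification that attaches any code point with a nonzero canonical combining class to the preceding base. It is not the full UAX #29 extended grapheme cluster algorithm, which also handles spacing marks, Hangul syllables, emoji sequences, and more. `extract_complement` only approximates what matching (~ ("-")) repeatedly would yield under each model.

```python
import unicodedata


def split_code_points(s):
    """One element per code point."""
    return list(s)


def split_clusters(s):
    """Simplified clustering (assumption, not UAX #29): a base code point
    plus any following code points with a nonzero combining class."""
    elements = []
    for c in s:
        if elements and unicodedata.combining(c):
            elements[-1] += c
        else:
            elements.append(c)
    return elements


def extract_complement(s, excluded, split):
    """Approximates (regexp-extract '(~ ("-")) s) under a given element
    model: every element of s that is not in the excluded set."""
    return [e for e in split(s) if e not in excluded]


# "ḍ̇q̣̇" after NFC is U+1E0D U+0307 q U+0323 U+0307.
s = unicodedata.normalize("NFC", "d\u0323\u0307q\u0323\u0307")
print(len(split_code_points(s)))  # 5
print(len(split_clusters(s)))     # 2

q = "q\u0323\u0307"
print(len(extract_complement(q, {"-"}, split_code_points)))  # 3
print(len(extract_complement(q, {"-"}, split_clusters)))     # 1
```

Under the code-point model the set literal has five members and the extraction returns three pieces. Under the cluster model it has two members and the extraction returns the single string "q̣̇".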

> I will be proposing a normalization SRFI in future, presumably
> including the R6RS normalization procedures and some version of the
> normalized-comparison procedures that were rejected from R7RS-small.

Sounds useful.

     Mark

Follow-Ups:
- Re: Should SRFI-115 character sets match extended grapheme clusters?
  - From: John Cowan

References:
- Should SRFI-115 character sets match extended grapheme clusters?
  - From: Mark H Weaver
- Re: Should SRFI-115 character sets match extended grapheme clusters?
  - From: John Cowan
