Normalization vs. grapheme clusters

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

The Unicode old farts^W^Wrespected elders have confirmed that if you segment
a Unicode string by grapheme clusters as officially defined by Unicode,
then normalization forms C and D will not change the segmentation; that is,
any change will be within grapheme clusters and not across their boundaries.

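This does not hold for normalization forms KC and KD, which replace characters
that have compatibility decompositions and so can turn one cluster into
several: U+FB01, the "fi" ligature, becomes the two letters "f" and "i".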
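
A rough way to see the difference, sketched in Python using only the
standard unicodedata module. The helper approx_clusters is hypothetical and
is NOT the official Unicode segmentation: it merely attaches combining marks
to the preceding character, which is enough for these two sample strings but
would mishandle Hangul jamo, emoji sequences, and the like.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def approx_clusters(text):
    """Crude stand-in for grapheme clusters (an assumption, not UAX #29):
    a character plus any combining marks (general category M*) after it."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch      # combining mark joins the previous cluster
        else:
            clusters.append(ch)     # anything else starts a new cluster
    return clusters


def cluster_count(text, form):
    """Number of approximate clusters after normalizing to the given form."""
    return len(approx_clusters(unicodedata.normalize(form, text)))


# "resume" with two combining acute accents (U+0301): canonical forms only.
accented = "re\u0301sume\u0301"
# U+FB01 LATIN SMALL LIGATURE FI followed by "n": compatibility decomposition.
ligature = "\ufb01n"

accented_counts = {form: cluster_count(accented, form) for form in FORMS}
ligature_counts = {form: cluster_count(ligature, form) for form in FORMS}

if __name__ == "__main__":
    print("accented:", accented_counts)
    print("ligature:", ligature_counts)
```

For the accented string the count is the same under all four forms. For the
ligature string it is unchanged under C and D but grows by one under KC and
KD, because the single ligature cluster is replaced by two letters.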

John Cowan  www.ccil.org/~cowan  www.reutershealth.com  jcowan@xxxxxxxxxxxxxxxxx
[T]here is a Darwinian explanation for the refusal to accept Darwin.
Given the very pessimistic conclusions about moral purpose to which his
theory drives us, and given the importance of a sense of moral purpose
in helping us cope with life, a refusal to believe Darwin's theory may
have important survival value. --Ian Johnston