Normalization vs. grapheme clusters



The Unicode old farts^W^Wrespected elders have confirmed that if you segment
a Unicode string into grapheme clusters as officially defined by Unicode,
then normalization forms C and D will not change the segmentation; that is,
any change will be within grapheme clusters and not across their boundaries.

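This does not hold for normalization forms KC and KD, which replace characters
that have compatibility decompositions with those decompositions. For example,
U+FB01 (the "fi" ligature) becomes the two letters "fi", which is two
grapheme clusters where there used to be one.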
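
Here is a rough way to see the difference, as a sketch only. Python's standard
library has no grapheme cluster segmenter, so approx_clusters below is a
hypothetical stand-in that I am assuming for illustration: it treats a cluster
as one non-mark character plus any combining marks that follow it. That is
much cruder than the official Unicode definition (it gets Hangul jamo wrong,
among other things), but it is enough for this sample.

```python
import unicodedata

def approx_clusters(s):
    """Crude approximation of grapheme clusters (NOT the official rules):
    each non-mark character starts a cluster, and any following
    combining marks (general category M*) attach to it."""
    clusters = []
    for ch in s:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# "e" + COMBINING ACUTE ACCENT, followed by LATIN SMALL LIGATURE FI.
sample = "e\u0301\ufb01"

# Cluster count of the sample after each normalization form.
counts = {
    form: len(approx_clusters(unicodedata.normalize(form, sample)))
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}
original_count = len(approx_clusters(sample))

for form, n in counts.items():
    print(form, n, "same" if n == original_count else "changed")
```

For this sample, the C and D counts match the original, while the KC and KD
counts are one higher, because the ligature has been split into "f" and "i".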

-- 
John Cowan  www.ccil.org/~cowan  www.reutershealth.com  jcowan@xxxxxxxxxxxxxxxxx
[T]here is a Darwinian explanation for the refusal to accept Darwin.
Given the very pessimistic conclusions about moral purpose to which his
theory drives us, and given the importance of a sense of moral purpose
in helping us cope with life, a refusal to believe Darwin's theory may
have important survival value. --Ian Johnston