Re: the discussion so far
bear scripsit:
> The particular example I'm thinking of is splitting strings
> between base codepoint and combining codepoint. You get two
> substrings, and the second one is syntactically invalid.
Please point to a place in the Unicode Standard where any sequence
of Unicode scalar values is said to be "syntactically invalid".
> If you print the first substring and then the second, the
> combining codepoint is usually printed as though it modified
> a space character that isn't actually there.
That's one possibility; it can also be rendered on top of
a dotted circle, which is what is done in the Unicode charts.
In any case, glyph rendering is not part of the Standard.
> If something
> normalizes the substrings first, the space may actually be
> added, although it wasn't present in the original string.
That turns out not to be the case. The normalized form of
a string consisting of one combining character is itself.
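
A quick way to check this is with Python's standard unicodedata
module. This is only a sketch: it picks U+0301 COMBINING ACUTE ACCENT
as the example mark (my choice, not one named in the thread) and makes
no claim about every combining character. The variable names are
illustrative.

```python
import unicodedata

# Assumption: U+0301 COMBINING ACUTE ACCENT stands in for "a combining codepoint".
COMBINING_ACUTE = "\u0301"

# Split "e" + acute between the base codepoint and the combining codepoint.
whole = "e" + COMBINING_ACUTE
base, mark = whole[:1], whole[1:]

# Normalize the lone combining mark under each of the four normalization forms.
results = {
    form: unicodedata.normalize(form, mark)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

# In every form the lone mark comes back unchanged, with no space added.
unchanged = all(value == mark for value in results.values())
space_added = any(" " in value for value in results.values())

# Concatenating the two substrings restores the original sequence,
# which then composes under NFC as usual.
recomposed = unicodedata.normalize("NFC", base + mark)
```

Here unchanged comes out True and space_added comes out False, and
rejoining the two halves gives back a sequence that NFC composes to
U+00E9, so nothing is lost by the split itself.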
> Gah. Encodings, normalization forms, endianness, and all the
> rest of it. When you want to write a "character" any of a dozen
> things can happen.
Blurring significant distinctions that have taken a long time to
nail down isn't very conducive to clear thinking.
--
John Cowan <jcowan@xxxxxxxxxxxxxxxxx>
http://www.reutershealth.com
http://www.ccil.org/~cowan
Not to perambulate the corridors during the hours of repose
in the boots of ascension. --Sign in Austrian ski-resort hotel