
Re: the discussion so far

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.



bear scripsit:

> The particular example I'm thinking of is splitting strings
> between base codepoint and combining codepoint. You get two
> substrings, and the second one is syntactically invalid.

Please point to a place in the Unicode Standard where any sequence
of Unicode scalar values is said to be "syntactically invalid".
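
To make the point concrete, here is a minimal sketch in Python. The
sample string, the variable names, and the helper is_scalar_value are
assumptions made for illustration; nothing here is SRFI 75 syntax.
It splits "e" + U+0301 between the base and the combining code point
and checks that each half is still a sequence of Unicode scalar
values that round-trips through UTF-8:

```python
def is_scalar_value(cp: int) -> bool:
    """True for any code point in 0..10FFFF that is not a surrogate."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


# 'e' followed by U+0301 COMBINING ACUTE ACCENT (illustrative input).
original = "e\u0301"

# Split between the base code point and the combining code point.
first, second = original[:1], original[1:]

for part in (first, second):
    # Every code point in each half is a Unicode scalar value ...
    assert all(is_scalar_value(ord(ch)) for ch in part)
    # ... and each half encodes to UTF-8 and decodes back unchanged.
    assert part.encode("utf-8").decode("utf-8") == part

# Concatenating the halves restores the original sequence.
rejoined = first + second
```

The second half is the single code point U+0301, and the checks pass
for it just as they do for the first half.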

> If you print the first substring and then the second, the
> combining codepoint is usually printed as though it modified
> a space character that isn't actually there.

That's one possibility; it can also be rendered on top of
a dotted-circle, which is what is done in the Unicode charts.
In any case, glyph rendering is not part of the Standard.

> If something
> normalizes the substrings first, the space may actually be
> added, although it wasn't present in the original string.

That turns out not to be the case.  The normalized form of
a string consisting of one combining character is itself.
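
This can be checked for one particular case with a short Python
sketch that uses the standard unicodedata module. The choice of
U+0301 COMBINING ACUTE ACCENT as the example is an assumption made
for illustration, and the sketch tests only that one character:

```python
import unicodedata

# Example combining character chosen for this sketch.
COMBINING_ACUTE = "\u0301"

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

# Normalize the one-character string under each of the four forms.
normalized = {
    form: unicodedata.normalize(form, COMBINING_ACUTE) for form in FORMS
}

for form, result in normalized.items():
    # Show the resulting code points, e.g. to look for an inserted space.
    print(form, [f"U+{ord(ch):04X}" for ch in result])
```

Each of the four forms should come back as the single code point
U+0301, with no U+0020 added in front of it.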

> Gah.  Encodings, normalization forms, endianness, and all the
> rest of it.  When you want to write a "character" any of a dozen
> things can happen.

Blurring significant distinctions that have taken a long time to
nail down isn't very conducive to clear thinking.

-- 
Not to perambulate                 John Cowan <jcowan@xxxxxxxxxxxxxxxxx>    
    the corridors                  http://www.reutershealth.com
during the hours of repose         http://www.ccil.org/~cowan
    in the boots of ascension.       --Sign in Austrian ski-resort hotel