
Re: Unicode hex representation length restrictions




On Tue, 12 Jul 2005, bear wrote:

>
>>The reason is to match existing string syntaxes, as taken up in another
>>thread.
>
>But permitting representations of 2, 4, or 8 hex digits is enough to
>match existing string syntaxes; banning 1, 3, 5, and 6 digit
>representations seems unnecessary and counterintuitive.

Okay, here I am following up to myself.  I misunderstood the
document, and now I "get" it.

After reading it again, I finally understood what you were
doing.  At first it wasn't at all clear to me why you were using
so many different characters to escape hex sequences.  To me the
different-case U's just said, "case not important," and the x
said, "and at least a couple of implementations are using x, so
we'd better allow that too."  Your intent is that the choice of
escape character signals the *length* of the following hex
sequence, yes?

This explains why you don't think you need an explicit
terminator character, and why you resisted the idea of using
other lengths; you didn't want to allocate several more
characters for escapes to cover other lengths.
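
To check my reading, here is a rough sketch in Python of how a
reader could decode such escapes without a terminator.  The table
mapping x to 2 digits, u to 4 and U to 8 is my assumption, based on
the 2, 4, and 8 digit lengths mentioned above; it is not quoted
from the draft, and the function name is made up for illustration.

```python
# Assumed mapping (not taken from the draft): escape letter -> digit count.
ESCAPE_LENGTHS = {"x": 2, "u": 4, "U": 8}
HEX_DIGITS = "0123456789abcdefABCDEF"

def decode_escapes(text):
    # Expand backslash escapes; the escape letter alone fixes how many
    # hex digits follow, so no terminator character is needed.
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in ESCAPE_LENGTHS:
            n = ESCAPE_LENGTHS[text[i + 1]]
            digits = text[i + 2 : i + 2 + n]
            # Reject short or non-hex sequences instead of guessing.
            if len(digits) != n or any(d not in HEX_DIGITS for d in digits):
                raise ValueError("escape at index %d needs %d hex digits" % (i, n))
            out.append(chr(int(digits, 16)))
            i += 2 + n
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Under that assumption, the input \u03bbB decodes to a lambda followed
by a literal B: the reader stops after exactly four digits, even
though B is itself a valid hex digit.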

You will often have to pad your character code with one or
more leading zeros, but that's a fair trade in typing for not
needing a terminator, I guess.  This is an acceptable design. I don't
think it's the best from a purely tabula rasa perspective,
but if a lot of users will find it intuitive or familiar
from other languages, that's a lot of cognitive traction
that makes it "better" for the current crop of users.  I
guess there's not a really *strong* argument to make against
it.
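
The padding cost can be seen in a small Python sketch that writes a
character using the shortest escape that fits.  It assumes x takes
2 hex digits, u takes 4 and U takes 8, which is my reading of the
lengths discussed in this thread rather than text from the draft;
the function name is hypothetical.

```python
def escape_char(ch):
    # Pick the shortest escape that can hold the code point and
    # zero-pad to the fixed width that escape letter implies.
    # Assumed widths: x = 2, u = 4, U = 8 hex digits.
    code = ord(ch)
    if code <= 0xFF:
        return "\\x{:02x}".format(code)
    if code <= 0xFFFF:
        return "\\u{:04x}".format(code)
    return "\\U{:08x}".format(code)
```

For example, U+3BB has only three significant hex digits but must be
written as \u03bb, and a character outside the 16-bit range such as
U+1D11E needs three leading zeros: \U0001d11e.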

You should, however, help prevent this kind of
misunderstanding in the future.  Please explain somewhere
other than just in the BNF grammar, in English, that the
choice of escape character determines the length of the
following hex escape.


				Bear