Re: strings draft

This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 contain all messages, not just those from before July 7th, 2015.

    > From: Shiro Kawai <shiro@xxxxxxxx>

    > Thanks for the detailed reply.  Now I'm getting the point.

    >  * An implementation is free to have non-Unicode-compatible
    >    char/string, as long as it meets the minimum requirement,
    >    which is not much more than current R5RS with some
    >    clarification (case mapping issues aside).


    >  * _If_ an implementation can also have a subset of Unicode-
    >    compatible char/string, this subset of char/string should
    >    follow the codepoint-index.  The index handling of the rest
    >    of char/string is up to the implementation.


    > Did I get it right?

So far.

    > So, when the EUCJP Scheme reads a string

    >  "\U+30AB.\U+309A."

    > Then it can produce a string which consists of a single character
    > EUCJP #xA5F7.  

Eh... no.   The final language should be such that that string
constant denotes a string of two Unicode codepoints.

I'm typing in ASCII here but let's pretend that ``<#xa5F7>'' is the
literal (not numeric escape) way to write that EUCJP character.

Then an implementation (such as an EUCJP implementation) is free to
have:

	(string-length "<#xa5F7>") => 1

and another implementation (such as a Unicode implementation) is free
to have:

	(string-length "<#xa5F7>") => 2

but all implementations must either refuse to read

	"\U+30AB.\U+309A."

or have

	(string-length "\U+30AB.\U+309A.") => 2

This area is a little touchy, even for Unicode-supporting
implementations.   The same literal string might wind up in two
different canonicalization forms in Unicode, resulting in different
lengths and indexes.   The same literal string might wind up as length
1 in Bear's implementation, and length N>1 in other implementations.
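
To make the canonicalization point concrete, here is a rough sketch.
It assumes an implementation whose strings are indexed by Unicode
codepoint and which provides the R6RS procedures string-normalize-nfc
and string-normalize-nfd; those procedures are an assumption made for
illustration and are not part of the draft under discussion.  It uses
U+30AC (KATAKANA LETTER GA), which Unicode can also spell as U+30AB
followed by the combining mark U+3099, and contrasts it with the
U+30AB U+309A sequence from above, which has no precomposed form in
Unicode.

```scheme
(import (rnrs))

;; U+30AC KATAKANA LETTER GA, the precomposed spelling.
(define ga-composed (string (integer->char #x30AC)))

;; U+30AB KATAKANA LETTER KA followed by U+3099 (combining voiced
;; sound mark): the canonically equivalent decomposed spelling.
(define ga-decomposed
  (string (integer->char #x30AB) (integer->char #x3099)))

;; U+30AB followed by U+309A (combining semi-voiced sound mark).
;; Unicode has no single codepoint for this pair.
(define ka-semi-voiced
  (string (integer->char #x30AB) (integer->char #x309A)))

;; Same text, different canonicalization forms, different lengths.
(display (string-length (string-normalize-nfc ga-decomposed))) ; 1
(newline)
(display (string-length (string-normalize-nfd ga-composed)))   ; 2
(newline)

;; No normalization form shortens this one, so it stays at two
;; codepoints under codepoint indexing.
(display (string-length (string-normalize-nfc ka-semi-voiced))) ; 2
(newline)
```

An implementation that indexes by some larger unit (a combining
sequence, or a native EUCJP character) could report 1 where this
sketch reports 2, which is exactly the variation described above.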

Those "touchy" issues are things I'd expect to be touched up by
supplementary standards such as SRFIs.  There could be a
"Canonicalization form D Standards for Scheme" that Bear's wouldn't
conform to but Pika does.  Another for "EUCJP Standards for Scheme"
that Gauche would conform to but not Bear's.  Etc.  We can take some
time to see what's winning at that level and then duke it out for
R9RS :-)

    > It is outside of the scope of your document,
    > so the implementation is free to implement behavior such as

    >  (define x "\U+30AB.\U+309A.")
    >  (string-length x) => 1
    >  (string-ref x 0)  => <character EUCJP #xA5F7>
    >  (let ((y (string-copy x)))
    >    (string-set! y 0 #\a)
    >    y) => "a"

    > If so, I have no problem adopting the "codepoint index" proposal.

Well, how about if I agree to every bit of that except for the syntax
you used for the string constant?

    > [About O(1) property]

(I'll reply to the rest later.)