This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 are here. Eventually, the entire history will be moved there, including any new messages.
> From: Shiro Kawai <shiro@xxxxxxxx>

> Thanks for the detailed reply. Now I'm getting the point.

> * An implementation is free to have non-Unicode-compatible
>   char/string, as far as it shares the minimum requirement,
>   which is not much more than current R5RS with some
>   clarification (case mapping issues aside).

Right.

> * _If_ an implementation can also have a subset of
>   Unicode-compatible char/string, this subset of char/string
>   should follow the codepoint-index. The index handling of the
>   rest of char/string is up to the implementation.

Right.

> Did I get it right?

So far.

> So, when the EUCJP Scheme reads a string
>   "\U+30AB.\U+309A."
> then it can produce a string which consists of a single character
> EUCJP #xA5F7.

Eh... no.

The final language should be such that that string constant denotes a string of two Unicode codepoints.

I'm typing in ASCII here, but let's pretend that ``<#xA5F7>'' is the literal (not numeric escape) way to write that EUCJP character. Then an implementation (such as an EUCJP implementation) is free to have:

    (string-length "<#xA5F7>") => 1

and another implementation (such as a Unicode implementation) is free to have:

    (string-length "<#xA5F7>") => 2

but all implementations must either refuse to read "\U+30AB.\U+309A." or have

    (string-length "\U+30AB.\U+309A.") => 2

This area is a little touchy, even for Unicode-supporting implementations. The same literal string might wind up in two different canonicalization forms in Unicode, resulting in different lengths and indexes. The same literal string might wind up as length 1 in Bear's implementation and length N>1 in other implementations.

Those "touchy" issues are things I'd expect to be touched up by supplementary standards such as SRFIs. There could be a "Canonicalization Form D Standards for Scheme" that Bear's implementation wouldn't conform to but Pika would. Another could be "EUCJP Standards for Scheme", which Gauche would conform to but Bear's wouldn't. Etc.
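To make the codepoint-index rule concrete, here is a minimal sketch. It assumes an R7RS-small implementation whose strings are indexed by Unicode codepoint and which does not normalize string literals; R7RS postdates this thread, and its "\x30AB;\x309A;" escape syntax stands in for the "\U+30AB.\U+309A." notation used above. The name ka-handakuten is purely illustrative.

```scheme
;; Sketch only: assumes an R7RS-small implementation with
;; codepoint-indexed strings and no normalization of literals.
(import (scheme base) (scheme write))

;; KATAKANA LETTER KA (U+30AB) followed by
;; COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK (U+309A).
;; "\x30AB;\x309A;" is R7RS syntax for the "\U+30AB.\U+309A." above.
(define ka-handakuten "\x30AB;\x309A;")

;; Two codepoints, so the length is 2, even though the EUCJP
;; character #xA5F7 represents the same pair as a single character.
(display (string-length ka-handakuten)) (newline)   ; prints 2

;; Each index names exactly one codepoint.
(display (char->integer (string-ref ka-handakuten 0))) (newline) ; prints 12459 (#x30AB)
(display (char->integer (string-ref ka-handakuten 1))) (newline) ; prints 12442 (#x309A)
```

Unicode has no precomposed character for this particular pair, so normalizing it to a composed form would not shorten it. Other literals can change length under normalization, which is the "touchy" case described above.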
We can take some time to see what's winning at that level and then duke it out for R9RS :-)

> It is outside of the scope of your document,
> so the implementation is free to implement such as
>   (define x "\U+30AB.\U+309A.")
>   (string-length x) => 1
>   (string-ref x 0) => <character EUCJP #xA5F7>
>   (let ((y (string-copy x)))
>     (string-set! y 0 #\a)
>     y) => "a"
> If so, I have no problem to adopt the "codepoint index" proposal.

Well, how about if I agree to every bit of that except for the syntax you used for the string constant?

> [About O(1) property]

(I'll reply to the rest later.)

-t