This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 contain all messages, not just those from before July 7th, 2015.
From: Tom Lord <lord@xxxxxxx>
Subject: Re: strings draft
Date: Fri, 23 Jan 2004 20:31:32 -0800 (PST)

> > So, when the EUCJP Scheme reads a string
> >
> >     "\U+30AB.\U+309A."
> >
> > Then it can produce a string which consists of a single character
> > EUCJP #xA5F7.
>
> Eh... no.  The final language should be such that that string
> constant denotes a string of two Unicode codepoints.
> [...]
> but all implementations must either refuse to read
>
>     "\U+30AB.\U+309A."
>
> or have
>
>     (string-length "\U+30AB.\U+309A.") => 2

I see.  I think it's reasonable and acceptable.  An EUCJP
implementation can inform the user that it can't read the constant.

There are a couple of edge cases that I'd like to be clearer.

Can it map U+30AB to EUCJP #xA5AB, and U+309A to some alternative
character that designates an unrecognized character?  (U+3013 is
traditionally used for this in Japan.)  That would satisfy the
codepoint index requirements, though the result of

    (string-ref "\U+30AB.\U+309A." 1)

would be a surprise.  This can go either way: if it's not allowed in
the proposal, I can provide a flag so that the implementation behaves
in either a "strictly conforming Unicode API" mode or a "loose" mode.

Another edge case: suppose the U+30AB and U+309A codepoints are
written directly (without escaping) in the source code.  An EUCJP
implementation can still load such a file if it is informed that the
source is in one of the Unicode character encoding schemes (CES).  It
will convert those two codepoints into one EUCJP #xA5F7 character
during reading, so it will produce a string of one character.  Is
that out of scope for the Unicode API?

> > If so, I have no problem to adopt the "codepoint index" proposal.
>
> Well, how about if I agree to every bit of that except for the syntax
> you used for the string constant?

I can agree with the "codepoint index" proposal, given that the above
points are clarified.  It became much clearer to me anyway.  Thanks.

--shiro
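
The three reader behaviours discussed in the message above (strict refusal, "loose" substitution with the geta mark, and composition of unescaped source text) can be sketched roughly as follows. This is only an illustration in Python, not code from any Scheme implementation: the tables `SINGLE` and `PAIRS`, the exception `UnreadableConstant`, and the functions `read_escaped` and `read_raw_source` are hypothetical names, and the tables hold only the entries needed for this one example. Strings are modelled as lists of native character codes.

```python
# Hypothetical mapping tables for a toy EUC-JP-style implementation.
# Assumption: only the entries needed for this example are listed.
GETA = 0x3013                        # U+3013 GETA MARK, the placeholder
SINGLE = {
    0x30AB: 0xA5AB,                  # KATAKANA LETTER KA
    GETA: 0xA2AE,                    # geta mark in EUC-JP
}
PAIRS = {
    (0x30AB, 0x309A): 0xA5F7,        # KA + combining semi-voiced mark
}


class UnreadableConstant(Exception):
    """Raised when a codepoint has no native character."""


def read_escaped(codepoints, loose=False):
    """Read a string constant written as escaped Unicode codepoints.

    Each codepoint yields exactly one native character, so codepoint
    indexes are preserved. Strict mode refuses unmappable codepoints;
    loose mode substitutes the geta mark for them.
    """
    out = []
    for cp in codepoints:
        if cp in SINGLE:
            out.append(SINGLE[cp])
        elif loose:
            out.append(SINGLE[GETA])
        else:
            raise UnreadableConstant(f"cannot represent U+{cp:04X}")
    return out


def read_raw_source(codepoints):
    """Convert unescaped source text, composing known pairs.

    This models loading a Unicode-encoded source file: a base character
    followed by a combining mark may collapse into one native character,
    so the resulting length can be smaller than the codepoint count.
    """
    out = []
    i = 0
    while i < len(codepoints):
        pair = tuple(codepoints[i:i + 2])
        if pair in PAIRS:
            out.append(PAIRS[pair])
            i += 2
        elif codepoints[i] in SINGLE:
            out.append(SINGLE[codepoints[i]])
            i += 1
        else:
            raise UnreadableConstant(
                f"cannot represent U+{codepoints[i]:04X}")
    return out
```

Under these assumptions, `read_escaped([0x30AB, 0x309A], loose=True)` keeps a length of 2 but puts the geta mark at index 1, which is the "surprise" for `string-ref` mentioned above, while `read_raw_source` on the same input yields a single character.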