Re: strings draft

This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 contain all messages, not just those from before July 7th, 2015.

To: lord@xxxxxxx
Subject: Re: strings draft
From: Shiro Kawai <shiro@xxxxxxxx>
Date: Fri, 23 Jan 2004 18:49:07 -1000 (HST)
Cc: srfi-50@xxxxxxxxxxxxxxxxx
Delivered-to: srfi-50@xxxxxxxxxxxxxxxxx
In-reply-to: <200401240431.UAA28992@xxxxxxxxxxxxxxxxxxxxxxx>
References: <200401240045.QAA28248@xxxxxxxxxxxxxxxxxxxxxxx> <20040123.172656.899859146.shiro@xxxxxxxx> <200401240431.UAA28992@xxxxxxxxxxxxxxxxxxxxxxx>

From: Tom Lord <lord@xxxxxxx>
Subject: Re: strings draft
Date: Fri, 23 Jan 2004 20:31:32 -0800 (PST)

>     > So, when the EUCJP Scheme reads a string
> 
>     >  "\U+30AB.\U+309A."
> 
>     > Then it can produce a string which consists of a single character
>     > EUCJP #xA5F7.  
> 
> Eh... no.   The final language should be such that that string
> constant denotes a string of two Unicode codepoints.
[...]
> but all implementations must either refuse to read
> 
> 	"\U+30AB.\U+309A."
> 
> or have
> 
> 	(string-length "\U+30AB.\U+309A.") => 2

I see.  I think that's reasonable and acceptable.  An EUCJP
implementation can inform the user that it can't read the constant.

There are a couple of edge cases that I'd like to have clarified.

Can it map U+30AB to EUCJP #xA5AB, and U+309A to some
substitute character that stands for an unrecognized character?
(U+3013 is traditionally used for this in Japan.)  That would satisfy
the codepoint index requirements, though the result of
(string-ref "\U+30AB.\U+309A." 1) would be a surprise.

This can go either way.  If it's not allowed in the proposal,
I can provide a flag so the implementation can run in either a
"strictly conforming Unicode API" mode or a "loose" mode.

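As an illustration only, the two modes could be sketched in Python as
below.  The table, the function name read_codepoints, and the strict
flag are all hypothetical; the table holds just the mappings mentioned
in this message plus the EUCJP code assumed for U+3013.

```python
# Hypothetical sketch of "strict" vs. "loose" reading of a string constant.
# The table is an illustrative subset, not a real conversion table.
GETA_MARK = 0x3013  # U+3013, the traditional placeholder character

UNICODE_TO_EUCJP = {
    0x30AB: 0xA5AB,  # KATAKANA LETTER KA
    0x3013: 0xA2AE,  # GETA MARK (assumed EUCJP code for the placeholder)
}


class UnreadableConstant(ValueError):
    """Raised in strict mode when a codepoint has no EUCJP character."""


def read_codepoints(codepoints, strict=True):
    """Map Unicode codepoints one-to-one onto EUCJP codes.

    Strict mode refuses the constant; loose mode substitutes the
    placeholder, so the codepoint count is preserved either way.
    """
    result = []
    for cp in codepoints:
        code = UNICODE_TO_EUCJP.get(cp)
        if code is None:
            if strict:
                raise UnreadableConstant(f"U+{cp:04X} has no EUCJP character")
            code = UNICODE_TO_EUCJP[GETA_MARK]
        result.append(code)
    return result


# Loose mode: the length stays 2, but index 1 holds the placeholder.
loose = read_codepoints([0x30AB, 0x309A], strict=False)
```

In loose mode the length is still 2, which is what the codepoint index
requirement asks for, but the element at index 1 is the placeholder
rather than anything resembling U+309A.
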
Another edge case: suppose the U+30AB and U+309A codepoints are
written directly (without escaping) in the source code.
An EUCJP implementation can still load such a file if it is told
that the source is in one of the Unicode CESs.  It will convert
those two codepoints into the single EUCJP character #xA5F7 while
reading, so it will produce a string of one character.
Is that outside the scope of the Unicode API?
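
A rough Python sketch of that reading step, for illustration only: the
function name convert_source_text and both tables are hypothetical, and
the pair mapping uses the #xA5F7 character mentioned at the top of this
message.

```python
# Hypothetical sketch: converting directly written Unicode source text
# into EUCJP codes, composing a base + combining pair where EUCJP has a
# single precomposed character.  Tables are illustrative subsets.
PAIR_TO_EUCJP = {
    (0x30AB, 0x309A): 0xA5F7,  # KA + combining semi-voiced sound mark
}
SINGLE_TO_EUCJP = {
    0x30AB: 0xA5AB,  # KATAKANA LETTER KA
}


def convert_source_text(text):
    """Return the list of EUCJP codes for text, preferring pair mappings."""
    codepoints = [ord(ch) for ch in text]
    result = []
    i = 0
    while i < len(codepoints):
        pair = tuple(codepoints[i:i + 2])
        if pair in PAIR_TO_EUCJP:
            # Two Unicode codepoints collapse into one EUCJP character.
            result.append(PAIR_TO_EUCJP[pair])
            i += 2
        elif codepoints[i] in SINGLE_TO_EUCJP:
            result.append(SINGLE_TO_EUCJP[codepoints[i]])
            i += 1
        else:
            raise ValueError(f"U+{codepoints[i]:04X} has no EUCJP character")
    return result


source = "\u30ab\u309a"            # two codepoints in the source file
converted = convert_source_text(source)
```

Here the source text has two codepoints while the converted string has
one character, which is exactly the length mismatch the question is
about.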

>     > If so, I have no problem to adopt the "codepoint index" proposal.
> 
> Well, how about if I agree to every bit of that except for the syntax
> you used for the string constant?

I can agree with the "codepoint index" proposal, provided the above
points are clarified.
It has become much clearer to me anyway.  Thanks.

--shiro

Follow-Ups:
- Re: strings draft
  - From: Tom Lord

References:
- Re: strings draft
  - From: Tom Lord
- Re: strings draft
  - From: Shiro Kawai
- Re: strings draft
  - From: Tom Lord
