Re: strings draft
Sorry it took me so long to answer this.
> At Fri, 23 Jan 2004 11:31:16 -0800, Bradd W. Szonye wrote:
> > I think to really do a good job of text handling, a procedure must know
> > the language and encoding for both the source text (parameter values)
> > and the context (returned values). For example, the rules for embedding
> > Arabic text (right to left) in a Latin document (left to right) are
> > slightly different from the converse, IIRC. This suggests an encoding
> > and processing scheme where every text has an associated locale and
> > every text-processing procedure has a locale context parameter. For
> > convenience's sake, that information may be implicit or supplied via
> > global parameters (e.g., CURRENT-LOCALE), although there are
> > disadvantages to doing it that way (e.g., changing a global locale can
> > cause subtle data corruption or information loss problems).
On Mon, Jan 26, 2004 at 11:21:34AM +0900, Alex Shinn wrote:
> That's interesting... how are the rules different, and is it only a
> matter of presentation (which would make it relevant only to output,
> not input)?
It might be presentation-only. In other words, in the future I should
probably refresh my memory before shooting my mouth off.
> Perhaps a better example is knowing whether a given string of Han
> ideographs is Chinese, Japanese or Korean. However, in this case it
> is not sufficient to mark the text object itself with a locale, since
> you can have mixed Chinese text within Japanese text (i.e. multiple
> indistinguishable locales in the same text). Instead it's probably
> better to relegate this to a higher level library with general markup
> and tagging facilities.
Yeah, probably, although for the general case note that you still need
to know the locale of all the inputs *and* the context they're going
into.
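
To make that concrete, here is a minimal sketch of a procedure that takes
both the locale of each input and the locale of the context. It is written
in Python rather than Scheme purely for illustration, and the names (Span,
embed) and the span-style markup are hypothetical, not part of any draft.
Each run of text carries its own language tag, so mixed Chinese and
Japanese text stays distinguishable, and the context language is an
explicit parameter rather than a global.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Span:
    """A run of text tagged with its language (hypothetical type)."""
    text: str
    lang: str  # a language tag such as "ja" or "zh"


def embed(spans, context_lang):
    """Join spans into marked-up text for a document written in context_lang.

    Spans already in the context language are emitted as-is; any other
    span is wrapped in markup that records its own language.
    """
    parts = []
    for span in spans:
        body = escape(span.text)
        if span.lang == context_lang:
            parts.append(body)
        else:
            parts.append(f'<span lang="{escape(span.lang)}">{body}</span>')
    return "".join(parts)
```

The same two spans produce different output depending on the context they
go into, which is the point: neither the input tags nor the context alone
is enough.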
> > 2. Use your native language, and include the locale metadata at the
> > start of the file (e.g., wrap the file with something like
> >
> > #,(LOCALE UTF-8 EN-US ( ... ))
> I like this, though I still disagree on the input locale. Perhaps:
>
> #,(ENCODING "UTF-8"
> ...)
I don't understand what you're getting at here. I must've missed
something earlier in the conversation.
> > 3. Use your native language, and rely on local system conventions to
> > change the default Scheme locale.
>
> This as I pointed out with the Turkish "i" problem is a *Bad Thing*
> which I think everyone agrees we should avoid.
I suspect that you misunderstood the point of this feature. It's for
people who don't care about internationalization; they just want the
compiler to recognize local conventions when processing comments,
identifier names, etc. For example, a Turkish programmer writes a program
in Scheme + Turkish, sets his Scheme compiler to "Turkish" mode, and it
just works. Of course, it won't work if he e-mails the program to his
friend in Germany, but he doesn't care, because he doesn't plan to mail
it around the world.
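
Here is a rough sketch of why the program stops working in Germany,
assuming the compiler case-folds identifiers according to its locale
setting. It is in Python only for illustration; fold_identifier and the
mapping table are hypothetical, and only the Turkish dotted/dotless "i"
rules are special-cased.

```python
# Turkish lowercasing differs from the default for two letters:
# "I" lowercases to dotless "ı", and dotted "İ" lowercases to "i".
TURKISH_LOWER = {"I": "\u0131", "\u0130": "i"}


def fold_identifier(name, lang):
    """Lowercase an identifier under the conventions of lang (hypothetical)."""
    if lang.lower().startswith("tr"):
        # Apply the Turkish-specific mappings first; everything else
        # falls through to the default lowercase rules.
        name = "".join(TURKISH_LOWER.get(ch, ch) for ch in name)
    return name.lower()
```

Under this sketch, the identifier TITLE folds to "tıtle" in Turkish mode
but to "title" everywhere else, so a reference that resolves on the
author's machine is an unbound name on his friend's.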
The standard XML-style prologue is what you use when you want to write in
a local encoding/language *and* keep the program portable. If you don't
care about portability, just flip the compiler switch that says, "Follow
my local conventions, instead of English or language-neutral conventions."
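
As a sketch of how a reader might treat the two cases, assume the prologue
looks like the "#,(LOCALE UTF-8 EN-US" form quoted above and is itself
written in ASCII-compatible bytes, as an XML declaration would be. Both of
those are assumptions, since the syntax is only a proposal, and read_source
is a hypothetical name. Python is used only for illustration.

```python
import re

# Hypothetical prologue: "#,(LOCALE <encoding> <language>" on the first line.
PROLOGUE = re.compile(rb"#,\(LOCALE\s+([A-Za-z0-9._-]+)\s+([A-Za-z0-9-]+)")


def read_source(data, default_encoding="utf-8", default_lang="en-US"):
    """Return (text, encoding, lang) for the raw bytes of a source file.

    A prologue on the first line wins; otherwise the caller's defaults
    (standing in for the "local conventions" switch) are used.
    """
    first_line = data.split(b"\n", 1)[0]
    match = PROLOGUE.match(first_line.lstrip())
    if match:
        encoding = match.group(1).decode("ascii")
        lang = match.group(2).decode("ascii")
    else:
        encoding, lang = default_encoding, default_lang
    return data.decode(encoding), encoding, lang
```

A file with the prologue decodes the same way on any machine; a file
without one depends entirely on whatever defaults the local compiler was
switched to.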
--
Bradd W. Szonye
http://www.szonye.com/bradd