
Re: strings draft

This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 contain all messages, not just those from before July 7th, 2015.

Sorry it took me so long to answer this.

> At Fri, 23 Jan 2004 11:31:16 -0800, Bradd W. Szonye wrote:
> > I think to really do a good job of text handling, a procedure must know
> > the language and encoding for both the source text (parameter values)
> > and the context (returned values). For example, the rules for embedding
> > Arabic text (right to left) in a Latin document (left to right) are
> > slightly different from the converse, IIRC. This suggests an encoding
> > and processing scheme where every text has an associated locale and
> > every text-processing procedure has a locale context parameter. For
> > convenience's sake, that information may be implicit or supplied via
> > global parameters (e.g., CURRENT-LOCALE), although there are
> > disadvantages to doing it that way (e.g., changing a global locale can
> > cause subtle data corruption or information loss problems).

On Mon, Jan 26, 2004 at 11:21:34AM +0900, Alex Shinn wrote:
> That's interesting... how are the rules different, and is it only a
> matter of presentation (which would make it relevant only to output,
> not input)?

It might be presentation-only. Or in other words, in the future I should
probably refresh my memory before shooting off at the mouth.

> Perhaps a better example is knowing whether a given string of Han
> ideographs is Chinese, Japanese or Korean.  However, in this case it
> is not sufficient to mark the text object itself with a locale, since
> you can have mixed Chinese text within Japanese text (i.e. multiple
> indistinguishable locales in the same text).  Instead it's probably
> better to relegate this to a higher level library with general markup
> and tagging facilities.

Yeah, probably, although for the general case note that you still need
to know the locale of all the inputs *and* the context they're going

> > 2. Use your native language, and include the locale metadata at the
> >    start of the file (e.g., wrap the file with something like
> > 
> >        #,(LOCALE UTF-8 EN-US ( ... ))

> I like this, though I still disagree on the input locale.  Perhaps:
>   #,(ENCODING "UTF-8"
>      ...)

I don't understand what you're getting at here. I must've missed
something earlier in the conversation.

> > 3. Use your native language, and rely on local system conventions to
> >    change the default Scheme locale.

> This, as I pointed out with the Turkish "i" problem, is a *Bad Thing*
> which I think everyone agrees we should avoid.

I suspect that you misunderstood the point of this feature. It's for
people who don't care about internationalization; they just want the
compiler to recognize local conventions when processing comments,
identifier names, etc. For example, a Turkish programmer writes a
program in Scheme + Turkish, sets his Scheme compiler to "Turkish"
mode, and it just works. Of course, it won't work if he e-mails the
program to his friend in Germany, but he doesn't care, because he
doesn't plan to mail it around the world.
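
To make the Turkish case concrete, here is a minimal sketch in Python (no Scheme locale API exists to write it against). The helpers `upcase`, `downcase`, and `same_identifier` and the `locale` argument are hypothetical names, not part of any proposal in this thread, and the Turkish tables cover only the dotted/dotless "i" pairs. It assumes identifiers are compared case-insensitively, as in R5RS.

```python
# Hypothetical sketch: locale-sensitive case mapping for identifiers.
# Only the Turkish dotted/dotless "i" pairs are special-cased here.
TURKISH_UPPER = {"i": "\u0130", "\u0131": "I"}  # i -> dotted capital I, dotless i -> I
TURKISH_LOWER = {"I": "\u0131", "\u0130": "i"}  # I -> dotless i, dotted capital I -> i


def upcase(text, locale="neutral"):
    """Uppercase text; apply Turkish rules when locale starts with 'tr'."""
    if locale.startswith("tr"):
        return "".join(TURKISH_UPPER.get(ch, ch.upper()) for ch in text)
    return text.upper()


def downcase(text, locale="neutral"):
    """Lowercase text; apply Turkish rules when locale starts with 'tr'."""
    if locale.startswith("tr"):
        return "".join(TURKISH_LOWER.get(ch, ch.lower()) for ch in text)
    return text.lower()


def same_identifier(a, b, locale="neutral"):
    """Case-insensitive identifier comparison under the given locale."""
    return downcase(a, locale) == downcase(b, locale)
```

With neutral conventions, `same_identifier("FILE", "file")` is true. In "tr" mode it is false, because "I" lowercases to dotless "ı". A program that relies on one mode can therefore change meaning when compiled under the other, which is fine for the programmer who never ships it elsewhere.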

The standard XML-style prologue is what you use when you want to use a
local encoding/language *and* make it portable. If you don't care about
portability, just flip the compiler switch that says, "Follow my local
conventions, instead of English or language-neutral conventions."
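
The split between a portable prologue and a local default could be sketched like this, again in Python. The prologue pattern is an assumption modelled on the `#,(ENCODING "UTF-8" ...)` form quoted above; the function names and the `local_default` parameter are hypothetical, and the sketch assumes the prologue itself is plain ASCII.

```python
import codecs
import re

# Hypothetical prologue form, modelled on the #,(ENCODING "UTF-8" ...) idea
# quoted in this thread; assumes the prologue itself is plain ASCII.
PROLOGUE = re.compile(rb'\A\s*#,\(ENCODING\s+"([A-Za-z0-9._-]+)"')


def detect_encoding(raw, local_default="latin-1"):
    """Return (encoding name, whether it was declared) for raw source bytes."""
    match = PROLOGUE.match(raw)
    if match is None:
        # No prologue: fall back to the local convention (the "compiler switch").
        return local_default, False
    name = match.group(1).decode("ascii")
    codecs.lookup(name)  # raises LookupError for an unknown encoding name
    return name, True


def read_source(raw, local_default="latin-1"):
    """Decode source bytes using the declared encoding, else the local default."""
    encoding, _declared = detect_encoding(raw, local_default)
    return raw.decode(encoding)
```

A file that carries the prologue decodes the same way on every machine. A file without one decodes according to whatever `local_default` the reader happens to use, which is exactly the non-portable case.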

Bradd W. Szonye