
Re: strings draft

This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 contain all messages, not just those from before July 7th, 2015.

At Fri, 23 Jan 2004 11:31:16 -0800, Bradd W. Szonye wrote:
> I think to really do a good job of text handling, a procedure must know
> the language and encoding for both the source text (parameter values)
> and the context (returned values). For example, the rules for embedding
> Arabic text (right to left) in a Latin document (left to right) are
> slightly different from the converse, IIRC. This suggests an encoding
> and processing scheme where every text has an associated locale and
> every text-processing procedure has a locale context parameter. For
> convenience's sake, that information may be implicit or supplied via
> global parameters (e.g., CURRENT-LOCALE), although there are
> disadvantages to doing it that way (e.g., changing a global locale can
> cause subtle data corruption or information loss problems).
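A minimal sketch of the "explicit locale parameter with an implicit global default" scheme quoted above. It is written in Python rather than Scheme, with a `contextvars.ContextVar` standing in for a CURRENT-LOCALE parameter. The names `CURRENT_LOCALE` and `upcase` are hypothetical, and only one real tailoring is modelled: Turkish and Azeri upcase `i` to U+0130. The sketch illustrates the hazard Bradd mentions. A call that relies on the default changes its result when unrelated code changes the global locale, while a call that passes the locale explicitly stays stable.

```python
import contextvars

# Hypothetical stand-in for a global CURRENT-LOCALE parameter.
CURRENT_LOCALE = contextvars.ContextVar("CURRENT_LOCALE", default="en-US")

def upcase(s, locale=None):
    """Upcase s under `locale`, falling back to CURRENT_LOCALE when omitted.

    Only the Turkish/Azeri tailoring of dotted i is modelled; everything else
    uses Python's locale-independent str.upper().
    """
    if locale is None:
        locale = CURRENT_LOCALE.get()
    if locale.split("-")[0] in ("tr", "az"):
        # Turkish/Azeri: i -> U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
        s = s.replace("i", "\u0130")
    return s.upper()

explicit = upcase("file", "en-US")     # "FILE" no matter what the default is
before = upcase("file")                # "FILE" under the en-US default
token = CURRENT_LOCALE.set("tr-TR")    # unrelated code switches the default...
after = upcase("file")                 # ...and the same call now gives "F\u0130LE"
CURRENT_LOCALE.reset(token)            # restore the previous default
print(explicit, before, after)
```

Passing the locale explicitly costs an extra argument at each call site. In exchange, a result cannot silently change because some other part of the program reset the default.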

That's interesting... how are the rules different, and is it only a
matter of presentation (which would make it relevant only to output, not

Perhaps a better example is knowing whether a given string of Han
ideographs is Chinese, Japanese or Korean.  However, in this case it is
not sufficient to mark the text object itself with a locale, since you
can have mixed Chinese text within Japanese text (i.e. multiple
indistinguishable locales in the same text).  Instead it's probably
better to relegate this to a higher level library with general markup
and tagging facilities.
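One way such a higher-level library might represent this is to tag spans of a text rather than the whole text object. The following is a minimal sketch in Python; `Span`, `plain_text` and `lang_at` are hypothetical names and are not part of any proposal in this thread. The example is a Japanese sentence quoting a Chinese phrase. Every Han character in it uses the same code points in either language, so only the span tags record which language each part is in.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    lang: str   # language tag for this run of text, e.g. "ja" or "zh"
    text: str

def plain_text(spans):
    """Concatenate the spans, discarding the language tags."""
    return "".join(s.text for s in spans)

def lang_at(spans, index):
    """Return the language tag of the span covering character `index`."""
    pos = 0
    for span in spans:
        if pos <= index < pos + len(span.text):
            return span.lang
        pos += len(span.text)
    raise IndexError(index)

# Japanese text quoting a Chinese phrase; the Han characters alone
# cannot tell you where the Chinese begins and ends.
spans = [Span("ja", "孔子は「"), Span("zh", "温故而知新"), Span("ja", "」と言った。")]
print(lang_at(spans, 4))  # zh
```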

> 2. Use your native language, and include the locale metadata at the
>    start of the file (e.g., wrap the file with something like
>        #,(LOCALE UTF-8 EN-US ( ... )))

I like this, though I still disagree on the input locale.  Perhaps:


> 3. Use your native language, and rely on local system conventions to
>    change the default Scheme locale.

As I pointed out with the Turkish "i" problem, this is a *Bad Thing*,
which I think everyone agrees we should avoid.
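To make the Turkish "i" problem concrete: suppose a case-insensitive reader folded symbol names using the system locale. Source written as `(DEFINE ...)` would then stop matching `define` on a Turkish system, because Turkish lower-cases `I` to dotless `ı` (U+0131). The Python sketch below contrasts that with a fixed, locale-independent fold. The names `fold_identifier` and `resolves` and the toy `BUILTINS` table are hypothetical.

```python
# Hypothetical table of built-in syntax names, stored in lower case.
BUILTINS = {"define", "if", "lambda"}

def fold_identifier(name, locale):
    """Lower-case a symbol name as a case-insensitive reader might.

    Hypothetical: only the Turkish/Azeri tailoring of capital I is modelled.
    """
    if locale.split("-")[0] in ("tr", "az"):
        # Turkish/Azeri: I -> U+0131 LATIN SMALL LETTER DOTLESS I
        name = name.replace("I", "\u0131")
    return name.lower()

def resolves(name, locale):
    """True if `name` folds to a known built-in under `locale`."""
    return fold_identifier(name, locale) in BUILTINS

# Locale-independent folding (Python's str.lower ignores the system locale).
print(resolves("DEFINE", "en-US"))  # True
# Folding with Turkish conventions: DEFINE -> "defıne", which is not "define".
print(resolves("DEFINE", "tr-TR"))  # False
```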