Re: strings draft



At Fri, 23 Jan 2004 11:31:16 -0800, Bradd W. Szonye wrote:
> 
> I think to really do a good job of text handling, a procedure must know
> the language and encoding for both the source text (parameter values)
> and the context (returned values). For example, the rules for embedding
> Arabic text (right to left) in a Latin document (left to right) are
> slightly different from the converse, IIRC. This suggests an encoding
> and processing scheme where every text has an associated locale and
> every text-processing procedure has a locale context parameter. For
> convenience's sake, that information may be implicit or supplied via
> global parameters (e.g., CURRENT-LOCALE), although there are
> disadvantages to doing it that way (e.g., changing a global locale can
> cause subtle data corruption or information loss problems).

That's interesting... how are the rules different, and is it only a
matter of presentation (which would make it relevant only to output, not
input)?

Perhaps a better example is knowing whether a given string of Han
ideographs is Chinese, Japanese or Korean.  However, in this case it is
not sufficient to mark the text object itself with a locale, since you
can have mixed Chinese text within Japanese text (i.e. multiple
indistinguishable locales in the same text).  Instead it's probably
better to relegate this to a higher-level library with general markup
and tagging facilities.
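To make that concrete, here is a toy sketch (Python; the span
representation is my own invention, not any existing library's) of what
such a higher-level tagging facility might look like: instead of one
locale per string, the text is a sequence of (language, substring)
spans, so mixed Chinese-within-Japanese text is representable.

```python
# Toy markup: text as a list of (language, substring) spans, since a
# single locale tag on the whole string cannot express mixed-language
# text.  Representation is illustrative only.

text = [
    ("ja", "日本語の文中に"),
    ("zh", "中文"),
    ("ja", "が混ざる例"),
]

def plain(spans):
    """Drop the markup, recovering the raw character string."""
    return "".join(s for _, s in spans)

def languages(spans):
    """The set of languages appearing anywhere in the text."""
    return {lang for lang, _ in spans}

assert plain(text) == "日本語の文中に中文が混ざる例"
assert languages(text) == {"ja", "zh"}
```

The point is that language identity lives in the markup layer, not in
the string type itself, so the core string operations stay simple.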

> 2. Use your native language, and include the locale metadata at the
>    start of the file (e.g., wrap the file with something like
> 
>        #,(LOCALE UTF-8 EN-US ( ... )))

I like this, though I still disagree on the input locale.  Perhaps:

  #,(ENCODING "UTF-8"
     ...)

> 3. Use your native language, and rely on local system conventions to
>    change the default Scheme locale.

This, as I pointed out with the Turkish "i" problem, is a *Bad Thing*,
which I think everyone agrees we should avoid.
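For anyone who missed the earlier discussion, here is a toy sketch
(Python, with the relevant case mappings hardcoded; the `upper` helper
is illustrative, not any real locale API) of why relying on the system
locale is dangerous: Turkish uppercases "i" to dotted capital "İ"
(U+0130), so the same source text case-folds differently depending on
an ambient global setting.

```python
# The Turkish "i" problem: case mapping is locale-dependent.
# In most locales: i <-> I.  In Turkish: i <-> İ (U+0130), and
# dotless ı (U+0131) <-> I.

def upper(s, locale="en"):
    """Toy locale-sensitive uppercasing (handles only the 'i' case)."""
    if locale == "tr":
        s = s.replace("i", "\u0130")  # i -> İ (capital dotted I)
    return s.upper()

# Under the default locale, folding an identifier behaves as expected:
assert upper("list") == "LIST"
# But if the ambient locale silently becomes Turkish, the very same
# source folds to something else entirely:
assert upper("list", locale="tr") == "L\u0130ST"
```

So a Scheme whose reader or case-folding consults a mutable global
locale can read the same program differently on different machines,
which is exactly the silent-corruption hazard mentioned above.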

-- 
Alex