Re: Encodings.

This page is part of the web mail archives of SRFI 52 from before July 7th, 2015. The new archives for SRFI 52 contain all messages, not just those from before July 7th, 2015.



> On Thursday 12 February 2004 06:45 pm, bear wrote:
>> You're missing all the tools and utilities out there that are
>> programmed with the expectation and requirement that they can
>> arbitrarily impose or change normalization forms without changing the
>> text of the documents they handle.  There is no escaping this; even
>> Emacs and Notepad do it.

On Thu, Feb 12, 2004 at 02:10:18PM +0100, Ken Dickey wrote:
> Ah!  So a broken language (huge tables and complex processing) must be
> defined to deal with broken tools which do not write out Unicode data
> in a canonical format.

Storing data in non-canonical form is not "broken." Also, there's more
than one canonical form. The "C" forms (NFC, NFKC) compose characters
into the smallest number of code points possible. The "D" forms (NFD,
NFKD) decompose them into fully general base+combining forms. Programs
that disagree on the form of their I/O will need to translate between
the two.
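
To make the C/D distinction concrete, here is a minimal sketch in
Python (chosen only for illustration; it is not part of the SRFI) using
the standard unicodedata module. The helper name same_text is
hypothetical. It shows one character in composed and decomposed form,
and that the two compare equal only after both are translated to the
same form.

```python
import unicodedata

# The same user-visible character, stored two ways.
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE (one code point)
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT (two code points)

# Translate each string into the other normalization form.
nfc = unicodedata.normalize("NFC", decomposed)  # compose
nfd = unicodedata.normalize("NFD", composed)    # decompose

def same_text(a, b, form="NFC"):
    """Hypothetical helper: compare two strings after normalizing both
    to the same form, so the stored form no longer matters."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

print(composed == decomposed)           # raw comparison of code points
print(same_text(composed, decomposed))  # comparison after normalization
```

Neither stored form is wrong here; the raw comparison fails only
because the two strings were never brought to a common form.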

> What about creating a tool which reads bizarre Unicode and writes it
> out in a canonical format?  Then requiring portable Scheme programs to
> pass through it?  

That wouldn't help unless they agree to write the *same* canonical
format. Besides, this is just separating part of the reader's job into
an external program, and in an error-prone way.

> Sounds like a service to the entire Unicode community.  It could be
> written in portable Scheme and serve as a (presumably good)
> advertisement for the language.

Not really.
-- 
Bradd W. Szonye
http://www.szonye.com/bradd