Re: Encodings.



> Bradd, what prompted my comments were your own following comments:
>> ... Storing data in non-canonical form is not "broken." Also, there's
>> more than one canonical form. ... Programs which disagree on the form
>> of the I/O will need to translate between the two. ... That wouldn't
>> help unless they agree to write the *same* canonical format. ...

Paul Schlie wrote:
> And I agree that neither programs, platforms, nor even users can often
> agree on the use of any single encoding form; therefore it would seem
> that it then becomes the obligation of the programming language to
> enable the specification of program code which can access and process
> data encoded in arbitrary forms.

I strongly disagree. Compilers have traditionally required all source
input to use a single encoding, usually the system's "native"
character encoding, and that approach works well. However, if the
compiler chooses a flavor of Unicode (e.g., UTF-8) as its source
encoding, then it should implement the Unicode standard correctly, which
(IIRC) includes the requirement that you handle the various normal and
non-normal forms gracefully.

> Therefore I can't see how you can conclude that adopting a standard
> encoding specification for text (or for data of any type, for that
> matter) accomplishes anything other than preventing that programming
> language from accessing and manipulating data stored in other
> formats, a need you seem to have recognized.

Huh? I don't recall insisting that compilers *must* recognize only a
single source encoding. Indeed, I've suggested ways to portably specify
the input encoding in source code (in XML style). However, I also think
the traditional compilers that recognize only one source encoding are
fine. In other words, flexible source encoding is a desirable but
optional feature.
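
To make the "XML style" idea concrete, here is a minimal Python sketch
under stated assumptions. The declaration syntax (an `encoding="NAME"`
marker on the first line of a source file, in the spirit of XML's
`<?xml ... encoding="..."?>`) and the function name `read_source` are
hypothetical, not taken from any Scheme implementation. It also assumes
the first line is ASCII-compatible, so it can be scanned before the
encoding is known.

```python
import re

# Hypothetical declaration syntax, modeled loosely on XML's
# <?xml ... encoding="..."?>: the first line of a source file may
# contain  encoding="NAME"  (e.g. inside a comment).
DECL = re.compile(rb'encoding="([A-Za-z0-9._-]+)"')


def read_source(raw: bytes, default: str = "utf-8") -> str:
    """Decode source bytes using an optional first-line encoding declaration.

    Assumes the first line is ASCII-compatible so it can be scanned
    before the real encoding is known. Files without a declaration fall
    back to a single default encoding, like a traditional compiler.
    An unknown encoding name raises LookupError from bytes.decode.
    """
    first_line = raw.split(b"\n", 1)[0]
    match = DECL.search(first_line)
    encoding = match.group(1).decode("ascii") if match else default
    return raw.decode(encoding)


# A Latin-1 file that declares its encoding, and an undeclared UTF-8 file.
print(read_source(b';; encoding="latin-1"\n(display "caf\xe9")'))
print(read_source('(display "caf\u00e9")'.encode("utf-8")))
```

A file without a declaration is read in the one default encoding, so the
traditional single-encoding behavior remains the baseline and the
declaration is purely an optional extension.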

But again, if you choose a flavor of Unicode for your sources, then you
should be prepared to handle the many normal and non-normal forms of
Unicode graphemes.
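
As an illustration of what handling those forms means in practice, here
is a small Python sketch using the standard `unicodedata` module. The
same accented letter can be stored precomposed (NFC) or decomposed
(NFD), and the two spellings compare unequal code point by code point
until both are normalized. Normalizing to NFC before comparing is an
assumed choice here; an implementation could equally pick NFD, or
normalize only identifiers and string literals.

```python
import unicodedata

precomposed = "\u00e9"   # "é" as a single code point (NFC form)
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT (NFD form)


def same_text(a: str, b: str) -> bool:
    """Treat canonically equivalent strings as equal by normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed == decomposed)           # False: different code point sequences
print(same_text(precomposed, decomposed))  # True: canonically equivalent
```

A compiler that naively compared raw code points would, for example,
treat an identifier spelled with the precomposed letter and one spelled
with the decomposed sequence as two different names.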

> I don't believe that scheme's intent was to be restricted to only
> being able to natively access and process text, much less only text
> encoded in any particular format, do you?

No. But it sounds like you *think* I do. Again, it sounds like you have
an axe to grind and are reading too much into my words.

> (I honestly think you're viewing scheme, its potential
> applicability within the broader computing industry, and its
> corresponding practical requirements too narrowly, with no disrespect
> intended.

I have no idea where you got this impression.

> Maybe I'm missing the boat, but as best I can tell, all
> discussions seem to be leading to the erroneous presumption that it's
> adequate for scheme to restrict itself to processing only data
> originating as, and destined for, Unicode-encoded text, which would be
> most unfortunate.)

And I've spoken out vehemently *against* this on comp.lang.scheme.
-- 
Bradd W. Szonye
http://www.szonye.com/bradd