
Re: Encodings.

This page is part of the web mail archives of SRFI 52 from before July 7th, 2015. The new archives for SRFI 52 contain all messages, not just those from before July 7th, 2015.

> Bradd, what prompted my comments were your own following comments:
>> ... Storing data in non-canonical form is not "broken." Also, there's
>> more than one canonical form. ... Programs which disagree on the form
>> of the I/O will need to translate between the two. ... That wouldn't
>> help unless they agree to write the *same* canonical format. ...

Paul Schlie wrote:
> And I agree that neither programs, platforms, nor even users can often
> agree on the use of any single encoding form; therefore it would seem
> that it then becomes the obligation of the programming language to
> enable the specification of program code which can access and process
> data encoded in arbitrary forms.

I strongly disagree. Compilers have traditionally required all source
input to have a particular encoding, usually the system's "native"
character encoding, and it works well. However, if the compiler chooses
a flavor of Unicode (e.g., UTF-8) as its source encoding, then it should
implement the Unicode standard correctly, which (IIRC) includes the
requirement that you handle various normal and non-normal forms.
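Here is what handling normal and non-normal forms means in practice. The same character can be stored as different code point sequences: "é" can be one precomposed code point (U+00E9) or "e" followed by COMBINING ACUTE ACCENT (U+0301). A naive comparison treats these as different text. The sketch below is a minimal illustration using Python's standard `unicodedata` module, not anything from a Scheme implementation. The helper name `same_text` is hypothetical and chosen only for this example.

```python
import unicodedata

# Two canonically equivalent spellings of "é":
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT

def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Raw comparison sees different code points (and different UTF-8 bytes).
print(precomposed == decomposed)            # False
print(precomposed.encode("utf-8"))          # b'\xc3\xa9'
print(decomposed.encode("utf-8"))           # b'e\xcc\x81'

# After normalization, the two spellings compare equal.
print(same_text(precomposed, decomposed))   # True
```

NFC is only one of the standard normalization forms. NFD, which decomposes instead of composing, would work equally well for this comparison, provided both sides use the same form.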

> So therefore I can't see how you can conclude that adopting a standard
> encoding specification for text (or any data of any type for that
> matter) accomplishes anything other than preventing that programming
> language from being able to access and manipulate data stored in other
> formats, which you seem to have recognized the necessity for?

Huh? I don't recall insisting that compilers *must* recognize only a
single source encoding. Indeed, I've suggested ways to portably specify
the input encoding in source code (in XML style). However, I think the
traditional "only recognize one source encoding" compilers are also
fine. In other words, flexible source encoding is a desirable but
optional feature.

But again, if you choose a flavor of Unicode for your sources, then you
should be prepared to handle the many normal and non-normal forms of
Unicode graphemes.

> I don't believe that scheme's intent was to be restricted to only
> being able to natively access and process text, much less only text
> encoded in any particular format, do you?

No. But it sounds like you *think* I do. Again, it sounds like you have
an axe to grind and are reading too much into my words.

> (I honestly think you're viewing scheme and its potential
> applicability within the broader computing industry, and its
> corresponding practical requirements too narrowly, with no disrespect
> intended.

I have no idea where you got this impression.

> Maybe I'm missing the boat, but from the best I can tell, all
> discussions seem to be leading to the erroneous presumption that it's
> adequate for scheme to restrict itself to exclusively processing data
> originating, and destined as Unicode encoded text, which would be most
> unfortunate.)

And I've spoken out vehemently *against* this on comp.lang.scheme.
Bradd W. Szonye