Re: Encodings.
> Bradd, what prompted my comments were your own following comments:
>> ... Storing data in non-canonical form is not "broken." Also, there's
>> more than one canonical form. ... Programs which disagree on the form
>> of the I/O will need to translate between the two. ... That wouldn't
>> help unless they agree to write the *same* canonical format. ...
Paul Schlie wrote:
> And I agree that neither programs, platforms, nor even users can often
> agree on the use of any single encoding form; therefore it would seem
> that it then becomes the obligation of the programming language to
> enable the specification of program code which can access and process
> data encoded in arbitrary forms.
I strongly disagree. Compilers have traditionally required all source
input to have a particular encoding, usually the system's "native"
character encoding, and it works well. However, if the compiler chooses
a flavor of Unicode (e.g., UTF-8) as its source encoding, then it should
implement the Unicode standard correctly, which (IIRC) includes the
requirement that you handle various normal and non-normal forms
gracefully.
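
As a minimal sketch of what "handling the forms gracefully" could mean for
identifiers, the following compares two spellings of the same name after
normalizing both to NFC. The helper name same_identifier is hypothetical, and
the choice of NFC (rather than NFD or another form) is an assumption made only
for illustration.

```python
import unicodedata


def same_identifier(a: str, b: str) -> bool:
    """Compare two identifiers by canonical equivalence rather than by raw code points.

    Both inputs are normalized to NFC first (an assumed choice of form).
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "caf\u00e9"     # 'é' as a single precomposed code point
decomposed = "cafe\u0301"  # 'e' followed by COMBINING ACUTE ACCENT

# The raw strings differ code point by code point ...
raw_equal = composed == decomposed
# ... but they are canonically equivalent once normalized.
canon_equal = same_identifier(composed, decomposed)
```

A reader that compared only raw code points would treat these two spellings as
different identifiers, even though they render identically.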
> So therefore I can't see how you can conclude that adopting a standard
> encoding specification for text (or any data of any type for that
> matter) accomplishes anything other than preventing that programming
> language from being able to access and manipulate data stored in other
> formats, which you seem to have recognized the necessity for?
Huh? I don't recall insisting that compilers *must* recognize only a
single source encoding. Indeed, I've suggested ways to portably specify
the input encoding in source code (in XML style). However, I think
the traditional "only recognize one source encoding" compilers are also
fine. In other words, flexible source encoding is a desirable but
optional feature.
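
To make the idea concrete, here is a rough sketch of a reader that looks for an
XML-style encoding declaration on the first line of a source file and falls
back to a single default encoding when none is present. The declaration syntax
(<?source encoding="..."?>), the function name decode_source, and the UTF-8
default are all hypothetical, chosen only for illustration. The sketch also
assumes the declaration itself is written in ASCII-compatible bytes.

```python
import codecs
import re

# Hypothetical XML-style declaration; the exact syntax is an assumption.
_DECLARATION = re.compile(rb'<\?source\s+encoding="([A-Za-z0-9._-]+)"\s*\?>')


def decode_source(raw: bytes, default: str = "utf-8") -> str:
    """Decode source bytes using a declared encoding, or the default if none is declared."""
    first_line = raw.split(b"\n", 1)[0]
    match = _DECLARATION.search(first_line)
    name = match.group(1).decode("ascii") if match else default
    codecs.lookup(name)  # raises LookupError if the encoding name is unknown
    return raw.decode(name)


# A Latin-1 source file that declares its encoding in a leading comment.
declared = b'; <?source encoding="latin-1"?>\n(define caf\xe9 1)\n'
text_declared = decode_source(declared)

# A file with no declaration is read with the single default encoding.
text_default = decode_source("(define x 1)\n".encode("utf-8"))
```

With no declaration present, this behaves like a traditional single-encoding
reader, which is why the feature can remain optional.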
But again, if you choose a flavor of Unicode for your sources, then you
should be prepared to handle the many normal and non-normal forms of
Unicode graphemes.
> I don't believe that scheme's intent was to be restricted to only
> being able to natively access and process text, much less only text
> encoded in any particular format, do you?
No. But it sounds like you *think* I do. Again, it sounds like you have
an axe to grind and are reading too much into my words.
> (I honestly think you're viewing scheme and its potential
> applicability within the broader computing industry, and its
> corresponding practical requirements too narrowly, with no disrespect
> intended.
I have no idea where you got this impression.
> Maybe I'm missing the boat, but from the best I can tell, all
> discussions seem to be leading to the erroneous presumption that it's
> adequate for scheme to restrict itself to exclusively processing data
> originating as, and destined to be, Unicode-encoded text, which would be most
> unfortunate.)
And I've spoken out vehemently *against* this on comp.lang.scheme.
--
Bradd W. Szonye
http://www.szonye.com/bradd