
Re: Encodings.

This page is part of the web mail archives of SRFI 52 from before July 7th, 2015. The new archives for SRFI 52 contain all messages, not just those from before July 7th, 2015.



As best I can tell, there is no difference between opening a file
using C's fopen function in binary mode or in text mode, except for
the local conversion of new-line marker character(s) on VMS, MS-whatever,
UNIX, Mac, etc.; both can be read and written logically sequentially.
(Files opened in text mode are, however, apparently permitted to strip or
replace non-displayable characters and white space that precede a
new-line marker, and to limit the number of characters between new-line
sequences. In the strictest case this likely rules out C's text-mode file
functions for storing Unicode-encoded text, and it prevents properly
reading foreign text files whose new-line conventions differ from the
local host's conventions.)

Therefore (although I know folks hate my "therefore"s), Scheme
implementations which expect to leverage C's file function library
should likely open files in binary mode and apply the locally accepted
new-line (etc.) conventions themselves, thereby enabling programs to
be written that can open arbitrarily formatted files, including
foreign text files which have adopted different new-line encoding
conventions than those of the local platform. (It's a bit more work,
but it's the general solution; otherwise an implementation is restricted
to processing only text files consistent with the platform's local
formatting conventions.)
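Here is a minimal sketch of that approach, assuming the file can be opened as a byte stream with `fopen(path, "rb")`: every new-line convention is folded into `'\n'` by the implementation's own code rather than by the C library's text mode. `getc_normalized` and `read_line_normalized` are hypothetical helper names, not an existing API; a Scheme implementation might place something like this underneath its text ports.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one character from a stream opened in
   binary mode ("rb"), folding LF (UNIX), CR LF (MS-DOS/Windows) and a
   lone CR (classic Mac OS) into a single '\n'. All other bytes are
   returned unchanged, so callers see the same line structure whichever
   platform wrote the file. */
int getc_normalized(FILE *fp)
{
    int c;

    assert(fp != NULL);
    c = getc(fp);
    if (c == '\r') {
        int next = getc(fp);
        if (next != '\n' && next != EOF)
            ungetc(next, fp); /* lone CR: push back the following byte */
        return '\n';
    }
    return c;
}

/* Hypothetical helper: store one line, without its terminator, in buf.
   Returns the number of bytes stored, or -1 if the stream was already
   at end of file. Bytes beyond cap - 1 are read and discarded. */
long read_line_normalized(FILE *fp, char *buf, size_t cap)
{
    size_t n = 0;
    int c;

    assert(buf != NULL && cap > 0);
    buf[0] = '\0';
    c = getc_normalized(fp);
    if (c == EOF)
        return -1;
    while (c != EOF && c != '\n') {
        if (n + 1 < cap)
            buf[n++] = (char)c;
        c = getc_normalized(fp);
    }
    buf[n] = '\0';
    return (long)n;
}
```

For example, a file whose bytes are `unix\nmsdos\r\nmac\rlast` reads back as the four lines `unix`, `msdos`, `mac` and `last`, after which `read_line_normalized` returns -1. The cost is the extra one-byte lookahead after each CR, which `ungetc` is guaranteed to support.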

-paul-

> From: "Bradd W. Szonye" <bradd+srfi@xxxxxxxxxx>
> 
> Paul Schlie wrote:
>> Just a nit, but a text file is a binary file ....
> 
> No, that is not true on all systems. There are systems where you simply
> cannot access a text file as a "binary file" (a stream of bytes).
> 
>> More sophisticated indexed record file structures supported by some
>> OSes are themselves not much more than an intermediate primitive
>> indexed database built on top of plain old files composed of disk
>> sectors, often with knowledge of the storage system's blocking
>> architecture for efficiency; but the general rule remains, you get out
>> literally what you put in, as otherwise they wouldn't be very useful.
> 
> Yes, you get out what you put in. However, when a system provides no
> "binary mode" or "stream mode" for text files, there is no way to
> implement a text port in terms of a binary port.
> 
> To put it another way, a "binary file" is just a specific way of
> organizing data on disk, designed so that you can access individual
> bytes or words as a stream or in random order. UNIX systems store text
> files that way, such that binary files and text files are actually the
> same thing. On that kind of system, you can implement text ports on top
> of binary ports.
> 
> But some systems do not store text files that way. There simply are no
> operating system primitives to access them as a stream of bytes, and it
> would not make sense to implement text ports in terms of binary ports.
> That's why C has separate text and binary I/O modes. On UNIX-like,
> ASCII-based systems, the two modes are identical, because "transforming"
> binary data to text is a no-op. On DOS-like systems, text mode is
> implemented as a filter on top of binary mode, like you suggest. (That's
> also how it works on UNIX-like systems that use multibyte character
> sets.) But on systems where text and binary data are fundamentally
> different, the two I/O modes are independent.
> 
> While they're both ultimately bits on a spinning magnet, the interface
> to those bits is often very different depending on the type of data. At
> one time, the UNIX-like systems were the exception, not the rule. Now
> it's the other way around, but the byte-stream model is still not
> universal. Unless you really want to limit all I/O to that model
> (i.e., make it impossible to implement Scheme on some systems), you
> cannot design the system so that text I/O is a stream layer atop binary
> I/O.
> -- 
> Bradd W. Szonye
> http://www.szonye.com/bradd
>
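For comparison, the layering described in the quoted reply, where text mode on DOS-like systems is a filter over binary mode, can be sketched for the write side as follows. `putc_crlf` and `puts_crlf` are hypothetical names, not a real library API. The sketch only applies where files really are byte streams; as the reply points out, no such layering is possible on systems that expose text files only as records.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical write-side half of a DOS-style text layer: each '\n'
   the program writes is emitted as the two bytes CR LF on a stream
   opened in binary mode ("wb"); all other bytes pass through untouched.
   Returns the character written, or EOF on a write error, like putc. */
int putc_crlf(int c, FILE *fp)
{
    assert(fp != NULL);
    if (c == '\n' && putc('\r', fp) == EOF)
        return EOF;
    return putc(c, fp);
}

/* Hypothetical helper: write a NUL-terminated string through putc_crlf.
   Returns 0 on success or EOF on the first write error. */
int puts_crlf(const char *s, FILE *fp)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (putc_crlf((unsigned char)*s, fp) == EOF)
            return EOF;
    }
    return 0;
}
```

Writing `"a\nb\n"` this way stores the six bytes `a CR LF b CR LF`, which is what an MS-DOS text-mode `fputs` would have produced.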