
Re: Encodings.

This page is part of the web mail archives of SRFI 52 from before July 7th, 2015. The new archives for SRFI 52 contain all messages, not just those from before July 7th, 2015.



Paul Schlie wrote:
> Just a nit, but a text file is a binary file ....

No, that is not true on all systems. There are systems where you simply
cannot access a text file as a "binary file" (a stream of bytes).

> More sophisticated indexed record file structures supported by some
> os's are themselves not much more than an intermediate primitive
> indexed data base built on top of plain old files composed of disk
> sectors, often with the knowledge of the storage systems blocking
> architecture for efficiency; but the general rule remains, you get out
> literally what you put in, as otherwise they wouldn't be very useful.

Yes, you get out what you put in. However, when a system provides no
"binary mode" or "stream mode" for text files, there is no way to
implement a text port in terms of a binary port.

To put it another way, a "binary file" is just a specific way of
organizing data on disk, designed so that you can access individual
bytes or words as a stream or in random order. UNIX systems store text
files that way, such that binary files and text files are actually the
same thing. On that kind of system, you can implement text ports on top
of binary ports.

But some systems do not store text files that way. There simply are no
operating system primitives to access them as a stream of bytes, and it
would not make sense to implement text ports in terms of binary ports.
That's why C has separate text and binary I/O modes. On UNIX-like,
ASCII-based systems, the two modes are identical, because "transforming"
binary data to text is a no-op. On DOS-like systems, text mode is
implemented as a filter on top of binary mode, as you suggest. (That's
also how it works on UNIX-like systems that use multibyte character
sets.) But on systems where text and binary data are fundamentally
different, the two I/O modes are independent.
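
To make the DOS-like case concrete, here is a minimal sketch of the kind
of filter a text mode might apply on top of binary reads. The function
name crlf_to_lf is hypothetical, and it assumes the whole input sits in
one buffer; a real implementation would also have to handle a '\r' that
falls at the end of one buffer with its '\n' at the start of the next.

```c
#include <stddef.h>

/* Hypothetical helper: copy n bytes from src to dst, translating
 * DOS-style "\r\n" line endings to "\n" along the way.
 * dst must have room for at least n bytes.
 * Returns the number of bytes written to dst.
 * A '\r' that is not followed by '\n' is copied unchanged. */
size_t crlf_to_lf(const char *src, size_t n, char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (src[i] == '\r' && i + 1 < n && src[i + 1] == '\n')
            continue;            /* drop the '\r'; the '\n' is copied next */
        dst[out++] = src[i];
    }
    return out;
}
```

On a UNIX-like, ASCII-based system the corresponding filter would simply
copy the bytes through. On a system with no byte-stream view of text
files, there is nothing for a filter like this to sit on top of.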

While they're both ultimately bits on a spinning magnet, the interface
to those bits is often very different depending on the type of data. At
one time, the UNIX-like systems were the exception, not the rule. Now
it's the other way around, but the byte-stream model is still not
universal. Unless you really want to limit all I/O to that model
(i.e., make it impossible to implement Scheme on some systems), you
cannot design the system so that text I/O is a stream layer atop binary
I/O.
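
As a rough illustration of the alternative, the sketch below defines a
text port as its own interface with two independent backends. Every name
in it (text_port, stream_read_line, record_read_line, RECLEN) is made up
for this example, and the fixed-length, blank-padded record layout is
only an assumption loosely modeled on record-oriented files, not a
description of any particular system.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical text-port interface: callers see only lines of text. */
typedef struct text_port {
    /* Copy the next line, without any terminator, into buf (cap > 0).
     * Lines longer than cap - 1 are truncated.
     * Returns 1 on success, 0 when no lines remain. */
    int (*read_line)(struct text_port *self, char *buf, size_t cap);
    void *state;
} text_port;

/* Backend 1: text stored as a byte stream with '\n' terminators. */
typedef struct { const char *bytes; size_t len, pos; } stream_state;

static int stream_read_line(text_port *self, char *buf, size_t cap)
{
    stream_state *s = self->state;
    if (s->pos >= s->len) return 0;
    size_t k = 0;
    while (s->pos < s->len && s->bytes[s->pos] != '\n') {
        if (k + 1 < cap) buf[k++] = s->bytes[s->pos];
        s->pos++;
    }
    if (s->pos < s->len) s->pos++;       /* skip the '\n' */
    buf[k] = '\0';
    return 1;
}

/* Backend 2: text stored as fixed-length, blank-padded records.
 * No newline byte exists anywhere in the stored data. */
#define RECLEN 8
typedef struct { const char (*recs)[RECLEN]; size_t count, next; } record_state;

static int record_read_line(text_port *self, char *buf, size_t cap)
{
    record_state *r = self->state;
    if (r->next >= r->count) return 0;
    const char *rec = r->recs[r->next++];
    size_t k = RECLEN;
    while (k > 0 && rec[k - 1] == ' ') k--;   /* strip blank padding */
    if (k >= cap) k = cap - 1;
    memcpy(buf, rec, k);
    buf[k] = '\0';
    return 1;
}
```

Code written against text_port works with either backend, but only the
first one could be expressed as a layer over a binary port. That is the
reason to specify text I/O as its own interface rather than as a stream
layer.
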
-- 
Bradd W. Szonye
http://www.szonye.com/bradd