
Re: finishing output translating stream



From: Michael Sperber <sperber@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: finishing output translating stream
Date: Wed, 08 Jun 2005 09:52:01 +0200

> I think I'm beginning to get it---by character conversion buffer, you
> mean stuff left in the "state" variable kept around by the stream
> translation layer, right?

Yes.

> Now, I can see how that may happen in theory---but in practice, aren't
> you likely to always translate as far as possible into the data handed
> to you?  (Maybe this should be mentioned in the SRFI.)   Because if
> you do, then knowing about a flush is not going to help you if you
> don't get more data along with it (which you don't currently).

In an example like the string packet, I'd think "flushing"
marks the string as self-contained up to that point, so that
the translator can (or should) reset its state to a neutral
position.

Example: suppose my application writes out a log file.  I'm
a bit sloppy, so the application just opens the log file,
seeks to the end at startup, and keeps the file open
during execution.  Log messages are written out occasionally.

For some reason, I need to store the log file in ISO-2022-JP
format.  It is a stateful encoding, so when a non-ASCII character
sequence begins, the escape sequence ESC '$' 'B' is written out,
and when the non-ASCII character sequence ends, another escape
sequence ESC '(' 'B' is written out to mark that the state is
back to ASCII.
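
To make the escape sequences concrete, here is a minimal sketch in
Python (my choice for illustration only; it is not part of the SRFI).
It assumes Python's standard "iso2022_jp" codec and a made-up log line,
and splits the encoded bytes at the two escape sequences.

```python
# ISO-2022-JP escape sequences discussed above.
ESC_TO_JIS = b"\x1b$B"    # ESC '$' 'B': switch to JIS X 0208 (non-ASCII)
ESC_TO_ASCII = b"\x1b(B"  # ESC '(' 'B': switch back to ASCII

# Hypothetical log line: ASCII, then two kanji, then ASCII again.
encoded = "log: 失敗 done".encode("iso2022_jp")

start = encoded.index(ESC_TO_JIS)
end = encoded.index(ESC_TO_ASCII)

prefix = encoded[:start]                        # ASCII bytes before the switch
kanji = encoded[start + len(ESC_TO_JIS):end]    # two bytes per kanji
suffix = encoded[end + len(ESC_TO_ASCII):]      # ASCII bytes after the switch back

print(prefix, kanji, suffix)
```

The bytes between the two escape sequences are all in the printable
ASCII range, which is why a reader has to track the state to interpret
them correctly.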

The application may get killed without a chance to clean up,
so I call flush-output-stream after every log message, hoping
that as much of the log as possible remains on disk in case
of an accident.

If the translate-proc doesn't know about flush-output-stream,
and a log message happens to end with a non-ASCII character,
it won't write out the closing escape sequence ESC '(' 'B'
at the end of the log message, because it doesn't know whether
more non-ASCII characters are coming.  If the application crashes
then, the log file is left in the non-ASCII state.  A subsequent
run of the application starts adding messages, assuming the file
is in the ASCII state at that point, so the first ASCII portion
of the new message becomes illegible.
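
The failure can be sketched in Python, under some loud assumptions:
Python's incremental "iso2022_jp" encoder stands in for the
translate-proc, and passing final=True stands in for telling the
translator about the flush.  The helper encode_message and the sample
messages are hypothetical; none of this is the SRFI's API.

```python
import codecs

ESC_TO_ASCII = b"\x1b(B"  # ISO-2022-JP: back to the ASCII state


def encode_message(text, notify_flush):
    """Encode one log message; optionally tell the encoder it is complete."""
    encoder = codecs.getincrementalencoder("iso2022_jp")()
    data = encoder.encode(text)
    if notify_flush:
        # final=True lets the codec emit its pending reset sequence.
        data += encoder.encode("", final=True)
    return data


message = "error: 失敗"  # ends with a non-ASCII character

# What reaches the disk if the first run crashes right after this message.
without_flush = encode_message(message, notify_flush=False)
with_flush = encode_message(message, notify_flush=True)

# The second run appends plain ASCII, assuming the file is in the ASCII state.
next_run = b"restarted"

garbled = (without_flush + next_run).decode("iso2022_jp", errors="replace")
intact = (with_flush + next_run).decode("iso2022_jp", errors="replace")

print(repr(garbled))
print(repr(intact))
```

Without the flush notification the closing escape sequence never
reaches the file, so the appended ASCII bytes are decoded as
double-byte characters.  With it, the second run's text reads back
as written.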

I admit this is just a bad design.  Still, the character encoding
stuff is so complicated that I appreciate anything that adds
more certainty and control.

--shiro