
Re: Mixing characters and bytes



Michael Sperber <sperber@xxxxxxxxxxxxxxxxxxxxxxxxxxx> writes:

> Per Bothner <per@xxxxxxxxxxx> writes:

>> * Switching encodings: You can't switch encodings without switching
>> ports.
>
> Fair enough.  I had trouble figuring out how to make it work, which
> is one reason why I punted on it.  (It's trickier than what you
> suggest.)  But I believe I've found a way to do it, which I hope I can
> incorporate in the next revision.

Turns out I was wrong.  Switching encodings in the middle of a
buffered data stream (in the general sense) is, AFAICS, very costly:
You generally want to transcode text in chunks for efficiency.  This
means that you'll typically transcode ahead of what the program has
actually requested.  Now, switching encodings means going back to the
place where you actually stopped requesting data, which means
retracing your steps from the beginning of the last transcoding step.
This would complicate the interface for defining translators
considerably, and still leaves some border cases uncovered.  (For
example, when you've only retrieved data for part of a single
character from the stream.)  Moreover, it would complicate
implementations of input ports that use only a single buffer even
more significantly.
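
To make the retracing concrete, here is a minimal sketch in Python.
It is not taken from any proposal or implementation; the function
names transcode_chunk and switch_encoding are hypothetical.  It
assumes the old encoding round-trips exactly (true for valid UTF-8
without a BOM), so that the byte offset can be recovered by
re-encoding the characters already handed to the program.

```python
import codecs

def transcode_chunk(data: bytes, encoding: str, chunk_size: int) -> str:
    """Eagerly decode one chunk, as a buffered port would for efficiency."""
    decoder = codecs.getincrementaldecoder(encoding)()
    return decoder.decode(data[:chunk_size])

def switch_encoding(data: bytes, consumed: str, old: str, new: str) -> str:
    """Retrace to the byte where the program stopped, then re-decode the rest."""
    # Assumption: `old` round-trips exactly, so re-encoding the consumed
    # characters gives the number of bytes they occupied in the stream.
    byte_offset = len(consumed.encode(old))
    return data[byte_offset:].decode(new)

# A stream whose header is UTF-8 and whose body is UTF-16-LE.
# "\u00e9" (e with acute accent) takes two bytes in UTF-8.
stream = "n\u00e9:".encode("utf-8") + "hi".encode("utf-16-le")

ahead = transcode_chunk(stream, "utf-8", chunk_size=8)
consumed = ahead[:3]   # the program has only requested the 3-character header
stale = ahead[3:]      # transcoded ahead with the wrong encoding; must be dropped
body = switch_encoding(stream, consumed, "utf-8", "utf-16-le")
```

The three consumed characters occupy four bytes, so the character
count alone does not say where to resume; the work done on the rest
of the chunk is thrown away.  The sketch does not cover the border
case where the buffer ends in the middle of a character.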

The implementation side doesn't bother me as much as the discontinuity
in an operation which looks lightweight.  (Even I, doofus that I am,
thought so until I tried to implement it.)  If anyone can suggest how
to do it in a straightforward manner, I'm all ears.  Until then I'll
have to punt on this, unfortunately.

-- 
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla