
Re: Mixing characters and bytes

This page is part of the web mail archives of SRFI 68 from before July 7th, 2015. The new archives for SRFI 68 contain all messages, not just those from before July 7th, 2015.

To: srfi-68@xxxxxxxxxxxxxxxxx
Subject: Re: Mixing characters and bytes
From: Michael Sperber <sperber@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 13 Sep 2005 14:19:19 +0200
Delivered-to: srfi-68@xxxxxxxxxxxxxxxxx
In-reply-to: <y9lfysyc624.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> (Michael Sperber's message of "Thu, 25 Aug 2005 18:52:35 +0200")
References: <430C20B1.6010102@xxxxxxxxxxx> <y9ly86rjl1x.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <430CBF89.1070500@xxxxxxxxxxx> <y9lfysyc624.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
User-agent: Gnus/5.110003 (No Gnus v0.3) XEmacs/21.5-b21 (berkeley-unix)

Michael Sperber <sperber@xxxxxxxxxxxxxxxxxxxxxxxxxxx> writes:

> Per Bothner <per@xxxxxxxxxxx> writes:

>> * Switching encodings: You can't switch encodings without switching
>> ports.
>
> Fair enough.  I had trouble figuring out how to make it work, which
> is one reason why I punted on it.  (It's trickier than what you
> suggest.)  But I believe I've found a way to do it, which I hope I can
> incorporate in the next revision.

Turns out I was wrong.  Switching encodings in the middle of a
buffered data stream (in the general sense) is, AFAICS, very costly:
You generally want to transcode text in chunks for efficiency.  This
means that you'll typically transcode ahead of what the program has
actually requested.  Now, switching encodings means going back to the
place where you actually stopped requesting data, which means
retracing your steps from the beginning of the last transcoding step.
This would complicate the interface for defining translators
considerably, and still leave some border cases uncovered.  (For
example, when you've only retrieved data for part of a single
character from the stream.)  Moreover, it would even more significantly complicate
implementations of input ports that use only a single buffer.
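
To make the read-ahead problem concrete, here is a minimal Python sketch (not SRFI 68 code). The chunk size, the sample data, and the helper name `retrace` are all made up for illustration, and it assumes the underlying byte source can seek, which pipes and sockets cannot. The transcoder decodes a whole chunk up front, the program asks for only two characters, and switching encodings then means re-decoding from the start of the chunk to find out where the program actually stopped.

```python
import codecs
import io

# Two Latin-1 characters followed by text in a different encoding (UTF-16LE).
data = b"A:" + "é!".encode("utf-16-le")
raw = io.BytesIO(data)          # assumption: a seekable byte source

CHUNK = 8                       # hypothetical transcoding chunk size
chunk_start = raw.tell()
chunk = raw.read(CHUNK)         # the transcoder grabs a whole chunk...
decoded = chunk.decode("latin-1")   # ...and decodes all of it up front

header = decoded[:2]            # the program only asked for two characters
read_ahead = raw.tell() - len(header.encode("latin-1"))  # bytes decoded too early


def retrace(chunk, encoding, nchars):
    """Re-decode from the chunk start; return the byte offset after nchars."""
    if nchars == 0:
        return 0
    decoder = codecs.getincrementaldecoder(encoding)()
    seen = 0
    for offset, byte in enumerate(chunk):
        seen += len(decoder.decode(bytes([byte])))
        if seen >= nchars:
            return offset + 1
    raise ValueError("chunk holds fewer characters than requested")


# Switching encodings: go back to where the program stopped, then decode anew.
resume = chunk_start + retrace(chunk, "latin-1", len(header))
raw.seek(resume)
rest = raw.read().decode("utf-16-le")
```

Even in this toy case the switch costs a second pass over the chunk plus a seek, and the characters already sitting in the read-ahead buffer (`decoded[2:]`) are garbage that has to be thrown away.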

The implementation side doesn't bother me as much as the discontinuity
in an operation which looks lightweight.  (Even I, dufus that I am,
thought so until I tried to implement it.)  If anyone can suggest how
to do it in a straightforward manner, I'm all ears.  Until then I'll
have to punt on this, unfortunately.

-- 
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla

Follow-Ups:
- Re: Mixing characters and bytes
  - From: Shiro Kawai

References:
- Mixing characters and bytes
  - From: Per Bothner
- Re: Mixing characters and bytes
  - From: Michael Sperber
- Re: Mixing characters and bytes
  - From: Per Bothner
- Re: Mixing characters and bytes
  - From: Michael Sperber
