This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.
From: "John.Cowan" <jcowan@xxxxxxxxxxxxxxxxx>
Subject: Re: the "Unicode Background" section
Date: Fri, 22 Jul 2005 17:56:00 -0400

> I'm not saying that any Scheme system has to accept every possible
> encoding (though I do think at least ASCII, UTF-8, and UTF-16 should
> be mandatory; they are all trivial), but it needs to be possible
> to specify the encoding of a port when it is created. (I don't think
> it's necessary to be able to change it on the fly, though.)

Changing encodings in a port can come in handy in a couple of very
practical situations:

- Parsing RFC2822 and/or MIME messages (the header is ASCII, and
  the content's charset is specified in the header)
- Parsing documents that have an encoding specification near the
  beginning (e.g. <?xml version="1.0" encoding="utf-8"?>, or the
  "coding: utf-8" magic comment that specifies source-code encoding).

Both can be handled by layering ports, i.e. first you use an ASCII
port on top of a binary port to find the necessary info, then create a
new port with the desired encoding on top of the original binary port
to read the content. You need to be careful about buffering, though.
And some may dislike the overhead of layering. But that's out of
scope of the discussion.

> Absolutely. Or more specifically: attempt to write a character that's
> not in the repertoire associated with the encoding is an error.
> Allowing this to be lax is just asking for trouble.

I mentioned some other options in my reply to Tom Lord, but there's
one practical example:

Suppose I have a dynamic website that stores Unicode documents.
My CGI script uses a CES-conversion port for its output so that it
can send out the document in the CES specified by the web browser.
When an ISO 8859-1 browser requests content that contains Chinese
characters, it won't be very useful if the CGI script sends an error
page. Usually replacing unmappable characters with '?' or something
similar would be better.
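
As a rough illustration of the '?' substitution described above, here is a minimal sketch in Python rather than Scheme. It assumes Python's `io.TextIOWrapper` as a stand-in for a CES-conversion output port; the function name `send_document` is hypothetical and not part of any proposal in this thread.

```python
import io

def send_document(text, charset, binary_out):
    """Write text to binary_out in the requested charset.

    Hypothetical helper: characters that the charset cannot represent
    are written as '?' instead of raising an error.
    """
    # errors="replace" makes the encoder substitute '?' for each
    # unmappable character; errors="strict" would raise instead.
    port = io.TextIOWrapper(binary_out, encoding=charset,
                            errors="replace", newline="")
    port.write(text)
    port.flush()
    port.detach()  # leave the underlying binary stream open for the caller

# Usage: a browser that asked for ISO 8859-1 gets Latin-1 text,
# with the two Chinese characters degraded to '?'.
out = io.BytesIO()
send_document("caf\u00e9 \u4e2d\u6587", "iso-8859-1", out)
```

With a strict policy, the same write would raise `UnicodeEncodeError`, which corresponds to the "attempt to write is an error" behaviour quoted above.
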
(Again, this can be done by smart error handlers that do something
user-friendly when an 'encoding not supported' error occurs. It is
much handier if the port can handle it, though.)

--shiro
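
The layered-port approach for RFC2822/MIME-style messages mentioned earlier in this message might look roughly like the following. This is a sketch in Python, not Scheme, under several assumptions: the message format is heavily simplified, the function name `read_message` is hypothetical, and the charset is taken from an unquoted `charset=` parameter. The header is read line by line from the binary stream itself, so no text-layer buffer can swallow body bytes before the second layer is created, which is one way to sidestep the buffering problem.

```python
import io
import re

def read_message(binary_port):
    """Return (charset, body_text) for a simplified header/body message."""
    charset = "ascii"  # assumed default when no charset is declared
    # Phase 1: the header is ASCII. Read it from the binary port directly
    # so the stream position stops exactly at the start of the body.
    while True:
        raw = binary_port.readline()
        if raw in (b"", b"\n", b"\r\n"):  # EOF or the blank separator line
            break
        line = raw.decode("ascii")
        match = re.search(r"charset=([\w.:-]+)", line, re.IGNORECASE)
        if match:
            charset = match.group(1)
    # Phase 2: layer a new text port with the declared encoding on top of
    # the same binary port and read the rest of the content through it.
    body_port = io.TextIOWrapper(binary_port, encoding=charset)
    return charset, body_port.read()

# Usage with an in-memory message whose body is UTF-8.
message = (b"Content-Type: text/plain; charset=utf-8\r\n"
           b"\r\n" + "\u65e5\u672c\u8a9e\n".encode("utf-8"))
charset, body = read_message(io.BytesIO(message))
```
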