
getenv vs. locale

This page is part of the web mail archives of SRFI 98 from before July 7th, 2015. The new archives for SRFI 98 contain all messages, not just those from before July 7th, 2015.



	SRFI 98 reads:

--cut--
    For obtaining the value of the environment value, getenv may use
    locale-setting information to encode the name, and decode the value
    of the environment variable.
--cut--

	This doesn't seem appropriate.

	Historically, environment variables are used to specify
	filenames (either a single one or a list, as in the case of
	PATH).  However, filenames aren't actually ``strings'' on a
	number of the general-purpose platforms currently in use, most
	notably on those of the POSIX flavour.

	Indeed, a filename is more like a NUL-terminated byte vector on
	such platforms.  It's perfectly valid to have a filename like
	this (sans the NUL):

#u8(1 2 3 ... 46 48 ... 255)

	However, this byte vector has no interpretation as a UTF-8
	string.  Therefore, it would be an error for a Scheme
	implementation to ``use locale-setting information to [...]
	decode the value of the environment variable'' when the
	environment variable's value is like that and the locale
	settings specify the use of the UTF-8 encoding.
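
	To make the point concrete, here is a minimal sketch in Python
	(chosen only for illustration; it is not taken from any Scheme
	implementation).  It builds the byte vector above and checks
	whether a strict UTF-8 decoder accepts it.  The helper name
	decodes_as_utf8 is hypothetical.

```python
# Bytes 1..255 except 47 ('/'): the filename from the example above.
# NUL (0) is left out because it terminates the name.
name = bytes(b for b in range(1, 256) if b != 47)


def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if `raw` is a well-formed UTF-8 byte sequence."""
    try:
        raw.decode("utf-8")  # strict decoding: raises on malformed input
        return True
    except UnicodeDecodeError:
        return False
```

	With this, decodes_as_utf8(name) is false, whereas a plain
	ASCII value such as b"/usr/bin" decodes without trouble.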

	On the other hand, implementations capable of passing the byte
	vector obtained from the process's environment directly to
	open () (without trying to interpret it as a string) would be
	able to work with the file irrespective of the locale settings
	currently in effect.
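
	A sketch of that byte-oriented approach, again in Python and
	assuming a POSIX-like platform: the raw value is read from the
	environment and handed to open () unchanged.  The helper names
	getenv_bytes and open_from_env are hypothetical; os.environb
	exists only where os.supports_bytes_environ is true, so the
	sketch falls back to a round trip through the filesystem
	encoding elsewhere.

```python
import os


def getenv_bytes(name: bytes):
    """Return the raw bytes of environment variable `name`, or None."""
    if os.supports_bytes_environ:
        # Byte-level view of the environment; no locale decoding involved.
        return os.environb.get(name)
    # Fallback for platforms without a bytes environment.
    value = os.environ.get(os.fsdecode(name))
    return None if value is None else os.fsencode(value)


def open_from_env(name: bytes):
    """Open, for binary reading, the file named by variable `name`."""
    raw = getenv_bytes(name)
    if raw is None:
        raise KeyError(name)
    # open() accepts a bytes path and passes it to the OS as-is.
    return open(raw, "rb")
```

	For instance, with a (hypothetical) variable DEMO_FILE naming a
	file, open_from_env(b"DEMO_FILE") opens it even when the name
	is not valid in the current locale's encoding.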

	Actually, I'm not aware of any platform that specifies the
	encoding used for either the names or the values of environment
	variables.  Therefore, I'd rather suggest that `getenv' be
	standardised as returning a SRFI-4 byte vector.  However, this
	``view'' of the OS interface doesn't seem to be widely accepted.
	Indeed, Scheme48 is the only implementation I know of that makes
	an attempt to solve this somewhat peculiar problem.