getenv vs. locale
SRFI 98 reads:
For obtaining the value of the environment value, getenv may use
locale-setting information to encode the name, and decode the value
of the environment variable.
This doesn't seem appropriate.
Historically, environment variables have been used to specify
filenames (either a single one or a list, as in the case of PATH).
However, filenames aren't actually ``strings'' on a number of
the general purpose platforms currently in use, most notably on
those of the POSIX flavour.
Indeed, a filename is more like a NUL-terminated byte vector on
such platforms. It's perfectly valid to have a filename like
this (sans the NUL):
#u8(1 2 3 ... 46 48 ... 255)
However, this byte vector has no interpretation as a UTF-8
string. Therefore, it would be an error for a Scheme
implementation to ``use locale-setting information to [...]
decode the value of the environment variable'', when the
environment variable's value is like that, and the locale
settings specify the use of the UTF-8 encoding.
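To make the point concrete, here is a minimal sketch in Python (not Scheme, chosen only because it exposes raw bytes conveniently). It builds the byte vector from the example above and checks whether it can be decoded as UTF-8. The helper name decodes_as_utf8 is made up for this illustration.

```python
# Every byte value except NUL (0) and '/' (47), as in the example above.
name = bytes(b for b in range(1, 256) if b != 47)


def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if `raw` is a valid UTF-8 byte sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(len(name))              # 254
print(decodes_as_utf8(name))  # False: e.g. 0x80 is a stray continuation byte
```

A strict locale-based decoder has nothing sensible to return for such a value, which is the error case described above.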
On the other hand, implementations that are capable of
passing the byte vector obtained from the process's environment
directly to open () (without trying to interpret it as a string)
would be able to work with the file irrespective of the locale
settings currently in effect.
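As a rough illustration of this approach, the following Python sketch reads an environment variable as raw bytes and hands them to open () without ever decoding them. It assumes a POSIX system where os.environb is available and a filesystem that accepts arbitrary bytes in filenames; the variable name DEMO_FILE, the filename, and both function names are hypothetical.

```python
import os
import tempfile


def open_from_env(var: bytes) -> int:
    """Open the file named by environment variable `var`, never decoding its value."""
    raw = os.environb[var]            # bytes, exactly as stored in the environment
    return os.open(raw, os.O_RDONLY)  # passed to open() untouched


def demo() -> bytes:
    """Create a file whose name is not valid UTF-8 and read it back via the environment."""
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical filename containing bytes that are invalid in UTF-8.
        path = os.fsencode(d) + b"/\xff\xfe.dat"
        with open(path, "wb") as f:
            f.write(b"ok")
        os.environb[b"DEMO_FILE"] = path  # hypothetical variable name
        try:
            fd = open_from_env(b"DEMO_FILE")
            try:
                return os.read(fd, 2)
            finally:
                os.close(fd)
        finally:
            del os.environb[b"DEMO_FILE"]


print(demo())
```

Nothing here consults the locale, so the file can be opened whatever encoding the locale settings name.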
Actually, I'm not aware of any platforms which specify the
encoding used for either the names or values of the environment
variables. Therefore, I'd rather suggest that `getenv' be
standardised as returning a SRFI-4 byte vector. However, this
``view'' on the OS interface doesn't seem to be widely accepted.
Indeed, Scheme48 is the only implementation I know of that makes
an attempt to solve this somewhat peculiar problem.