
Re: getenv vs. locale

This page is part of the web mail archives of SRFI 98 from before July 7th, 2015. The new archives for SRFI 98 contain all messages, not just those from before July 7th, 2015.



From: Ivan Shmakov <ivan@xxxxxxxxxxxxx>
Subject: getenv vs. locale
Date: Sat, 19 Jul 2008 00:46:56 +0700

> 	SRFI 98 reads:
> 
> --cut--
>     For obtaining the value of the environment value, getenv may use
>     locale-setting information to encode the name, and decode the value
>     of the environment variable.
> --cut--
> 
> 	And it doesn't seem appropriate.
> 
> 	Historically, environment variables are used to specify
> 	filenames (either sole, or lists, as in the case of PATH.)
> 	However, filenames aren't actually ``strings'' on a number of
> 	the general purpose platforms currently in use, most notably on
> 	those of the POSIX flavour.

Good point.  In principle, we should treat anything fed from
the outside world as a binary vector until it goes through one of
the proper "gates" (e.g. ports).

However, I guess that almost all the time the caller of getenv
wants to use the result as a string.  It would be inconvenient
to insert a conversion routine at every call.
(BTW, R6RS's command-line procedure in the (rnrs programs (6)) library
is defined to return a list of strings, even though the host
operating system may pass a byte sequence that can't be converted
to the implementation's string object.)

I suggest keeping getenv returning a string, and additionally
specifying what happens when the actual environment value can't be
converted.  The resolution can be one of:

(a) Raise an exception (natural, but may be inconvenient in some cases).
(b) Return some value indicating that such a condition has occurred
    (not Schemey).
(c) Allow an optional 'filter' argument.  If it is given, it should
    be a procedure that takes one argument.  It is called with the
    raw byte-vector value, and its return value is the result of 
    get-environment-variable.  The filter procedure is usually
    supposed to return a string, but it may be an identity function
    if the caller wants the raw byte-vector value.  It can raise
    an exception if the raw value can't be converted to a string,
    or can return some special value indicating the situation.
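
To make (c) concrete, here is a minimal sketch.  It is written in
Python rather than Scheme purely so that it can be run as-is.  The
names get_environment_variable, identity and decode_or_false, and the
raw_env dictionary standing in for the process environment, are all
made up for illustration and are not part of SRFI 98.  The sketch
assumes UTF-8 as the default decoding and returns None for an unset
variable.

```python
# Hypothetical sketch of option (c); nothing here is SRFI 98's actual API.

def get_environment_variable(raw_env, name, filter=None):
    """Look up `name` (bytes) in `raw_env`, a mapping from bytes to bytes.

    Without a filter, the raw value is decoded as UTF-8 (an assumption),
    and a value that cannot be decoded raises an exception, as in (a).
    With a filter, the filter's return value is returned unchanged.
    """
    raw = raw_env.get(name)
    if raw is None:
        return None                  # variable is not set
    if filter is None:
        return raw.decode("utf-8")   # may raise UnicodeDecodeError
    return filter(raw)


def identity(raw):
    """Filter for callers who want the raw byte-vector value."""
    return raw


def decode_or_false(raw):
    """Filter that behaves like (b): a special value instead of an error."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return False


# Stand-in for the real environment; the second value is not valid UTF-8.
raw_env = {b"HOME": b"/home/ivan", b"WEIRD": b"/tmp/\xff\xfe"}

home = get_environment_variable(raw_env, b"HOME")
weird_raw = get_environment_variable(raw_env, b"WEIRD", identity)
weird_checked = get_environment_variable(raw_env, b"WEIRD", decode_or_false)
```

Note that both (a) and (b) fall out as particular choices of the
filter, which is the sense in which (c) is the more general option.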

If we take (a) or (b), we could add other procedures that
return the raw value(s), in case the caller wants to process them.

I think (c) is reasonably general, but it raises another issue:
If we aim at R6RS, the filter procedure would take an R6RS bytevector.
If we look at R5RS compatibility, a SRFI 4 u8vector would be the
natural choice.  Which way should we go?

--shiro