Re: upcoming revision, need feedback
- To: srfi-103@xxxxxxxxxxxxxxxxx
- Subject: Re: upcoming revision, need feedback
- From: Vitaly Magerya <vmagerya@xxxxxxxxx>
- Date: Mon, 11 Jan 2010 04:57:10 +0200
- Delivered-to: srfi-103@xxxxxxxxxxxxxxxxx
- In-reply-to: <1263167293.15783.61.camel@eep>
- References: <1263094024.31734.295.camel@eep> <4B49FD33.2050304@xxxxxxxxx> <1263167293.15783.61.camel@eep>
- User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.5) Gecko/20091204 Thunderbird/3.0
Derick Eddington wrote:
> I think the pathname component separators do need to be defined.
> [...] if they're undefined, the encoded set would not be clearly,
> precisely, completely specified.
The current draft sets the encoded set to be <a list of chars and the
path separator>. The set of path separators depends on the platform [1],
but the set of encoded characters should not (for portability reasons).
So you must include all the possible separators from all the supported
platforms in the encoded set; after that, specifying each of them
separately serves no purpose.
But in the end this point is of little importance; I will not object
either way.
[1] E.g. Windows uses both forward and back slashes as path separators.
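To make the argument concrete, here is a minimal Python sketch, assuming the encoded set is modelled as a plain set of characters. The platform labels, the function name `portable_encoded_set` and the `BASE_CHARS` placeholder are hypothetical; only the Windows separators come from footnote [1].

```python
# Separator sets per platform. Only the Windows entry (both "/" and "\")
# comes from the message; the platform labels are hypothetical.
PLATFORM_SEPARATORS = {
    "posix": {"/"},
    "windows": {"/", "\\"},
}


def portable_encoded_set(base_chars, platform_separators):
    """Build one platform-independent encoded set: the draft's other
    encoded characters plus every separator of every supported platform."""
    encoded = set(base_chars)
    for separators in platform_separators.values():
        encoded |= separators
    return encoded


# Hypothetical placeholder, NOT the draft's actual character list.
BASE_CHARS = {"<", ">", ":", "*"}

ENCODED = portable_encoded_set(BASE_CHARS, PLATFORM_SEPARATORS)

# After the union, every platform's separators are already covered,
# so naming "the path separator" separately adds nothing.
assert all(seps <= ENCODED for seps in PLATFORM_SEPARATORS.values())
```

The design point is that the encoded set is computed once, independent of the platform the code runs on.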
>>> 7) Add #\; to the set of encoded characters, because a directory could be both
>>> in the SCHEME_LIB_PATH sequence and correspond to a library name component.
>>> Such a directory with a name including #\; is unusual but must be supported,
>>> otherwise an unencoded #\; would be misinterpreted in SCHEME_LIB_PATH.
>>
>> I heard that when you strive for fail-safety, it's best to enumerate
>> the allowed things, not the forbidden ones.
>
> I don't think that justifies what you suggest below.
It is generally hard to list all the failure conditions, but easy to
list success conditions.
Let me illustrate: ~ is missing from the encoded set, even though Windows
treats that character specially (e.g. "PROGRA~1" is the short-name alias
for the first file whose name starts with "Progra").
Another example is ¥ (U+00A5). When represented in Japanese cp-932 it
maps to #x5C (just as \ does in ASCII), which is treated as a path
separator. Because of this, some programs (e.g. Cygwin) will choke on
filenames containing U+00A5 when cp-932 is your local codepage, even though
U+00A5 itself is perfectly legal. This also applies to ₩ (U+20A9) in
Korean (cp-949), and possibly more.
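A minimal Python sketch of the hazard, assuming we only want to flag risky characters rather than perform a real codepage conversion. The table holds just the two cases named above; `BACKSLASH_LOOKALIKES` and `separator_hazards` are hypothetical names, and a real check would need a complete mapping per codepage. That incompleteness is exactly the weakness of enumerating forbidden things: the table only covers the cases someone thought to add.

```python
# Characters that, per the message, end up as byte 0x5C (the ASCII
# backslash) under the given legacy codepage. Illustrative only, not
# exhaustive ("and possibly more").
BACKSLASH_LOOKALIKES = {
    "cp932": {"\u00a5"},  # YEN SIGN in Japanese cp-932
    "cp949": {"\u20a9"},  # WON SIGN in Korean cp-949
}


def separator_hazards(filename, codepage):
    """Return the characters of `filename` that a program using `codepage`
    may end up seeing as byte 0x5C, i.e. as a path separator."""
    lookalikes = BACKSLASH_LOOKALIKES.get(codepage, set())
    return sorted({ch for ch in filename if ch == "\\" or ch in lookalikes})
```

For example, `separator_hazards("price\u00a5list", "cp932")` returns `['¥']`, while the same name checked against `"cp949"` returns `[]`.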
>> How about "Encode everything
>> except for [a-zA-Z0-9_.-]"? It's safe, short, simple and works for 99%
>> of libraries without any encoding at all.
>
> Other cultures' characters must be usable unencoded, especially since
> the targeted file systems support using them, and we want other
> cultures' use of Scheme to not be discriminated against growing to be
> more than 1% of libraries.
FWIW, using non-ASCII symbols in source files is widely considered bad
manners in my culture. So while I do recognize the value in not needing to
encode these symbols, I won't complain much about the discrimination.
Also note that file system support for localized characters in Windows
is (was?) problematic, since it uses the local codepage in many places. Due
to this, a filename with a Ukrainian 'і' (U+0456) is not accessible via
an SMB mount from a Windows machine with Russian settings [2].
[2] Once upon a time this bit a fair share of accountants in Ukraine.
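The allowlist proposal quoted above ("Encode everything except for [a-zA-Z0-9_.-]") is easy to sketch. This is a minimal Python version, assuming a URI-style %XX escape for each UTF-8 byte; that escape format and the names `encode_component`/`decode_component` are hypothetical, not the draft's syntax. Note that ~, ;, ¥ (U+00A5) and Ukrainian і (U+0456) all get encoded without anyone having to list them.

```python
import re

# The proposed allowlist: only these ASCII characters stay unencoded.
SAFE = re.compile(r"[a-zA-Z0-9_.-]")


def encode_component(component):
    """Encode a library name component, leaving [a-zA-Z0-9_.-] alone.

    Assumption: every other character is written as %XX per UTF-8 byte
    (URI-style percent-encoding); the draft's real escape syntax may differ.
    """
    out = []
    for ch in component:
        if SAFE.fullmatch(ch):
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


def decode_component(encoded):
    """Invert encode_component: collect %XX bytes, then decode as UTF-8.
    "%" itself is never left unencoded, so the mapping is reversible."""
    data = bytearray()
    i = 0
    while i < len(encoded):
        if encoded[i] == "%":
            data.append(int(encoded[i + 1:i + 3], 16))
            i += 3
        else:
            data.extend(encoded[i].encode("ascii"))
            i += 1
    return data.decode("utf-8")
```

For example, `encode_component("a~b")` gives `"a%7Eb"`, `encode_component("\u00a5")` gives `"%C2%A5"`, and a plain name such as `"srfi-103"` passes through unchanged.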