Re: upcoming revision, need feedback

This page is part of the web mail archives of SRFI 103 from before July 7th, 2015. The new archives for SRFI 103 contain all messages, not just those from before July 7th, 2015.

Derick Eddington wrote:
> (I still think the environment variable element separators should be
> defined in the sections about the environment variables, even though
> they'll also be specified in the encoded characters explanation.)

I think so too.

> Ugh.  The Microsoft page [1] about what characters to avoid does not say
> that #\~ is treated specially.  Should #\~ be added to the encoded set?
> [1] http://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx

It is described in the "Short vs. Long Names" section. Basically, every
filename that does not fit the 8.3 naming scheme gets an 8.3 alias,
typically of the form XXXXXX~N.XXX. The exact rules for generating that
alias are file-system dependent, though [1].

So yes, to avoid conflicts you need to escape ~.

[1] http://blogs.msdn.com/oldnewthing/archive/2004/04/14/113052.aspx
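
To make the conflict concrete, here is a minimal sketch of what escaping ~
buys. The regular expression only approximates the typical XXXXXX~N.XXX
alias shape (the real rules are file-system dependent), and the "%7E"
escape and both function names are assumptions made for illustration, not
something this SRFI defines.

```python
import re

# Typical shape of a generated short name: up to 6 base characters,
# "~", a number, and an optional extension of up to 3 characters.
# Heuristic only: the actual alias rules depend on the file system.
SHORT_ALIAS = re.compile(r"^[^.~]{1,6}~[0-9]+(\.[^.]{1,3})?$")


def looks_like_short_alias(name):
    """Return True if name has the typical 8.3 alias shape."""
    return SHORT_ALIAS.match(name) is not None


def escape_tilde(name):
    """Replace "~" with an assumed percent-style escape ("%7E")."""
    return name.replace("~", "%7E")


if __name__ == "__main__":
    for name in ["PROGRA~1", "LIBRAR~1.SLS", "library.sls"]:
        escaped = escape_tilde(name)
        print(name, looks_like_short_alias(name),
              escaped, looks_like_short_alias(escaped))
```

Once ~ is escaped, a name can no longer have the alias shape, so it cannot
collide with an alias that the file system generated for some other file.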

>> Another example is ¥ (U+00A5). When represented in Japanese cp-932 it
>> maps to #x5C (just as \ does in ascii), which is treated as a path
>> separator. Because of this some programs (e.g. Cygwin) will choke on
>> filenames with U+00A5 when cp-932 is your local codepage, even though
>> U+00A5 itself is perfectly legal. This also applies to ₩ (U+20A9) in
>> Korean (cp-949), and possibly more.
> Ugh.  I think that type of problem should be outside this SRFI's
> concern, because it's variable and dependent on individuals' codepage
> configurations, and there is not a proper solution (encoding the
> majority of characters is not acceptable).

I jumped the gun in saying ¥ is completely broken with Cygwin: that
issue was fixed in Cygwin 1.7, which even supports '"', '*', ':', '<',
'>' and '|' in filenames. This is not to say that other software is not
broken in the same way, but a non-working Cygwin was a major concern.
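
For software that is still affected, a check along these lines could flag
the problematic names up front. This is only a sketch: the table contains
just the two mappings mentioned in this thread and is certainly
incomplete, and the names MAPS_TO_BACKSLASH and risky_characters are
hypothetical.

```python
# Characters reported in this thread as mapping to byte 0x5C (the ASCII
# backslash, a Windows path separator) in a given legacy code page.
# Illustrative and deliberately incomplete.
MAPS_TO_BACKSLASH = {
    "cp932": {"\u00a5"},  # YEN SIGN, Japanese
    "cp949": {"\u20a9"},  # WON SIGN, Korean
}


def risky_characters(name, codepage):
    """Return the characters of name that may turn into a path
    separator when converted to the given code page."""
    risky = MAPS_TO_BACKSLASH.get(codepage.lower(), set())
    return [ch for ch in name if ch in risky]


if __name__ == "__main__":
    print(risky_characters("price\u00a5.sls", "cp932"))
    print(risky_characters("price\u00a5.sls", "cp949"))
```

Whether a name is risky depends on the code page of the machine that reads
it, not the one that wrote it, which is why the problem is so hard to pin
down in a specification.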

>> FWIW, using non-ascii symbols in source files is widely considered bad
>> manners in my culture. So while I do recognize value in not needing to
>> encode these symbols, I won't complain much about the discrimination.
> Well, I think that's an unfortunate consequence of archaic poor
> English-only designs, and your culture should take advantage of modern
> character freedom :)

Well, you see, there is only so much of an audience for ÐÐÑÐ ÐÐÑÑÐÑÑÐÐ
Scheme, and while (ÐÐÑÐÐÐ ÐÐÑÑÐ) is an awesome library I'm afraid you
won't enjoy using it.

> People who want to use whatever characters can configure their
> Windows crap to make those characters work in file names, right?


> And when such files are packaged and distributed to another platform,
> the correct file names will be used, right?

Wrong. Every Windows packer I have tried failed to correctly unpack a
tarball with funny names made by tar from Cygwin, just as tar fails to
correctly unpack a similar tarball made by a Windows packer.

Then of course broken software is nothing new, so the point is moot.
Let's just hope there are no more pitfalls in localized path handling on
any platform.
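
One plausible mechanism behind the unpacking failures above, sketched
with Python's tarfile module: the archive stores the name as raw bytes,
and the unpacker decodes them with its own local encoding. The file name
and the choice of UTF-8 and cp1252 are assumptions for illustration; I
have not verified that this is what the packers in question actually do.

```python
import io
import tarfile


def roundtrip_name(name, write_encoding, read_encoding):
    """Store one empty file under name, then list the archive as a
    reader with a different local encoding would see it."""
    buf = io.BytesIO()
    # GNU format stores the name as raw bytes in write_encoding.
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.GNU_FORMAT,
                      encoding=write_encoding) as tar:
        info = tarfile.TarInfo(name)
        info.size = 0
        tar.addfile(info, io.BytesIO(b""))
    buf.seek(0)
    # The reader decodes the same bytes with its own encoding.
    with tarfile.open(fileobj=buf, mode="r",
                      encoding=read_encoding) as tar:
        return tar.getnames()[0]


if __name__ == "__main__":
    print(roundtrip_name("caf\u00e9.txt", "utf-8", "utf-8"))
    print(roundtrip_name("caf\u00e9.txt", "utf-8", "cp1252"))
```

The first call returns the original name; the second returns a garbled
one, because the two UTF-8 bytes of the accented letter are read as two
separate cp1252 characters.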