
Re: Encoding Windows reserved characters

This page is part of the web mail archives of SRFI 103 from before July 7th, 2015. The new archives for SRFI 103 contain all messages, not just those from before July 7th, 2015.



John Cowan <cowan@xxxxxxxx> writes:

> Derick Eddington scripsit:
>
>
>> The question is: what is the exact set of characters which should be
>> required to be encoded?  I've heard different descriptions of what
>> Windows/DOS disallows.  Does it differ between versions?  What eras of
>> Microsoft OSs do we want to cater to?
>
> Microsoft's page[1] and Cygwin's[2] agree perfectly; the first certainly
> should know, and the second has had every reason to find out.  I cannot
> believe that Microsoft, with its obsession with backward compat, would
> ever remove a character from the blacklist (which might break ancient apps
> that don't expect to see them) nor add one (which would make existing
> files unreachable).  So I think the blacklist of #\", #\*, #\:, #\<,
> #\>, #\?, #\|, #\/, #\\, and #\x0; to #\x1F; is a solid one.
>
> The blacklist doubtless arose because COMMAND.COM (and its ancestors, the
> CP/M monitor and various DEC command executives) didn't have any escape
> convention, and so files with those characters couldn't be manipulated
> from the shell.  Consequently, the kernel forbade them, and it still does.
>
> Note that this limitation is specific to Windows, the operating system,
> not any particular file system.  In fact, the Microsoft page specifically
> says that there may be more characters which are forbidden by the file
> system.  But I don't think either VFAT or NTFS applies any restrictions
> of its own -- indeed, the Posix subsystem (which bypasses the Windows
> executive and runs directly on the NT kernel) does not respect the
> blacklist, and can create files which Windows programs cannot process.
>
Additionally, and more annoyingly IMO, Windows disallows several
perfectly innocent-looking names like "aux", "prn", "con" and "nul" (at
least), with any extension (see also [0] for a story including some
historical background). I wonder if SRFI 103 should mention this
horrendous stupidity. I actually ran into this, naming a library
"aux.sls", and a fellow Schemer on Windows was unable to check out the
git archive containing this file, getting obscure error messages.

[0] http://heirloom.sourceforge.net/mailx_aux_c.html

>> Surely, some shells differ in what are nuisance characters?  What shells
>> should be catered to for the nuisance characters to encode?
>
> I wouldn't worry about that.  The fact that these characters are
> painful on Posix systems because of the shell is just lagniappe.
>
+1. Zsh handles completion of such filenames just fine, FWIW:

rotty@delenn:~/tmp% touch 'foo*'
rotty@delenn:~/tmp% ls f<TAB>
rotty@delenn:~/tmp% ls foo\* 
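
The same point holds when a program, rather than a person, builds the command line. As a small sketch using Python's standard `shlex.quote` (the helper name `shell_safe` is hypothetical), such names can be quoted mechanically. Note that it uses single quotes rather than the backslash escape zsh chose above; a POSIX shell reads both the same way.

```python
import shlex


def shell_safe(name):
    """Quote `name` so a POSIX shell treats it as one literal word."""
    return shlex.quote(name)


# The file created in the transcript above; prints the quoted form.
print(shell_safe("foo*"))
```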

Regards, Rotty
-- 
Andreas Rottmann -- <http://rotty.yi.org/>