Re: Encoding Windows reserved characters

This page is part of the web mail archives of SRFI 103 from before July 7th, 2015. The new archives for SRFI 103 contain all messages, not just those from before July 7th, 2015.



Derick Eddington scripsit:


> The question is: what is the exact set of characters which should be
> required to be encoded?  I've heard different descriptions of what
> Windows/DOS disallows.  Does it differ between versions?  What eras of
> Microsoft OSs do we want to cater to?

Microsoft's page[1] and Cygwin's[2] agree perfectly; the first certainly
should know, and the second has had every reason to find out.  I cannot
believe that Microsoft, with its obsession with backward compatibility, would
ever remove a character from the blacklist (which might break ancient apps
that don't expect to see it) or add one (which would make existing
files unreachable).  So I think the blacklist of #\", #\*, #\:, #\<,
#\>, #\?, #\|, #\/, #\\, and #\x0; to #\x1F; is a solid one.
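
To make the set concrete, here is a minimal sketch in Python (not Scheme,
purely for illustration) of the blacklist above, with a predicate that tests
a name against it.  The percent-encoding in encode_reserved is an assumption
made for this example, not something the SRFI text is being quoted on, and
the names WINDOWS_RESERVED, has_reserved and encode_reserved are hypothetical.

```python
# The blacklist as listed above: " * : < > ? | / \ plus U+0000 through U+001F.
WINDOWS_RESERVED = frozenset('"*:<>?|/\\') | {chr(c) for c in range(0x20)}


def has_reserved(name: str) -> bool:
    """Return True if name contains any character on the blacklist."""
    return any(ch in WINDOWS_RESERVED for ch in name)


def encode_reserved(name: str) -> str:
    """Percent-encode blacklisted characters as %XX (assumed scheme).

    '%' itself is encoded too, so the result can be decoded unambiguously.
    """
    out = []
    for ch in name:
        if ch in WINDOWS_RESERVED or ch == "%":
            out.append("%{:02X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)
```

Under these assumptions, encode_reserved("srfi:103") gives "srfi%3A103",
and has_reserved("foo.sls") is False.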

The blacklist doubtless arose because COMMAND.COM (and its ancestors, the
CP/M monitor and various DEC command executives) didn't have any escape
convention, and so files with those characters couldn't be manipulated
from the shell.  Consequently, the kernel forbade them, and it still does.

Note that this limitation is specific to Windows, the operating system,
not any particular file system.  In fact, the Microsoft page specifically
says that there may be more characters which are forbidden by the file
system.  But I don't think either VFAT or NTFS applies any restrictions
of its own -- indeed, the Posix subsystem (which bypasses the Windows
executive and runs directly on the NT kernel) does not respect the
blacklist, and can create files which Windows programs cannot process.

> Surely, some shells differ in what are nuisance characters?  What shells
> should be catered to for the nuisance characters to encode?

I wouldn't worry about that.  The fact that these characters are
painful on Posix systems because of the shell is just lagniappe.
The important issue is that Windows programs *cannot* create or refer
to files containing characters on the blacklist.

[1] http://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx
[2] http://www.cygwin.com/1.7/cygwin-ug-net/using-specialnames.html

-- 
You're a brave man! Go and break through the            John Cowan
lines, and remember while you're out there              cowan@xxxxxxxx
risking life and limb through shot and shell,           http://ccil.org/~cowan
we'll be in here thinking what a sucker you are!
        --Rufus T. Firefly