This page is part of the web mail archives of SRFI 103 from before July 7th, 2015. The new archives for SRFI 103 contain all messages, not just those from before July 7th, 2015.
John Cowan wrote:

> I think it is sensible to require encoding of all the characters that
> Windows doesn't allow in path components, viz. #\<, #\>, #\:, #\|,
> #\?, #\*, and #x0; through #x1F;. Posix allows all but #\x0;, but
> the remainder, though technically permitted, are nothing but nuisances
> in pathname components, as they must be escaped when referred to from
> the shell. (#\: is an exception, but doesn't show up in Posix filenames
> often either.)
>
> In addition, the Windows executive treats #\\ and #\/ both as path
> separators, a fact occasionally convenient, although the UI disallows #\/.
> So I'd escape both of them in all cases too.

I have been conflicted about this issue while drafting this SRFI.

One part of me wants to say: I'm disinclined to make SRFI 103 require encoding any characters except the four it uses specially, which must be encoded. However, as the document says, an implementation may encode any additional characters it wants. Always encoding the characters which Windows disallows, or which are nuisances in shells, may very well be the de facto practice for the near future. However, in the farther future, these characters may not need encoding, and other OSs and shells may have greater prevalence than Windows, POSIX, and Bash. Even if the Windows-disallowed and shell-nuisance characters were required to be encoded, there could still exist characters which some file systems need encoded but others do not, e.g. file systems of OSs other than Windows or POSIX, and so communicating which characters to encode and coordinating the transcoding of path names would still be required.

Another part of me thinks: It's not a big deal. Not requiring the encoding of characters from other cultures' languages, not requiring the encoding of other non-natural-language symbol characters which I want to explore using in library names, and promoting progress toward file systems which can handle all characters, *is* a big deal to me. But this small set of Windows-disallowed and shell-nuisance characters probably won't be common in library names and can be sacrificed if it really helps portability. And we can always make a new SRFI in the future which revises this one to drop the requirement to encode these characters.

The question is: what is the exact set of characters which should be required to be encoded? I've heard different descriptions of what Windows/DOS disallows. Does it differ between versions? Which eras of Microsoft OSs do we want to cater to? Surely shells differ in which characters are nuisances? Which shells should be catered to when choosing the nuisance characters to encode?

--
: Derick
----------------------------------------------------------------