
Re: Unicode surrogates




On Mon, 13 Mar 2006, Per Bothner wrote:

>bear wrote:
>> To put it another way, Windows allows characters that are not part
>> of Unicode to be used to name files.  If we restrict our character
>> set for filenames to Unicode-only, we will not be able to open
>> those files.  That problem is real.
>
>That does not mean it's a problem we need to solve.
>
>If you make it possible to create filenames containing unpaired surrogates,
>that just means you make it easy to create files with garbage filenames.

At first I disagreed with the idea that this wasn't a problem we
needed to solve, but the more I think about it, the more I think
you're right.  If a particular implementation wants to be useful for
systems work on Windows, it needs to solve this problem.  But the
standard need not do so.  For the standard, it's entirely reasonable
to address only the problem of opening and using files that have
valid Unicode filenames, and to leave methods of working with other
files unspecified.

>I don't see that as a feature.  Any such filenames are presumably
>unintentional and due to bugs.

In fact, they are not accidental.  They are being used deliberately,
the same way 8-bit extended characters were used with comm programs
that supported only seven-bit ASCII back in 1983 or thereabouts: to
provide a final layer of "security by obscurity."  When I operated a
bulletin board system way back when, I had the format utility (and a
few others) renamed to something containing characters that people
couldn't type over the serial drivers I had installed.  The current
situation on Windows is similar: the system's protections against the
current user are generally inadequate, and a "cheap trick" like this
can stop hostile scripts from running.

				Bear