Re: Unicode surrogates
> The 2005/07/21 draft disallows surrogate code points, namely those between
> #xD800 and #xDFFF inclusive. In Microsoft Windows NT 4.0 and later, the
> file system and registry use UTF-16LE for encoding names. They allow bare
> surrogate code points.
The SRFI is describing the *internal* character representation, which
is defined in terms of Unicode Scalar Values. Surrogates are an artifact
of one particular encoding form, UTF-16. It would be the
responsibility of the implementation to generate the appropriate
UTF-16LE encoding for a filename that uses characters outside of the
BMP. Operating systems that use UTF-8 as the file-system encoding
raise the same issue: the implementation must convert between scalar
values and the encoded bytes.
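A minimal Python sketch of that conversion. It assumes the implementation holds characters as scalar values, modeled here as Python code points, and must produce the UTF-16LE bytes Windows expects. The helper names `encode_utf16le` and `encode_name_utf16le` are hypothetical and are not part of the SRFI. U+10900 is used because its high surrogate happens to be U+D802, the code point from the question.

```python
import struct

def encode_utf16le(scalar: int) -> bytes:
    """Encode one Unicode scalar value as UTF-16LE bytes (hypothetical helper).

    Surrogate code points and values above U+10FFFF are rejected,
    because they are not scalar values.
    """
    if 0xD800 <= scalar <= 0xDFFF or not 0 <= scalar <= 0x10FFFF:
        raise ValueError(f"U+{scalar:04X} is not a Unicode scalar value")
    if scalar < 0x10000:
        # BMP character: a single 16-bit code unit.
        return struct.pack("<H", scalar)
    # Supplementary character: split the 20-bit offset into two 10-bit halves.
    offset = scalar - 0x10000
    high = 0xD800 + (offset >> 10)   # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)  # low (trail) surrogate
    return struct.pack("<HH", high, low)

def encode_name_utf16le(name: str) -> bytes:
    """Encode a whole filename, one scalar value at a time."""
    return b"".join(encode_utf16le(ord(ch)) for ch in name)

# U+10900 PHOENICIAN LETTER ALF lies outside the BMP.
name = "\U00010900.ss"
print(encode_name_utf16le(name).hex())  # → 02d800dd2e0073007300
```

The surrogate pair D802 DD00 only appears in the encoded bytes; the internal string holds the single scalar value U+10900.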
> For example, I can create a file called "\uD802.ss" in Windows. How
> would I be able to open this file in Scheme with the given proposal?
Well, U+D802 is invalid on its own: it is a high surrogate, so in
UTF-16 it must be paired with a following low surrogate (U+DC00 to
U+DFFF).
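The sketch below shows why such a name has no scalar-value representation. It is a strict UTF-16LE decoder in Python that maps each well-formed surrogate pair to one scalar value and rejects an unpaired surrogate. The function name `decode_utf16le` is hypothetical. The sketch simply raises an error; the post does not say what an implementation should do with such a name instead.

```python
import struct

def decode_utf16le(data: bytes) -> list:
    """Decode UTF-16LE bytes to scalar values, rejecting unpaired surrogates."""
    if len(data) % 2:
        raise ValueError("odd number of bytes")
    units = struct.unpack(f"<{len(data) // 2}H", data)
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: valid only if a low surrogate follows.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
                i += 2
                continue
            raise ValueError(f"unpaired high surrogate U+{u:04X} at unit {i}")
        if 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            raise ValueError(f"unpaired low surrogate U+{u:04X} at unit {i}")
        out.append(u)
        i += 1
    return out

# Paired: D802 DD00 decodes to the single scalar value U+10900.
print([hex(cp) for cp in decode_utf16le(bytes.fromhex("02d800dd2e00"))])  # → ['0x10900', '0x2e']

# Unpaired: a bare D802 followed by "." cannot be decoded.
try:
    decode_utf16le(bytes.fromhex("02d82e00"))
except ValueError as e:
    print(e)  # → unpaired high surrogate U+D802 at unit 0
```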
Tom Emerson Basis Technology Corp.
Software Architect http://www.basistech.com
"You can't fake quality any more than you can fake a good meal." (W.S.B.)