
Re: Unicode surrogates



rgburger@xxxxxxxxxxx writes:
> The 2005/07/21 draft disallows surrogate code points, namely those between 
> #xD800 and #xDFFF inclusive.  In Microsoft Windows NT 4.0 and later, the 
> file system and registry use UTF-16LE for encoding names.  They allow bare 
> surrogate code points.

The SRFI is describing the *internal* character representation, which
is defined in terms of Unicode scalar values. Surrogates are an
artifact of one particular encoding form, UTF-16. It would be the
responsibility of the implementation to generate the appropriate
UTF-16LE encoding for a filename that uses characters outside of the
BMP. The same issue arises for operating systems that use UTF-8 as
the file-system encoding.
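
To make the division of labour concrete, here is a minimal sketch (in
Python rather than Scheme, purely for illustration) of what an
implementation would do at the I/O boundary: take a scalar value and
produce UTF-16LE code units, generating a surrogate pair only when the
character lies outside the BMP. The function name utf16le_encode_scalar
is hypothetical and is not part of the SRFI.

```python
def utf16le_encode_scalar(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-16LE bytes (illustrative helper)."""
    # Scalar values are U+0000..U+10FFFF minus the surrogate range.
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    if cp < 0x10000:
        # BMP character: a single 16-bit code unit.
        units = [cp]
    else:
        # Supplementary character: split the 20-bit offset into
        # a high (lead) and a low (trail) surrogate.
        v = cp - 0x10000
        units = [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]
    # Little-endian byte order for each code unit.
    return b"".join(u.to_bytes(2, "little") for u in units)
```

The surrogates exist only in the encoded bytes; the internal character
is still the single scalar value that was passed in.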

> For example, I can create a file called "\uD802.ss" in Windows.  How
> would I be able to open this file in Scheme with the given proposal?

Well, U+D802 on its own is invalid: it is a high surrogate, and a high
surrogate must be paired with a following low surrogate.
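
As a rough illustration (again in Python, not Scheme), a strict UTF-16LE
decoder rejects the byte sequence for that filename, while the same
high surrogate followed by a low surrogate decodes to a single
supplementary character. The helper name is_well_formed_utf16le is
hypothetical; it simply leans on Python's built-in strict decoder.

```python
def is_well_formed_utf16le(data: bytes) -> bool:
    """Return True if data is well-formed UTF-16LE (illustrative helper)."""
    try:
        # The default 'strict' error handler rejects unpaired surrogates.
        data.decode("utf-16-le")
    except UnicodeDecodeError:
        return False
    return True

# The code unit 0xD802 alone, followed by ".ss": an unpaired high surrogate.
lone = (0xD802).to_bytes(2, "little") + ".ss".encode("utf-16-le")

# U+10800 encodes as the pair 0xD802 0xDC00, so here the surrogate is paired.
paired = "\U00010800.ss".encode("utf-16-le")
```

Here is_well_formed_utf16le(lone) is False and
is_well_formed_utf16le(paired) is True.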

    -tree

-- 
Tom Emerson                                          Basis Technology Corp.
Software Architect                                 http://www.basistech.com
 "You can't fake quality any more than you can fake a good meal." (W.S.B.)