Re: Unicode surrogates

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.



rgburger@xxxxxxxxxxx writes:
> The 2005/07/21 draft disallows surrogate code points, namely those between 
> #xD800 and #xDFFF inclusive.  In Microsoft Windows NT 4.0 and later, the 
> file system and registry use UTF-16LE for encoding names.  They allow bare 
> surrogate code points.

The SRFI describes the *internal* character representation, which is
defined in terms of Unicode scalar values. Surrogates are an artifact
of one particular encoding form, UTF-16. It is the implementation's
responsibility to generate the appropriate UTF-16LE encoding for a
filename that uses characters outside the BMP. The same issue arises
on operating systems that use UTF-8 as the file-system encoding.
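
To make the split between scalar values and their encoding concrete,
here is a rough sketch in Python (not Scheme, and not anything the SRFI
specifies) of how an implementation might turn one scalar value into
UTF-16LE bytes. The function name utf16le_encode_scalar is hypothetical,
made up for this illustration.

```python
def utf16le_encode_scalar(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-16LE bytes.

    Hypothetical helper for illustration only.
    """
    # Scalar values are 0..0x10FFFF minus the surrogate range.
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x10000:
        # BMP characters take a single 16-bit code unit.
        units = [cp]
    else:
        # Characters outside the BMP are split into a surrogate pair:
        # the top 10 bits go into the high surrogate, the low 10 bits
        # into the low surrogate.
        v = cp - 0x10000
        units = [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]
    # Little-endian: least significant byte of each code unit first.
    return b"".join(u.to_bytes(2, "little") for u in units)
```

In this sketch the surrogates only ever appear in the output bytes,
never as input characters, which is the distinction being drawn above.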

> For example, I can create a file called "\uD802.ss" in Windows.  How
> would I be able to open this file in Scheme with the given proposal?

Well, U+D802 on its own is invalid: it is a high surrogate, so it
must be followed by a low surrogate (U+DC00 through U+DFFF).
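
For what it's worth, the pairing rule can be sketched as a check over a
sequence of 16-bit code units. This is Python rather than Scheme, and
the name first_unpaired_surrogate is hypothetical; it only illustrates
the UTF-16 well-formedness rule, not how any file system behaves.

```python
def first_unpaired_surrogate(units):
    """Return the index of the first unpaired surrogate, or None.

    `units` is a sequence of 16-bit code units (ints).
    Hypothetical helper for illustration only.
    """
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: the next unit must be a low surrogate.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return i
        if 0xDC00 <= u <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            return i
        i += 1
    return None
```

Under this check, the code units for the name in the example,
[0xD802, 0x2E, 0x73, 0x73], are reported as ill-formed at index 0.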

    -tree

-- 
Tom Emerson                                          Basis Technology Corp.
Software Architect                                 http://www.basistech.com
 "You can't fake quality any more than you can fake a good meal." (W.S.B.)