
Re: Unicode surrogates

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

Tom Emerson scripsit:

> > For example, I can create a file called "\uD802.ss" in Windows.  How
> > would I be able to open this file in Scheme with the given proposal?
> Well, U+D802 is invalid, since it must be paired.

It is indeed invalid Unicode.  Unfortunately, Win32 filenames are not Unicode
strings; they are vectors of almost-arbitrary 16-bit values (certain values
are prohibited).  Similarly, Posix filenames are not strings either; they
are vectors of almost-arbitrary 8-bit values.
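
To make the distinction concrete, here is a rough sketch in Python (not
Scheme, and not anything the SRFI proposes).  It treats the file name from
the question as a vector of 16-bit values, shows that a strict UTF-16
decoder rejects it, and shows that a lenient decoding can still carry it
around as a string.  The helper names are hypothetical; the "surrogatepass"
and "surrogateescape" error handlers are Python's own workaround and are
only an illustration of one possible approach.

```python
import struct

# The Win32 name "\uD802.ss" as a vector of 16-bit values (lone high surrogate).
WIN_UNITS = [0xD802, 0x002E, 0x0073, 0x0073]

def units_to_bytes(units):
    """Pack 16-bit values into little-endian bytes, as stored on disk."""
    return struct.pack("<%dH" % len(units), *units)

def is_strict_utf16(units):
    """True only if the 16-bit values form well-formed UTF-16."""
    try:
        units_to_bytes(units).decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False

def lenient_win_name(units):
    """Decode 16-bit values to a str, letting unpaired surrogates through."""
    return units_to_bytes(units).decode("utf-16-le", "surrogatepass")

def lenient_posix_name(raw):
    """Decode 8-bit values to a str, smuggling undecodable bytes through."""
    return raw.decode("utf-8", "surrogateescape")

if __name__ == "__main__":
    print(is_strict_utf16(WIN_UNITS))          # False: U+D802 is unpaired
    name = lenient_win_name(WIN_UNITS)
    print(len(name), name.endswith(".ss"))     # 4 True
    posix = lenient_posix_name(b"\xff.ss")     # 0xFF is never valid UTF-8
    print(posix.encode("utf-8", "surrogateescape") == b"\xff.ss")  # True
```

The lenient strings are not valid Unicode either; they merely give the
odd name a string-shaped handle that converts back to the original vector
without loss.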

Vectors, though, are not a sensible interface to file systems; filenames are
thought of as strings, accessed as strings, and almost always do correspond
to strings.   The occasional deficiencies in this model just have to be

The man that wanders far                        cowan@xxxxxxxx
from the walking tree                           http://www.ap.org
        --first line of a non-existent poem by:         John Cowan