Re: Issues with Unicode

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

From: "Jonathan S. Shapiro" <shap@xxxxxxxxxxx>
Subject: Issues with Unicode
Date: Sun, 23 Apr 2006 10:54:55 +0200

> 11. Strings now, more than ever, are not just vectors of characters
> (though this should be a feasible implementation). There is *excellent*
> discussion of the issues in the libicu documentation, and I strongly
> recommend reading that.

Alternative implementations of strings have been discussed on
this list, and in some threads on comp.lang.scheme, I think.
I'd like to draw attention to one point which hasn't been
raised, IIRC.  (Maybe it is too trivial and everybody knows
about it; if so, sorry for the noise.)

Some of the fancier implementations might not go well with
preemptive multithreading; if mutating a string touches more
than one place in the string object, it creates a hazard:
another thread may see the object in a half-updated state.

Generally it is unacceptable to lock on every string access, so
the practical solution is to split the string structure into a
"header" and a mutable body.  If you want to change the body
of a string in a way that would otherwise be unsafe, you allocate
a fresh body, set it up with the desired modifications, and swap
the pointer in the "header" to the new body.  As long as pointer
assignment is atomic, this is safe.
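
The header/body swap described above can be sketched roughly as
follows.  This is only an illustration in Python, not any Scheme
implementation's actual code: the names MutableString, snapshot,
ref and set_char are hypothetical, the immutable Python str stands
in for the "body", and the writer-side lock is my own addition to
keep two concurrent writers from losing each other's update.  It
also assumes that rebinding an attribute is a single reference
store, as it is in CPython.

```python
import threading


class MutableString:
    """Hypothetical "header": holds one reference to an immutable body."""

    def __init__(self, text=""):
        # The body is an ordinary Python str, which is immutable.
        self._body = str(text)
        # Assumption: only writers take this lock; readers never lock.
        self._write_lock = threading.Lock()

    def snapshot(self):
        # A reader grabs the current body once and then works on that value.
        return self._body

    def ref(self, index):
        return self._body[index]

    def set_char(self, index, ch):
        with self._write_lock:
            old = self._body
            if not 0 <= index < len(old):
                raise IndexError("string index out of range")
            # Build a fresh body off to the side...
            new = old[:index] + ch + old[index + 1:]
            # ...then publish it with a single reference assignment.
            self._body = new
```

A reader that took a snapshot before the swap keeps seeing the old
body in full; it never observes a partially modified string.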

Although this workaround is trivial, I cannot help wondering
how much having string-set! is really worth.  The workaround
almost amounts to having an immutable string (the body) and
emulating mutable strings with the header.  Wouldn't it be more
natural to make strings immutable, and have a separate object for
constructing a string?  For sequential construction we have string
ports; for random-access construction, it can be a vector, or we
could have a vector-of-characters type in the spirit of SRFI 4,
and convert it to an immutable string once we finish building it.
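
To make the two construction styles concrete, here is a small
sketch in Python, where strings are already immutable.  io.StringIO
plays the role of a string port and a plain list of one-character
strings plays the role of the vector of characters; the function
names build_sequentially and build_randomly are made up for this
example.

```python
import io


def build_sequentially(parts):
    """Sequential construction: write to a port, then extract the string."""
    port = io.StringIO()
    for part in parts:
        port.write(part)
    return port.getvalue()


def build_randomly(length, assignments, fill=" "):
    """Random-access construction: fill a mutable vector, then freeze it."""
    cells = [fill] * length          # mutable working storage
    for index, ch in assignments:    # cells may be set in any order
        cells[index] = ch
    return "".join(cells)            # one conversion to an immutable string
```

In both cases all mutation happens on the builder object, and the
string that comes out never changes afterwards.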

(It might be convenient to have mutable strings for editor-like
applications, which would also allow length-changing mutation.  I'd
rather think of that as another type of object that can be built
on top of immutable strings, e.g. a buffer object realized as
a balanced tree of string segments.)
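
A rough sketch of such a buffer, again in Python and with made-up
names (Rope, concat, split, Buffer).  To keep it short the tree is
NOT rebalanced here, so it only shows the idea of building
length-changing edits out of immutable segments; a real
implementation would add balancing.

```python
class Rope:
    """Tree of immutable string segments; nodes are never modified."""
    __slots__ = ("left", "right", "text", "length")

    def __init__(self, text="", left=None, right=None):
        self.left, self.right, self.text = left, right, text
        # A leaf has no children; an inner node caches its total length.
        self.length = len(text) if left is None else left.length + right.length

    def __str__(self):
        if self.left is None:
            return self.text
        return str(self.left) + str(self.right)


def concat(a, b):
    # Skip empty ropes so the tree does not fill up with empty leaves.
    if a.length == 0:
        return b
    if b.length == 0:
        return a
    return Rope(left=a, right=b)


def split(rope, index):
    """Return (prefix, suffix); untouched subtrees are shared, not copied."""
    if rope.left is None:
        return Rope(rope.text[:index]), Rope(rope.text[index:])
    if index <= rope.left.length:
        head, tail = split(rope.left, index)
        return head, concat(tail, rope.right)
    head, tail = split(rope.right, index - rope.left.length)
    return concat(rope.left, head), tail


class Buffer:
    """Mutable editor-style buffer: a single reference to an immutable rope."""

    def __init__(self, text=""):
        self._root = Rope(text)

    def insert(self, index, text):
        head, tail = split(self._root, index)
        self._root = concat(concat(head, Rope(text)), tail)

    def delete(self, start, end):
        head, rest = split(self._root, start)
        _, tail = split(rest, end - start)
        self._root = concat(head, tail)

    def to_string(self):
        return str(self._root)
```

Every edit builds a new root and swaps it in, so this is the same
header-and-body arrangement as before, only with a tree-shaped body.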