
Re: Issues with Unicode

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.




Shiro Kawai wrote:
> I don't know how much the R6RS committee wants to change the string API,
> but I really wish R6RS strings were immutable.

The R6RS members already have an ambitious agenda, so do not get
your hopes up too high that STRING will change in a profound way,
or even at all, for R6RS.

Nevertheless, I would also like to see native Scheme strings
eventually become immutable values, like integers. With the rise
of Unicode, where a character is a lot more than up to 7 holes
in a punch card, the purpose of the primitive facility Scheme
calls STRING is evaporating quickly: I fail to see the benefit
of STRING over a VECTOR of CHAR. At the same time, the need for
an advanced and convenient string processing facility is
becoming more pressing for all sorts of 'modern' applications,
e.g. web programming.

Jorgen wrote:
> Another quite important aspect of immutable strings is that one
> can use ropes as the internal representation, effectively giving
> high-performance STRING-APPEND and SUBSTRING operations. From my
> experience (and I guess others'), those are much more common than
> string mutation anyway.
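A minimal sketch of the rope idea, in Python for brevity: an immutable binary tree whose leaves hold flat text, so append only allocates one node, and substring shares every subtree that lies entirely inside the requested range. The names Leaf, Concat, append, substring and to_str are hypothetical, chosen for illustration; this is not an API from any SRFI or Scheme implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    """A flat, immutable chunk of text."""
    text: str

    @property
    def length(self) -> int:
        return len(self.text)


@dataclass(frozen=True)
class Concat:
    """An immutable node joining two ropes; caches the total length."""
    left: Rope
    right: Rope
    length: int


Rope = Union[Leaf, Concat]


def append(a: Rope, b: Rope) -> Rope:
    """O(1): allocates a single node; neither argument is copied."""
    if a.length == 0:
        return b
    if b.length == 0:
        return a
    return Concat(a, b, a.length + b.length)


def substring(r: Rope, start: int, end: int) -> Rope:
    """Characters [start, end) of r, assuming 0 <= start <= end <= r.length.

    Subtrees lying entirely inside the range are shared, not copied; new
    nodes are created only along the two boundary paths, plus a slice of
    at most one leaf at each end.
    """
    if start <= 0 and end >= r.length:
        return r  # whole rope requested: share it
    if start >= end:
        return Leaf("")
    if isinstance(r, Leaf):
        return Leaf(r.text[start:end])
    n = r.left.length
    if end <= n:
        return substring(r.left, start, end)
    if start >= n:
        return substring(r.right, start - n, end - n)
    return append(substring(r.left, start, n), substring(r.right, 0, end - n))


def to_str(r: Rope) -> str:
    """Flatten back into a primitive string in O(total length)."""
    parts: list[str] = []
    stack: list[Rope] = [r]
    while stack:
        node = stack.pop()
        if isinstance(node, Leaf):
            parts.append(node.text)
        else:
            # Push right first so the left subtree is emitted first.
            stack.append(node.right)
            stack.append(node.left)
    return "".join(parts)


greeting = append(Leaf("Hello, "), Leaf("world"))
print(to_str(substring(greeting, 3, 9)))  # prints "lo, wo"
```

Note that substring(greeting, 0, 7) returns the original "Hello, " leaf itself rather than a copy, which is only safe because nothing is mutable. Practical rope libraries also rebalance the tree; this sketch does not, so a long chain of appends can make it deep and slow down substring.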

One could start with a SRFI for immutable strings as an add-on
library, in the hope that the Scheme community will pick it up
as the primary string type in the distant future.

About a year ago I started working on this in my spare time,
but did not get it to the point where I would want to make it
public. One of the things I learned, however, was that there
are other exciting advanced data structures. For example,

        http://portal.acm.org/citation.cfm?id=324139

describes a deque (double-ended queue) with catenation in which
all operations are O(1) worst-case. I wasn't aware that this was
even possible. But before you get too excited: the constants
hidden in the O(...) are substantial, and a general-purpose
substring operation would still have to be added (although the
obvious implementation is already O(length of substring), which
is what primitive strings cost as well).

People might argue that the primitive strings of R5RS are much
more efficient than any advanced string type, e.g. ropes. This
is true, of course, but at the same time it is not: by the time
your application does what you want using primitive strings, it
will probably be less efficient, because of all the corners cut,
than a well-designed general library with advanced string
operations.

Sebastian