[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Fundamental operations on strings
The current draft says:
Like in R5RS, a Scheme string is a fixed-length array
of Scheme characters.
What R5RS says is "Strings are sequences of characters." Although
having make-string, string-ref and string-set! as primitives
suggests R5RS authors had an array model in their mind, a string
can be implemented differently.
If strings are meant to manipulate texts, as stated in this srfi,
such alternative implementation can even be better than the simple
"fixed-length mutable array of characters" view.
In text, a character's meaning is strongly associated with other
characters that surrounds it, but not so much with its absolute
position. Common operations on text are searching, substring
extraction, substring replacment and concatenation. String-ref
does provide a basic building block for them, but I don't see how
useful string-set! can be, except when you stick to fixed-length
mutable array model, with allocating a fresh string by make-string
and fill it with string-set!.
Restricting string implementation to this mutable array model
bothers me in a couple of points.
- Except the "allocate and fill-in" operation, a fixed-length
mutation provided by string-set! has little practical value.
In applications, length-changing subustring replacement
is far common, and replacing a single character is just a
special case of it.
- If we restrict our idea of string to an array of characters,
string type becomes merely a special type of a vector.
Then, why do we need to have a whole bunch of string
operations, apart from vector operations, in the language
we want to keep concise? What justifies to have a distinct
string type is that its nature is so different from vectors.
If possible, I'd like to see strings become either:
a) A mutable sequence of characters with length-changing
mutation operator (e.g. string-replace!) as a core procedure.
The current string-set! is just a special case of
string-replace!, and becomes a library procedure.
b) An immutable sequence of characters. make-string becomes
rather useless, so the fundamental string constructor should
be list->string and vector->string (though, in practice, string
ports would be used far often). This allows very efficient
substring extraction (which is, far more common than mutation).
Mutable "text" library can be built on top of this fundamental
string library, which will include length-changing mutation.
I personally see immutable strings cleaner. It breaks backward
compatibility, yes, but do you let compatibility drag you
to leave warts in design? Besides, if you concern about
existing code, count the use of string-set!, less the cases
where it is used for sequential string construction, which is
better be replaced for string ports.
Even if people think these changes are too radical and want to
keep compatibility, I'd still like R6RS doesn't push "fixed-length
mutable array model" too much, allowing implementators to use
their imaginations to come up with better representations.