Per Bothner
<per@bothner.com>
This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-118@nospamsrfi.schemers.org
. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.
Scheme specifies mutable fixed-length strings.
We add two procedures string-append!
and
string-replace!
which allow the size of the string to change.
We also require that the standard Scheme procedures make-string
and string-copy
return variable-size strings.
Historically Scheme has only supported fixed-length strings. Conceptually, strings can be immutable (read-only) or mutable (writable). Unfortunately, the usefulness of a fixed-size mutable string is extremely minimal: The main use is for a buffer to which one incrementally adds more data. So why not fold that functionality into the string API? I.e. why can't you add just data to the end of a string?
The implementation difficulty with variable-size strings is you need some kind of indirection, so you can re-allocate the data when you run out of space. However, many implementation of string already use such indirection for various reasons, such as memory management.
It is worth noting that some string representations use UTF-8
or UTF-16 encoding, for reasons of compactness or compatibility with
other APIs. The obvious disadvantage is that string-ref
is no longer a constant-time operation, but there are ways to work
around that; string-ref
isn't really an important operation
for typical string manipulation.
Note that in such an implementation a mutable string is inherently
variable-size. For example, if you replace a 1-byte character with
a 2-byte character then the stored size changes. So you might as well
expose this functionality to the Scheme programmer.
Common Lisp supports this functionality with an optional
fill-pointer, combined with using adjust-array
when
the allocated string fills up.
One option is to specify 3 kinds of strings: immutable strings, mutable fixed-size strings, and mutable variable-size strings. I've argued that the use-case for fixed-size mutable strings is so limited (and maybe better supported by character uniform arrays) that mutable strings should by default be variable-size. Specifically, standard Scheme functions that return mutable strings should return variable-size mutable strings. An implementation may provide a mechanism to explicitly create a fixed-size mutable string (or a character uniform vector); however, this specification does not propose or recommend that.
It has been suggested
that a more general buffer
API might be better than tweaking
the old-fashioned (and potentially inefficient) integer-index-oriented
string APIs. This proposal is much more modest - it just adds two
new procedures. Thus it should be much easier to learn,
and easier to modify Scheme programs to use it.
The standard Scheme functions make-string
and string-copy
are specified to return
variable-size mutable strings by default.
The following standard Scheme functions
may return fixed-sized or variable-sized strings:
string
, string-upcase
, string-downcase
,
string-foldcase
, substring
,
string-append
, and list->string
.
(These are functions that should have been specified to return immutable
strings if we didn't have to worry about compatibility.)
We add 2 new functions. The most frequently used is likely to be
string-append!
, while string-replace!
handles the general case of in-place replacement of a substring
with another string of possibly different size.
(string-append! string value ...)
The string must be a variable-size mutable string. The
string-append!
procedure extends string by appending each value (in order) to the end of string. A value can be a character or a string.The following example shows how to process a string using
string-for-each
and incrementallybuildinga result string usingstring-append!
:(define (translate-space-to-newline str) (let ((result (make-string 0))) (string-for-each (lambda (ch) (cond ((char=? ch #\Space) (string-append! result #\Newline)) ((char=? ch #\Return)) ; Ignore (else (string-append! result ch)))) str) result))There is no requirement that this procedure execute in constant time, even amortised (i.e. average) constant time.
Usage note: Compare with using string ports:
(define (translate-space-to-newline str) (let ((out (open-output-string))) (string-for-each (lambda (ch) (cond ((char=? ch #\Space) (write-char #\Newline out)) ((char=? ch #\Return)) ; Ignore (else (write-char ch out)))) str) (get-output-string out)))Using a string port in this situation is probably preferable: It is more portable, and you can expect decent performance in most implementations. Using
string-append!
may be slighly more efficient on some implementations, due to lower overhead, but that depends on the strategy used bystring-append!
when the allocated buffer is too small. Thestring-append!
function is most useful when using (reading) a string is interleaved with growing it, or when also usingstring-replace!
.
(string-replace! dst dst-start dst-end src [src-start [src-end]])
Replaces the characters of the variable-size string dst (between dst-start and dst-end) with the characters of the string src (between src-start and src-end). The number of characters from src may be different than the number replaced in dst, so the string may grow or contract. The special case where dst-start is equal to dst-end corresponds to insertion; the case where src-start is equal to src-end corresponds to deletion. The order in which characters are copied is unspecified, except that if the source and destination overlap, copying takes place as if the source is first copied into a temporary string and then into the destination. (This can be achieved without allocating storage by making sure to copy in the correct direction in such circumstances.)When value is a string then
(string-append! dst value)
is equivalent to(string-replace! dst (string-length dst) (string-length dst) value)
.Note: Srfi-13 has a nondestructive version,
string-replace
, which takes the arguments in the following order:(string-replace dst src dst-start dst-end [src-start [src-end]])We adopt a different argument order here to be consistent withstring-copy!
, and existingstring-replace!
implementations.
The implementation is trivial, assuming there is some indirection
between the string object header and the actual stored characters.
Whenever the character buffer is full, it needs to be replaced with
a bigger buffer. If the size of the new buffer is a
fixed multiple of the size of the old buffer then
string-append!
has amortized constant execution time.
(The multiple 1.5 seems a good choice.)
If you have cheap Smalltalk-style becomes:
(which is admittedly
unlikely these days) then you can use that when you need to grow a string
beyond its allocated size.
Kawa 2.0 implements the functionality of this SRFI.
Copyright (C) Per Bothner 2015
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.