Title

Simple adjustable-size strings

Author

Per Bothner <per@bothner.com>

This SRFI is currently in ``draft'' status. To see an explanation of each status that a SRFI can hold, see here. To provide input on this SRFI, please mail to <srfi minus 118 at srfi dot schemers dot org>. See instructions here to subscribe to the list. You can access previous messages via the archive of the mailing list.

Abstract

Scheme specifies mutable fixed-length strings. We add two procedures string-append! and string-replace! which allow the size of the string to change. We also require that the standard Scheme procedures make-string and string-copy return variable-size strings.

Rationale

Historically Scheme has only supported fixed-length strings. Conceptually, strings can be immutable (read-only) or mutable (writable). Unfortunately, the usefulness of a fixed-size mutable string is extremely minimal: The main use is for a buffer to which one incrementally adds more data. So why not fold that functionality into the string API? I.e. why can't you add just data to the end of a string?

The implementation difficulty with variable-size strings is you need some kind of indirection, so you can re-allocate the data when you run out of space. However, many implementation of string already use such indirection for various reasons, such as memory management.

It is worth noting that some string representations use UTF-8 or UTF-16 encoding, for reasons of compactness or compatibility with other APIs. The obvious disadvantage is that string-ref is no longer a constant-time operation, but there are ways to work around that; string-ref isn't really an important operation for typical string manipulation. Note that in such an implementation a mutable string is inherently variable-size. For example, if you replace a 1-byte character with a 2-byte character then the stored size changes. So you might as well expose this functionality to the Scheme programmer.

Common Lisp supports this functionality with an optional fill-pointer, combined with using adjust-array when the allocated string fills up.

One option is to specify 3 kinds of strings: immutable strings, mutable fixed-size strings, and mutable variable-size strings. I've argued that the use-case for fixed-size mutable strings is so limited (and maybe better supported by character uniform arrays) that mutable strings should by default be variable-size. Specifically, standard Scheme functions that return mutable strings should return variable-size mutable strings. An implementation may provide a mechanism to explicitly create a fixed-size mutable string (or a character uniform vector); however, this specification does not propose or recommend that.

It has been suggested that a more general buffer API might be better than tweaking the old-fashioned (and potentially inefficient) integer-index-oriented string APIs. This proposal is much more modest - it just adds two new procedures. Thus it should be much easier to learn, and easier to modify Scheme programs to use it.

Specification

The standard Scheme functions make-string and string-copy are specified to return variable-size mutable strings by default.

The following standard Scheme functions may return fixed-sized or variable-sized strings: string, string-upcase, string-downcase, string-foldcase, substring, string-append, and list->string. (These are functions that should have been specified to return immutable strings if we didn't have to worry about compatibility.)

We add 2 new functions. The most frequently used is likely to be string-append!, while string-replace! handles the general case of in-place replacement of a substring with another string of possibly different size.

(string-append! string value ...)

The string must be a variable-size mutable string. The string-append! procedure extends string by appending each value (in order) to the end of string. A value can be a character or a string.

The following example shows how to process a string using string-for-each and incrementally building a result string using string-append!:

(define (translate-space-to-newline str)
  (let ((result (make-string 0)))
    (string-for-each
     (lambda (ch)
       (cond ((char=? ch #\Space)
              (string-append! result #\Newline))
             ((char=? ch #\Return)) ; Ignore
             (else
              (string-append! result ch))))
     str)
    result))

There is no requirement that this procedure execute in constant time, even amortised (i.e. average) constant time.

Usage note: Compare with using string ports:

(define (translate-space-to-newline str)
  (let ((out (open-output-string)))
    (string-for-each
     (lambda (ch)
       (cond ((char=? ch #\Space)
              (write-char #\Newline out))
             ((char=? ch #\Return)) ; Ignore
             (else
              (write-char ch out))))
     str)
    (get-output-string out)))

Using a string port in this situation is probably preferable: It is more portable, and you can expect decent performance in most implementations. Using string-append! may be slighly more efficient on some implementations, due to lower overhead, but that depends on the strategy used by string-append! when the allocated buffer is too small. The string-append! function is most useful when using (reading) a string is interleaved with growing it, or when also using string-replace!.

(string-replace! dst dst-start dst-end src [src-start [src-end]])
Replaces the characters of the variable-size string dst (between dst-start and dst-end) with the characters of the string src (between src-start and src-end). The number of characters from src may be different than the number replaced in dst, so the string may grow or contract. The special case where dst-start is equal to dst-end corresponds to insertion; the case where src-start is equal to src-end corresponds to deletion. The order in which characters are copied is unspecified, except that if the source and destination overlap, copying takes place as if the source is first copied into a temporary string and then into the destination. (This can be achieved without allocating storage by making sure to copy in the correct direction in such circumstances.)

When value is a string then (string-append! dst value) is equivalent to (string-replace! dst (string-length dst) (string-length dst) value).

Implementation

The implementation is trivial, assuming there is some indirection between the string object header and the actual stored characters. Whenever the character buffer is full, it needs to be replaced with a bigger buffer. If the size of the new buffer is a fixed multiple of the size of the old buffer then string-append! has amortized constant execution time. (The multiple 1.5 seems a good choice.)

If you have cheap Smalltalk-style becomes: (which is admittedly unlikely these days) then you can use that when you need to grow a string beyond its allocated size.

Kawa 2.0 implements the functionality of this SRFI.

Copyright

Copyright (C) Per Bothner 2015

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Author: Per Bothner
Editor: Michael Sperber