Title

Linear adjustable-length strings

Author

John Cowan <cowan@ccil.org>

Please note that much of the text and many of the examples are derived from SRFI 118 by Per Bothner.

Status

This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-185@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Abstract

Scheme specifies mutable fixed-length strings. SRFI 118 adds two procedures, string-append! and string-replace!, which allow the length of the string to change. This SRFI provides two linear-update versions of these procedures: that is, the implementation may change the string length or return a new string instead. In addition, two convenience macros are provided that make the procedures somewhat easier to use.

Rationale

Historically Scheme has only supported fixed-length strings. Schemes that support SRFI 118 or SRFI 140 provide fixed-length immutable strings and variable-length mutable strings, as the usefulness of standard Scheme's fixed-length mutable strings is limited: the main use is for a buffer to which one incrementally adds more data. So why not fold that functionality into the string API? That is, why can't you add just data to the end of a string?

The difficulty is that only a few Scheme implementations provide such strings, and therefore variable-length strings are inherently not portable. This SRFI provides a useful compromise that takes advantage of implementation-specific variable-length strings where they exist, and simulates them where they do not. Note that a native implementation may well copy the underlying characters, and therefore is not necessarily more efficient.

The procedures and the macros of this SRFI together provide an approximation of the two SRFI 118 procedures. Direct calls to string-append! and string-replace! are compatible between the two SRFIs provided that the first argument is a valid place (see below). But calls made using apply, or where the first argument is not a place, are incompatible. In those cases, it is necessary to use the two procedures while being sure to appropriately capture the result.

In this SRFI the terms linear and linear-update are used in the sense of SRFI 1. That is, they are allowed — but not required — to side-effect their first argument in order to construct their results. They use the same ! suffix as purely mutational procedures, because they must be treated by the user as mutational whether they actually mutate anything or not. In other words, any references to dst other than the result of these procedures must be treated as "dead" and never referred to again. This must be so, since you can't rely on the value passed to a linear-update procedure after that procedure has been called.

Specification

Storage model

Note: This section does not apply to implementations that do not support variable-length strings.

In terms of the R[567]RS storage model for strings as described in Section 3.4 of both R5RS and R7RS, strings are modeled not as ordinary fixed-length strings, but as a mutable box (see SRFI 111 for a discussion of boxes) that contains a fixed-length immutable string. Thus a variable-length string, rather than having one location per character as a fixed-length string does, or no locations at all as an fixed-length immutable string (for example, a string literal) does, has just a single location. This does not mean that this is the only correct way to implement variable-length strings, but it clarifies reasoning about them.

In particular, a fixed-length string whose length is zero is in effect immutable, as it contains no locations. But a variable-length string whose current length is zero is modeled as a box containing an empty immutable string rather than simply a fixed-length string. Thus it is possible to extend it with additional characters by changing the location in the box.

Procedures

This SRFI provides two new procedures. The most frequently used is likely to be string-append-linear!, while string-replace-linear! handles the general case of replacement of a substring with another string of possibly different length.

(string-append-linear! dst string-or-char ...)

This procedure returns a string which extends dst by appending each additional string-or-char (in order) to the end of dst. A character argument and a string argument of length 1 are treated exactly the same way. The result can either be dst itself or a newly allocated string.

There is no requirement that this procedure execute in constant time, even amortised (i.e. average) constant time.

(string-replace-linear! dst dst-start dst-end src [src-start [src-end]])
Returns a string which has the same characters as dst, except that the characters between dst-start and dst-end have been replaced with the characters of the string src between src-start and src-end. The result can either be dst itself or a newly allocated string.

The number of characters from src may be different than the number replaced in dst, so the result may be larger or smaller than the previous length of dst. The special case where dst-start is equal to dst-end corresponds to insertion; the case where src-start is equal to src-end corresponds to deletion. The order in which characters are copied is unspecified, except that if the source and destination overlap, copying takes place as if the source is first copied into a temporary string and then into the destination. (This can be achieved without allocating storage by making sure to copy in the correct direction in such circumstances.)

When src is a string then (string-append-linear! dst src) is equivalent to (string-replace-linear! dst (string-length dst) (string-length dst) src).

Note: SRFI 13 and its successors have a non-destructive version, string-replace, which takes the arguments in the following order:

  (string-replace dst src dst-start dst-end [src-start [src-end]])
We adopt a different argument order here to be consistent with string-copy!, and with existing string-replace! implementations.

Macros

This SRFI also provides two new convenience macros. The two macros are more or less equivalent to the procedures, except that the first argument must be a place. A place is either a variable or a procedure application that is defined by identifier-syntax or SRFI 17 so that it can appear as the first argument of set!. An implementation of these macros is free to evaluate the place either once or twice, so it is an error for the place expression to have either side-effects or non-trivial computation.

(string-append! place string-or-char ...)

This macro sets place to the result of invoking (string-append-linear! place string-or-char ...). It returns an unspecified value.

(string-replace! dst-place dst-start dst-end src [src-start [src-end]])
This macro sets dst-place to the result of applying string-replace-linear! to its arguments. The result is an unspecified value.

Example

The following example shows how to process a string using string-for-each and incrementally building a result string using the string-append! macro:

(define (translate-space-to-newline str)
  (let ((result (make-string 0)))
    (string-for-each
     (lambda (ch)
       (cond ((char=? ch #\space)
              (string-append! result #\newline))
             ((char=? ch #\return)) ; Ignore
             (else
              (string-append! result ch))))
     str)
    result))

Usage note: Compare with using string ports:

(define (translate-space-to-newline str)
  (let ((out (open-output-string)))
    (string-for-each
     (lambda (ch)
       (cond ((char=? ch #\space)
              (write-char #\newline out))
             ((char=? ch #\return)) ; Ignore
             (else
              (write-char ch out))))
     str)
    (get-output-string out)))

Using a string port in this situation is probably preferable: It is more portable, and you can expect decent performance in most implementations. Using string-append! may be slightly more efficient on some implementations, due to lower overhead, but that depends on the strategy used by string-append! when the allocated buffer is too small. The string-append-linear! and string-append! macro are most useful when using (reading) a string is interleaved with growing it, or when also using string-replace-linear! or string-replace!.

Implementation

Here is a portable implementation that always returns a new string. It should not be used on systems that provide native variable-length strings. It depends on having SRFI 13, SRFI 130, or SRFI 152. Alternatively, implementers can copy the definition of string-replace out of any of the sample implementations of those SRFIs.

(define (string-append-linear! . args)
  (apply string-append
    (map (lambda (x) (if (char? x) (string x) x)) args)))
  
(define string-replace-linear!
  (case-lambda
    ((dst dst-start dst-end src)
     (string-replace dst src dst-start dst-end 0 (string-length src)))
    ((dst dst-start dst-end src src-start)
     (string-replace dst src dst-start dst-end src-start (string-length src)))
    ((dst dst-start dst-end src src-start src-end)
     (string-replace dst src dst-start dst-end src-start src-end))))

If SRFI 118 or SRFI 140 is available, the implementation is even more trivial:

(define string-append-linear! srfi-118:string-append!)

(define string-replace-linear! srfi-118:string-replace!)

Finally, here are the macros:

(define-syntax string-append!
  (syntax-rules ()
    ((_ place . args)
     (set! place (string-append-linear! place . args)))))

(define-syntax string-replace!
  (syntax-rules ()
    ((_ place . args)
     (set! place (string-replace-linear! place . args)))))

A native implementation is easy, assuming there is some indirection between the string object header and the actual stored characters. (This is not true in all implementations, Chicken for example.) Whenever the character buffer is full, it needs to be replaced with a bigger buffer. If the length of the new buffer is a fixed multiple of the length of the old buffer then string-append-linear! has amortized constant execution time. (The multiple 1.5 seems a good choice.)

If you have cheap Smalltalk-style becomes: (which is admittedly unlikely these days) then you can use that when you need to grow a string beyond its allocated length. Alternatively, a garbage collection can be invoked, which is what Chicken does when it must expand the bytes used to represent a UTF-8 string as a result of replacing a character with another character that requires more bytes.

Copyright

Copyright © Per Bothner 2015, John Cowan 2019

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Author: John Cowan
Editor: Arthur A. Gleckler