Title

Immutable Strings

Author

Per Bothner

Status

This SRFI is currently in draft status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-140@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Abstract

This attempts to solve the same issues with R7RS strings raised by SRFI-135, but with better integration with the Scheme language.

We propose to retain the name string as the type of sequences of Unicode characters (scalar values). There are two standard subtypes of string:

An implementation may support other kinds of strings. For example on the Java platform it may be reasonable to consider any instance of java.lang.CharSequence to be a string.

The main part of the proposal specifies the default bindings of various procedure names, as might be pre-defined in a REPL. Specifically, some procedures that traditionally return mutable strings are changed to return istrings. We later discuss compatibility and other library issues.

This combines SRFI-13, SRFI-135, and SRFI-118.

Issues

Rationale

This attempts to solve the same issues with R7RS strings raised by SRFI-135: non-guaranteed O(1) indexing, limited sharing, in general the limited usefulness of mutable strings, especially fixed-size ones. Our solution is in concept the same, but the goal is better integration with the Scheme language. Specifically, SRFI-135 proposes new names for types and procedures in a way that is not integrated with Scheme tradition and existing code. It also adds a slew of new names for people to learn. While the relationship between old procedures and SRFI-135 procedures is natural and straightforward, it still adds a conceptual hurdle and wart to the Scheme language, complicating teaching and migration.

Specification

Note: This specification is intentionally an approximate clone of SRFI-135. However, what SRFI-135 calls textual, we call plain string. What SRFI-135 calls text, we call either istring (in type specifiers) or plain string (in most procedure names). Specifically, all of the SRFI-135 procedures are renamed to start with string- rather than textual- or text-. Any argument or result type marked as textual is changed to string, while the type text is changed to istring.

For example:

string-map proc string1 string2 ... → istring

One exception: make-string returns a mutable string, not an istring.

Notation

In the following procedure specifications:

The rest as in SRFI-135.

String literals

String literals have type istring. They have the same syntax as in R7RS (small). In an implementation that supports SRFI-109 string quasi-literals also evaluate to istrings.

Predicates

(string? obj) → boolean
Is obj a string? Must return true if istring? returns true. Must execute in O(1) time.
(istring? obj) → boolean
Is obj an immutable string, with guaranteed O(1) performance for string-ref and string-length? Must execute in O(1) time.
(string-null? string)boolean
Is string the empty string? Must execute in O(1) time.
(string-every pred string [start end]) → value
(string-any pred string [start end]) → value

Checks to see if every/any character in string satisfies pred, proceeding from left (index start) to right (index end). These procedures are short-circuiting: if pred returns false, string-every does not call pred on subsequent characters; if pred returns true, string-any does not call pred on subsequent characters. Both procedures are "witness-generating":

Note: The names of these procedures do not end with a question mark. This indicates a general value is returned instead of a simple boolean (#t or #f).

Conversions

(string->vector string [start end]) → char-vector
(string->list string [start end]) → char-list
string->vector, and string->list return a newly allocated (unless empty) vector, or list of the characters that make up the given substring.
(vector->string char-vector [start end]) → istring
(list->string char-list [start end]) → istring
These procedures return an istring containing the characters of the given subvector or sublist. The behavior of the result will not be affected by subsequent mutation of the vector or list.
(reverse-list->string char-list) → istring
An efficient implementation of (compose list->string reverse):
(reverse-list->string '(#\a #\B #\c)) → "cBa"
This is a common idiom in the epilogue of string-processing loops that accumulate their result using a list in reverse order. (See also string-concatenate-reverse for the "chunked" variant.)
(string->utf8    string [start end]) → bytevector
(string->utf16   string [start end]) → bytevector
(string->utf16be string [start end]) → bytevector
(string->utf16le string [start end]) → bytevector
These procedures return a newly allocated (unless empty) bytevector containing a UTF-8 or UTF-16 encoding of the given substring.

The bytevectors returned by string->utf8, string->utf16be, and string->utf16le do not contain a byte-order mark (BOM). string->utf16be returns a big-endian encoding, while string->utf16le returns a little-endian encoding.

The bytevectors returned by string->utf16 begin with a BOM that declares an implementation-dependent endianness, and the bytevector elements following that BOM encode the given substring using that endianness.

Rationale: These procedures are consistent with the Unicode standard. Unicode suggests UTF-16 should default to big-endian, but Microsoft prefers little-endian.

(utf8->string    bytevector [start end]) → istring
(utf16->string   bytevector [start end]) → istring
(utf16be->string bytevector [start end]) → istring
(utf16le->string bytevector [start end]) → istring
These procedures interpret their bytevector argument as a UTF-8 or UTF-16 encoding of a sequence of characters, and return a string containing that sequence.

The bytevector subrange given to utf16->string may begin with a byte order mark (BOM); if so, that BOM determines whether the rest of the subrange is to be interpreted as big-endian or little-endian; in either case, the BOM will not become a character in the returned string. If the subrange does not begin with a BOM, it is decoded using the same implementation-dependent endianness used by string->utf16.

The utf16be->string and utf16le->string procedures interpret their inputs as big-endian or little-endian, respectively. If a BOM is present, it is treated as a normal character and will become part of the result.

It is an error if the bytevector subrange given to utf8->string contains invalid UTF-8 byte sequences. For the other three procedures, it is an error if start or end are odd, or if the bytevector subrange contains invalid UTF-16 byte sequences.

Immutable string constructors

(string char) ... → istring
Returns a string consisting of the given characters.
(string-tabulate proc len) → istring
Proc is a procedure that accepts an exact integer as its argument and returns a character. Constructs a string of size len by calling proc on each value from 0 (inclusive) to len (exclusive) to produce the corresponding element of the string. The order in which proc is called on those indexes is not specified.

Rationale: Although string-unfold is more general, string-tabulate is likely to run faster for the common special case it implements.

(string-unfold stop? mapper successor seed [base make-final]) → istring
(string-unfold-right stop? mapper successor seed [base make-final]) → istring
This is a fundamental constructor for strings.

string-unfold is a fairly powerful string constructor. You can use it to convert a list to a string, read a port into a string, reverse a string, copy a string, and so forth. Examples:

(port->string p) = (string-unfold eof-object?
                           values
                           (lambda (x) (read-char p))
                           (read-char p))

(list->string lis) = (string-unfold null? car cdr lis)

(string-tabulate f size) = (string-unfold (lambda (i) (= i size)) f add1 0)

To map f over a list lis, producing a string:

(string-unfold null? (compose f car) cdr lis)

Interested functional programmers may enjoy noting that string-fold-right and string-unfold are in some sense inverses. That is, given operations knull?, kar, kdr, kons, and knil satisfying

(kons (kar x) (kdr x)) = x  and  (knull? knil) = #t

then

(string-fold-right kons knil (string-unfold knull? kar kdr x)) = x
and
(string-unfold knull? kar kdr (string-fold-right kons knil string)) = string.

This combinator pattern is sometimes called an "anamorphism."

Note: Implementations should not allow the size of stings created by string-unfold to be limited by limits on stack size.

For functionality similar to make-string, but returning an immutable string, you can use string-repeat.

Selection

(string-length string) → len
(string-ref string idx) → char
As in R7RS.
(substring string [start end]) → istring
This procedure returns a istring containing the characters of string starting with index start (inclusive) and ending with index end (exclusive).

If string is a mutable string, then that string does not share any storage with the result, so subsequent mutation of that string will not affect the result returned by substring. When the first argument is an istring, implementations are encouraged to return a result that shares storage with that istring, to whatever extent sharing is possible while maintaining some small fixed bound on the ratio of storage used by the shared representation divided by the storage that would be used by an unshared representation. In particular, these procedures should just return their first argument when that argument is a string, start is 0, and end is the length of that string.

For the functionality of substring with guaranteed no sharing use string-replicate for an immutable result, or string-copy for a mutable result.

(string-take       string nchars) → istring
(string-drop       string nchars) → istring
(string-take-right string nchars) → istring
(string-drop-right string nchars) → istring
string-take returns an immutable string containing the first nchars of string; string-drop returns a string containing all but the first nchars of string. string-take-right returns a string containing the last nchars of string; string-drop-right returns a string containing all but the last nchars of string.

The argument string does not share any storage with the result, so subsequent mutation of that string will not affect the string returned by these procedures. If string is an istring, implementations are encouraged to return a result that shares storage with that string (which is easily accomplished by using substring to create the result).

(string-take "Pete Szilagyi" 6) => "Pete S"
(string-drop "Pete Szilagyi" 6) => "zilagyi"

(string-take-right "Beta rules" 5) => "rules"
(string-drop-right "Beta rules" 5) => "Beta "

It is an error to take or drop more characters than are in the string:

(string-take "foo" 37) => error

Replacement

(string-replace string1 string2 start1 end1 [start2 end2]) → istring
Returns
(string-append (substring string1 0 start1)
               (substring string2 start2 end2)
               (substring string1 end1 (string-length string1)))

That is, the segment of characters in string1 from start1 to end1 is replaced by the segment of characters in string2 from start2 to end2. If start1=end1, this simply splices the characters drawn from string2 into string1 at that position.

Examples:

(string-replace "The TCL programmer endured daily ridicule."
                 "another miserable perl drone" 4 7 8 22)
    => "The miserable perl programmer endured daily ridicule."

(string-replace "It's easy to code it up in Scheme." "lots of fun" 5 9)
    => "It's lots of fun to code it up in Scheme."

(define (string-insert s i t) (string-replace s t i i))

(string-insert "It's easy to code it up in Scheme." 5 "really ")
    => "It's really easy to code it up in Scheme."

(define (string-set s i c) (string-replace s (string c) i (+ i 1)))

(string-set "String-ref runs in O(n) time." 19 #\1)
    => "String-ref runs in O(1) time."

Comparison

(string=? string1 string2 string3 ...) → boolean
(string<?  string1 string2 string3 ...) → boolean
(string>?  string1 string2 string3 ...) → boolean
(string<=? string1 string2 string3 ...) → boolean
(string>=? string1 string2 string3 ...) → boolean
(string-ci=? string1 string2 string3 ...) → boolean
(string-ci<?  string1 string2 string3 ...) → boolean
(string-ci>?  string1 string2 string3 ...) → boolean
(string-ci<=? string1 string2 string3 ..). → boolean
(string-ci>=? string1 string2 string3 ...) → boolean
As in R7RS.

Prefixes & suffixes

(string-prefix-length string1 string2 [start1 end1 start2 end2]) → integer
(string-suffix-length string1 string2 [start1 end1 start2 end2]) → integer
Return the length of the longest common prefix/suffix of string1 and string2. For prefixes, this is equivalent to their "mismatch index" (relative to the start indexes).

The optional start/end indexes restrict the comparison to the indicated substrings of string1 and string2.

(string-prefix? string1 string2 [start1 end1 start2 end2]) → boolean
(string-suffix? string1 string2 [start1 end1 start2 end2]) → boolean
Is string1 a prefix/suffix of string2?

The optional start/end indexes restrict the comparison to the indicated substrings of string1 and string2.

Searching

(string-index       string pred [start end]) → idx-or-false
(string-index-right string pred [start end]) → idx-or-false
(string-skip        string pred [start end]) → idx-or-false
(string-skip-right  string pred [start end]) → idx-or-false
string-index searches through the given substring from the left, returning the index of the leftmost character satisfying the predicate pred. string-index-right searches from the right, returning the index of the rightmost character satisfying the predicate pred. If no match is found, these procedures return #f.

Rationale: The SRFI 130 analogues of these procedures return cursors, even when no match is found, and SRFI 130's string-index-right returns the successor of the cursor for the first character that satisfies the predicate. As there are no cursors in this SRFI, it seems best to follow the more intuitive and long-standing precedent set by SRFI 13.

The start and end arguments specify the beginning and end of the search; the valid indexes relevant to the search include start but exclude end. Beware of "fencepost" errors: when searching right-to-left, the first index considered is (- end 1), whereas when searching left-to-right, the first index considered is start. That is, the start/end indexes describe the same half-open interval [start,end) in these procedures that they do in all other procedures specified by this SRFI.

The skip functions are similar, but use the complement of the criterion: they search for the first char that doesn't satisfy pred. To skip over initial whitespace, for example, say

(substring string
            (or (string-skip string char-whitespace?)
                (string-length string))
            (string-length string))

These functions can be trivially composed with string-take and string-drop to produce take-while, drop-while, span, and break procedures without loss of efficiency.

Note: Guile generalizes pred to char_pred, which can be a predicate, a character, or a character set. That seems convenient. Should it be standardized?

(string-contains       string1 string2 [start1 end1 start2 end2]) → idx-or-false
(string-contains-right string1 string2 [start1 end1 start2 end2]) → idx-or-false
Does the substring of string1 specified by start1 and end1 contain the sequence of characters given by the substring of string2 specified by start2 and end2?

Returns #f if there is no match. If start2 = end2, string-contains returns start1 but string-contains-right returns end1. Otherwise returns the index in string1 for the first character of the first/last match; that index lies within the half-open interval [start1,end1), and the match lies entirely within the [start1,end1) range of string1.

(string-contains "eek -- what a geek." "ee" 12 18) ; Searches "a geek"
    => 15

Note: The names of these procedures do not end with a question mark. This indicates a useful value is returned when there is a match.

Case conversion

(string-upcase   string) → istring
(string-downcase string) → istring
(string-foldcase string) → istring
(string-titlecase string) → istring
These procedures return the string obtained by applying Unicode's full uppercasing, lowercasing, case-folding, or title-casing algorithms to their argument. In some cases, the length of the result may be different from the length of the argument. If the result is equal to the argument in the sense of string=?, and the arguments immutable, then that argument may be returned. Note that language-sensitive mappings and foldings are not used.

The results are the same as the R7RS procedures, but as immutable strings.

Concatenation

(string-append string ...) → istring
Returns a string whose sequence of characters is the concatenation of the sequences of characters in the given arguments.
(string-concatenate string-list) → istring
Concatenates the elements of string-list together into a single string.

If any elements of string-list are mutable strings, then those strings do not share any storage with the result, so subsequent mutation of those string will not affect the string returned by this procedure. Implementations are encouraged to return a result that shares storage with some of the strings in the list if that sharing would be space-efficient.

Rationale: Some implementations of Scheme limit the number of arguments that may be passed to an n-ary procedure, so the (apply string-append string-list) idiom, which is otherwise equivalent to using this procedure, is not as portable.

(string-concatenate-reverse string-list [final-string [end]]) → istring
With no optional arguments, calling this procedure is equivalent to
(string-concatenate (reverse string-list))

If the optional argument final-string is specified, it is effectively consed onto the beginning of string-list before performing the list-reverse and string-concatenate operations.

If the optional argument end is given, only the characters up to but not including end in final-string are added to the result, thus producing

(string-concatenate 
  (reverse (cons (substring final-string 0 end)
                 string-list)))
For example:
(string-concatenate-reverse '(" must be" "Hello, I") " going.XXXX" 7)
  => "Hello, I must be going."

Rationale: This procedure is useful when constructing procedures that accumulate character data into lists of string buffers, and wish to convert the accumulated data into a single string when done. The optional end argument accommodates that use case when final-string is a mutable string, and is allowed (for uniformity) when final-string is an immutable string.

(string-join string-list [delimiter [grammar]]) → istring
This procedure is a simple unparser; it pastes strings together using the delimiter string.

string-list is a list of strings. delimiter is a string. The grammar argument is a symbol that determines how the delimiter is used, and defaults to 'infix. It is an error for grammar to be any symbol other than these four:

The delimiter is the string used to delimit elements; it defaults to a single space " ".

(string-join '("foo" "bar" "baz"))
         => "foo bar baz"
(string-join '("foo" "bar" "baz") "")
         => "foobarbaz"
(string-join '("foo" "bar" "baz") ":")
         => "foo:bar:baz"
(string-join '("foo" "bar" "baz") ":" 'suffix)
         => "foo:bar:baz:"

;; Infix grammar is ambiguous wrt empty list vs. empty string:
(string-join '()   ":") => ""
(string-join '("") ":") => ""

;; Suffix and prefix grammars are not:
(string-join '()   ":" 'suffix)) => ""
(string-join '("") ":" 'suffix)) => ":"

Fold & map & friends

(string-fold       kons knil string [start end]) → value
(string-fold-right kons knil string [start end]) → value
These are the fundamental iterators for strings.

The string-fold procedure maps the kons procedure across the given string from left to right:

(... (kons string[2] (kons string[1] (kons string[0] knil))))

In other words, string-fold obeys the (tail) recursion

  (string-fold kons knil string start end)
= (string-fold kons (kons string[start] knil) start+1 end)

The string-fold-right procedure maps kons across the given string or string from right to left:

(kons string[0]
      (... (kons string[end-3]
                 (kons string[end-2]
                       (kons string[end-1]
                             knil)))))

obeying the (tail) recursion

  (string-fold-right kons knil string start end)
= (string-fold-right kons (kons string[end-1] knil) start end-1)

Examples:

;;; Convert a string or string to a list of chars.
(string-fold-right cons '() string)

;;; Count the number of lower-case characters in a string or string.
(string-fold (lambda (c count)
                (if (char-lower-case? c)
                    (+ count 1)
                    count))
              0
              string)

The string-fold-right combinator is sometimes called a "catamorphism."

(string-map proc string1 string2 ...) → istring
As in R7RS, except the result is an immutable string.
(string-for-each proc string1 string2 ...) → unspecified
As in R7RS.
(string-map-index proc string [start end]) → istring
Calls proc on each valid index of the specified substring, converts the results of those calls into strings, and returns the concatenation of those strings. It is an error for proc to return anything other than a character or string. The dynamic order in which proc is called on the indexes is unspecified, as is the dynamic order in which the coercions are performed. If any strings returned by proc are mutated after they have been returned and before the call to string-map-index has returned, then string-map-index returns a string with unspecified contents; the string-map-index procedure itself does not mutate those strings.
(string-for-each-index proc string [start end]) → unspecified
Calls proc on each valid index of the specified substring, in increasing order, discarding the results of those calls. This is simply a safe and correct way to loop over a substring.

Example:

(let ((txt (string->string "abcde"))
      (v '()))
  (string-for-each-index
    (lambda (cur) (set! v (cons (char->integer (string-ref txt cur)) v)))
    txt)
  v) => (101 100 99 98 97)
(string-count string pred [start end]) → integer
Returns a count of the number of characters in the specified substring of string that satisfy the given predicate.
(string-filter pred string [start end]) → istring
(string-remove pred string [start end]) → istring
Filter the given substring of string, retaining only those characters that satisfy / do not satisfy pred.

If string is a mutable string, then that string does not share any storage with the result, so subsequent mutation of that string will not affect the string returned by these procedures. If string is an immutable string, implementations are encouraged to return a result that shares storage with that string whenever sharing would be space-efficient.

Replication & splitting

(string-repeat string-or-character len) → istring
Create a string by repeating the first argument len times. If the first argument is a character, it is as if it were wrapped with the string constructor. We can define string-repeat in terms of the more general string-replicate procedure:
(define (string-repeat S N)
   (let ((T (if (char? S) (string S) S)))
     (string-replicate T 0 (* N (string-length T))))
(string-replicate string [from to [start end]]) → istring
This is an extended substring procedure that implements replicated copying of a substring.

string is a string; start and end are optional arguments that specify a substring of string, defaulting to 0 and the length of string. This substring is conceptually replicated both up and down the index space, in both the positive and negative directions. For example, if string is "abcdefg", start is 3, and end is6, then we have the conceptual bidirectionally-infinite string

    ...  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d ...
        -9 -8 -7 -6 -5 -4 -3 -2 -1  0 +1 +2 +3 +4 +5 +6 +7 +8 +9

string-replicate returns the substring of this string beginning at index from, and ending at to. It is an error if from is greater than to.

If from and to are missing they default to 0 and (string-length string), respectively. This variant is a generalization of using substring, but unlike substring never shares substructures that would retain characters or sequences of characters that are substructures of its first argument or previously allocated objects. (Hence it is equivalent to SRFI-135's string-copy.)

You can use string-replicate to perform a variety of tasks:

Note that

It is an error if start=end, unless from=to, which is allowed as a special case.

(string-split string delimiter [grammar limit start end]) → list
Returns a list of strings representing the words contained in the substring of string from start (inclusive) to end (exclusive). The delimiter is a string string to be used as the word separator. This will often be a single character, but multiple characters are allowed for use cases such as splitting on "\r\n". The returned list will have one more item than the number of non-overlapping occurrences of the delimiter in the string. If delimiter is an empty string, then the returned list contains a list of strings, each of which contains a single character.

The grammar is a symbol with the same meaning as in the string-join procedure. If it is infix, which is the default, processing is done as described above, except an empty string produces the empty list; if grammar is strict-infix, then an empty string signals an error. The values prefix and suffix cause a leading/trailing empty string in the result to be suppressed.

If limit is a non-negative exact integer, at most that many splits occur, and the remainder of string is returned as the final element of the list (so the result will have at most limit+1 elements). If limit is not specified or is #f, then as many splits as possible are made. It is an error if limit is any other value.

To split on a regular expression re, you can use SRFI 115's regexp-split procedure.

Mutable string constructors

(make-string [k [char]]) → mstring
Return a new allocated mutable string of length k, where k defaults to 0. If char is given, then all the characters of the string are initialized to char, otherwise the contents of the string are unspecified. The 1-argument version is deprecated as poor style, except when k is 0.

To return an immutable string that repeats k times a character char use string-repeat.

This is as R7RS, except the result is variable-size and we allow leaving out k when it is zero.

Rationale: The most common pattern for mutable strings (at least in a language like Java) to allocate an empty string and incrementally append to it. It seems natural to initialize the string with (make-string), rather than (make-string 0).

(string-copy string [start [end]]) → mstring
As in R7RS.

Procedures for mutating a string

(string-set! mstring k char) → unspecified
(string-fill! mstring fill [start [end]]) → unspecified
(string-copy! to at from [start [end]]) → unspecified
As in R7RS.
(string-append! mstring value ...) → unspecified
(string-replace! mstring dst dst-start dst-end src [src-start [src-end]] → unspecified
As in SRFI-118.

Other string-returning procedures

The R7RS procedures symbol->string, number->string, and get-output-string all return istring.

R7RS specifies that it is an error to mutate the results of command-line, get-environment-variable, or get-environment-variables. These return istrings.

Libraries

There is no reason - except backward compatibility - for a procedure such as string-upcase to return a mutable string. However, we need a mechanism to use the "old" immutable-string-returning string-upcase rather than the istring-returning version.

The following libraries are defined:

(stri 140 base)
Same as R7RS's (scheme base) except that string procedure are as in this specification.
(stri 140 char)
Same as R7RS's (scheme char) except that string procedures are as in this specification.
(srfi 140 mstrings)
Provides definition for all the procedures that this specification specifies to return an istring, but which in R7RS or SRFI-13 return mutable strings, such as string-append. TODO: List needed. This library provides versions that return mutable strings.

It could be implemented like this:

(define-library (srfi 140 mstrings)
  (import (rename (srfi 140 istrings) (string-upcase istring-upcase)))
  (export string-upcase etc ...)
  (begin
    (define (string-upcase str)
      (string-copy (istring-upcase str)))
    etc ...))
(srfi 140 istrings)
This library exports the same names as (string 140 mstrings), but the functions return an istring.
(srfi 135)
(srfi 135 texts)
The SRFI-135 library, if provided, is naturally implemented by importing (string 140 istrings) and other libraries, and then using export with rename.

Default environment

An implementation should base its default environment on the bindings of (stri 140 istrings). However, if there is a command-line switch or other way to select "R7RS standards mode", then that switch should cause it to use the (stri 140 mstrings) bindings.

Implementation

Unfinished!

This is basically just SRFI-135 with different names. So we can re-use any suitable SRFI-135 implementation. However, here is a simple and fairly efficient implementation of istrings written for the Java platform, which uses UTF-16 chars:

public class SchemeIString implements CharSequence {
    String str;
    int cplength; // number of codepoints

    /* Index of every 16-th character.
     * Null if all characters are in the basic plane. */
    int[] offsets;

    /** used for string-ref */
    public int indexByCodePoints(int i) {
        if (offsets == null)
            return (int) str.charAt(i);
        int pos = offsets[i>>4];
        // Optimize if all characters between (i & 15)
        // and (i & 15) + 16 are all in the basic plane:
        if (pos + 16 == offsets[(i>>4) + 1])
            return (int) str.charAt(pos + (i & 15));
        // scan linearly from pos, at most 15 characters forward:
        return str.codePoinAt(pos + str.offsetByCodePoints(pos, i&15));
    }

    /* used for string-length */
    public int lengthByCodePoints() { return cplength; }

    /** To implement CharSequence */
    public char charAt(int i) { return str.charAt(i); }
    public String toString() { return str; }
    public int length() { return str.length(); }
}
This implementation doesn't support shared substrings, but that could be handled with a little more work and replacing the String str field with a char array field char[] chars. Sharing would be handled by re-using the chars and offsets fields, but adding start/end offsets. Of course that has the tradeoff of more garbage being retained.

Implementing mutable strings is trivial, since there is no performance expectation. Just use a simple record which has a pointer to a data buffer, which can be reallocated as needed.

Acknowledgements

Thank you to Olin Shivers, author of SRFI-13 String Libraries; and William D Clinger, author of SRFI-135 Immutable Texts.

Copyright

Copyright (C) Per Bothner 2016

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Author: Per Bothner