This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to
email@example.com. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.
Scheme has an impoverished set of string-processing utilities, which is a problem for authors of portable code. This SRFI proposes a coherent and comprehensive set of string-processing procedures. It is a reduced version of SRFI 13 that has been aligned with SRFI 135, Immutable Texts. Unlike SRFI 13, it has been made consistent with the R5RS, R6RS, and R7RS-small string procedures.
Here is a list of the procedures provided by this SRFI:
This SRFI is based upon SRFI 130, copying much of its structure and wording, but eliminating the concept of cursors. However, it is textually derived from SRFI 135, in order to gain access to the editorial improvements made to the text of that SRFI, which was itself based on SRFI 130. Ultimately the origin of all these SRFIs is SRFI 13.
This SRFI omits the following bells, whistles, and gongs of SRFI 13:
string-split procedures from other sources. For completeness,
string-drop-while-right are also provided.
There are no performance guarantees for any of the procedures in this SRFI.
The Scheme programming language does not expose the internal
representation of strings.
Some implementations of Scheme use UTF-32 or a similar encoding,
which lets indexing operations such as string-ref and
string-set! run in O(1) time.
Some implementations use UTF-16 or UTF-8, which save space
at the expense of making
those operations take
time proportional to the length of a string.
Others allow only 256 characters, typically the Latin-1 repertoire.
Although Scheme's string data type allows portable code to use strings independently of their internal representation, the variation in performance between implementations has created a problem for programs that use long strings. In some systems, long strings are inefficient with respect to space; in other systems, long strings are inefficient with respect to time. Consequently, this SRFI suggests that Scheme's mutable strings be used only for relatively short sequences of characters, while using the immutable texts defined by SRFI 135 for long sequences of characters.
Procedures present in R5RS, R6RS, and R7RS-small are marked (R5RS). Procedures present in R5RS and R6RS but with additional arguments in R7RS-small are marked (R5RS+). Procedures present in R6RS and R7RS-small are marked (R6-R7RS). Procedures present in R6RS only are marked (R6RS). Procedures present in R7RS-small only are marked (R7RS-small).
Except as noted, the results returned from the procedures of this SRFI must be newly allocated strings. This is a change from the definition of SRFIs 13 and 130, though most Schemes do not support sharable strings in any case. However, the empty string need not be newly allocated.
The procedures of this SRFI follow
a consistent naming scheme, and are consistent with the conventions
developed in SRFI 1 and used in SRFI 13, SRFI 130, and SRFI 135.
Procedures that have left/right directional variants
use no suffix to specify left-to-right operation,
-right to specify
right-to-left operation, and
-both to specify both.
One discrepancy between SRFI 1 and other SRFIs is in the tabulate
procedure: SRFI 1's
list-tabulate takes the length argument
first, before the procedure, whereas all string SRFIs put the procedure
first, in line with mapping and folding operations.
The order of common arguments is consistent across the different procedures. In particular, all procedures place the main string to be operated on first, with the exception of the mapping and folding procedures, which are consistent with R7RS-small and SRFI 1.
If a procedure's return value is said to be "unspecified," the procedure returns a single result whose value is unconstrained and might even vary from call to call.
In the following procedure specifications:
(string-length string); the sample implementations detect that error and raise an exception.
string-any, all predicates passed to procedures specified in this SRFI may be called in any order and any number of times. It is an error if pred has side effects or does not behave functionally (returning the same result whenever it is called with the same character); the sample implementation does not detect those errors.
It is an error to pass values that violate the specification above.
Arguments given in square brackets are optional. Unless otherwise noted in the text describing the procedure, any prefix of these optional arguments may be supplied, from zero arguments to the full list. When a procedure returns multiple values, this is shown by listing the return values in square brackets as well. So, for example, the procedure with signature
halts? f [x init-store] → [boolean integer]
would take one (f), two (f, x) or three (f, x, init-store) input arguments, and return two values, a boolean and an integer.
An argument followed by "..." means zero or more elements.
So the procedure with the signature
sum-squares x ... → number
takes zero or more arguments (x ...), while the procedure with signature
spell-check doc dict1 dict2 ... → string-list
takes two required arguments (doc and dict1) and zero or more optional arguments (dict2 ...).
string? obj → boolean (R5RS)
string-null? string → boolean
string-every pred string [start end] → value
string-any pred string [start end] → value
Checks to see if every/any character in string satisfies pred,
proceeding from left (index start)
to right (index end).
These procedures are short-circuiting:
if pred returns false, string-every
does not call pred on subsequent characters;
if pred returns true, string-any
does not call pred on subsequent characters.
Both procedures are "witness-generating":
If string-every is given an empty interval (with start = end), it returns #t.
If string-every returns true for a non-empty interval (with start < end), the returned true value is the one returned by the final call to the predicate on
(string-ref (string-copy string) (- end 1)).
If string-any returns true, the returned true value is the one returned by the predicate.
The names of these procedures do not end with a question mark.
This indicates a general value is returned instead of a simple boolean.
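The short-circuiting and witness-generating behavior described above can be sketched in portable R7RS-small code. This is only an illustration, not the sample implementation: the names sketch-string-every and sketch-string-any are hypothetical, and start and end are required here rather than optional.

```scheme
(import (scheme base))

;; Hypothetical sketch: left-to-right scan that stops at the first
;; false result and otherwise returns the value of the last call.
(define (sketch-string-every pred s start end)
  (let loop ((i start) (last #t))      ; #t is the result for an empty interval
    (if (>= i end)
        last
        (let ((r (pred (string-ref s i))))
          (and r (loop (+ i 1) r))))))

;; Hypothetical sketch: the first true value returned by pred is the witness.
(define (sketch-string-any pred s start end)
  (let loop ((i start))
    (and (< i end)
         (or (pred (string-ref s i))
             (loop (+ i 1))))))
```

Because the predicate's own return value is passed through, a caller can use string-any to find and transform a character in a single pass.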
make-string len char → string (R5RS)
string char ... → string (R5RS)
string-tabulate proc len → string
Although string-unfold is more general,
string-tabulate is likely to run faster
for the common special case it implements.
string-unfold stop? mapper successor seed [base make-final] → string
"". It is an error if base is anything other than a character or string.
(lambda (x) ""). It is an error for make-final to return anything other than a character or string.
string-unfold is a fairly powerful string constructor.
You can use it to
convert a list to a string, read a port into a string, reverse a string,
copy a string, and so forth. Examples:
(port->string p) = (string-unfold eof-object? values
                                  (lambda (x) (read-char p))
                                  (read-char p))
(list->string lis) = (string-unfold null? car cdr lis)
(string-tabulate f size) = (string-unfold (lambda (i) (= i size)) f
                                          (lambda (i) (+ i 1)) 0)
To map f over a list lis, producing a string:
(string-unfold null? (compose f car) cdr lis)
Interested functional programmers may enjoy noting that
string-fold-right and string-unfold are in some sense inverses.
That is, given operations
knull?, kar, kdr, and kons,
and a value knil satisfying
(kons (kar x) (kdr x)) = x and (knull? knil) = #t
then
(string-fold-right kons knil (string-unfold knull? kar kdr x)) = x
and
(string-unfold knull? kar kdr (string-fold-right kons knil string)) = string.
This combinator pattern is sometimes called an "anamorphism."
Note: Implementations should not allow the size of strings created by
string-unfold to be limited by limits on stack size.
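One way to satisfy the note about stack size is to build the result iteratively. The following is a minimal sketch under stated assumptions: sketch-string-unfold is a hypothetical name, the optional base and make-final arguments are omitted, and an R7RS-small string port is used as the accumulator.

```scheme
(import (scheme base))

;; Hypothetical sketch of the core string-unfold loop.
;; The loop is a tail call, so the result length is not bounded by stack depth.
(define (sketch-string-unfold stop? mapper successor seed)
  (let ((out (open-output-string)))
    ;; mapper may return either a character or a string
    (define (emit piece)
      (if (char? piece)
          (write-char piece out)
          (write-string piece out)))
    (let loop ((seed seed))
      (cond ((stop? seed) (get-output-string out))
            (else (emit (mapper seed))
                  (loop (successor seed)))))))
```

For instance, (sketch-string-unfold null? car cdr lis) behaves like list->string on a list of characters.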
string-unfold-right stop? mapper successor seed [base make-final] → string
Like string-unfold, except the results of mapper are assembled into the string in right-to-left order, base is the optional rightmost portion of the constructed string, and make-final produces the leftmost portion of the constructed string. If mapper returns a string, the string is prepended to the constructed string (without reversal).
(string-unfold-right (lambda (n) (< n (char->integer #\A)))
                     (lambda (n) (char-downcase (integer->char n)))
                     (lambda (n) (- n 1))
                     (char->integer #\Z)
                     #\space
                     (lambda (n) " The English alphabet: "))
    => " The English alphabet: abcdefghijklmnopqrstuvwxyz "

(string-unfold-right null?
                     (lambda (x) (string #\[ (car x) #\]))
                     cdr
                     '(#\a #\b #\c))
    => "[c][b][a]"
string->vector string [start end] → char-vector (R7RS-small)
string->list string [start end] → char-list (R5RS+)
vector->string char-vector [start end] → string (R7RS-small)
list->string char-list → string (R5RS)
reverse-list->string char-list → string
Equivalent to (compose list->string reverse):
(reverse-list->string '(#\a #\B #\c)) → "cBa"
This is a common idiom in the epilogue of string-processing loops that accumulate their result using a list in reverse order. (See also
string-concatenate-reverse for the "chunked" variant.)
string-length string → len (R5RS)
string-ref string idx → char (R5RS)
substring string start end → string (R5RS)
string-copy string [start end] → string (R5RS+)
substring requires all three arguments, whereas
string-copy requires only one.
string-take string nchars → string
string-drop string nchars → string
string-take-right string nchars → string
string-drop-right string nchars → string
string-take returns a string containing the first nchars of string;
string-drop returns a string containing all but the first nchars of string.
string-take-right returns a string containing the last nchars of string;
string-drop-right returns a string containing all but the last nchars of string.
(string-take "Pete Szilagyi" 6) => "Pete S"
(string-drop "Pete Szilagyi" 6) => "zilagyi"
(string-take-right "Beta rules" 5) => "rules"
(string-drop-right "Beta rules" 5) => "Beta "
It is an error to take or drop more characters than are in the string:
(string-take "foo" 37) => error
string-pad string len [char start end] → string
string-pad-right string len [char start end] → string
(string-pad "325" 5) => "  325"
(string-pad "71325" 5) => "71325"
(string-pad "8871325" 5) => "71325"
string-trim string [pred start end] → string
string-trim-right string [pred start end] → string
string-trim-both string [pred start end] → string
(string-trim-both " The outlook wasn't brilliant, \n\r") => "The outlook wasn't brilliant,"
string-replace string1 string2 start1 end1 [start2 end2] → string
(string-append (substring string1 0 start1)
               (substring string2 start2 end2)
               (substring string1 end1 (string-length string1)))
That is, the segment of characters in string1 from start1 to end1 is replaced by the segment of characters in string2 from start2 to end2. If start1=end1, this simply splices the characters drawn from string2 into string1 at that position.
(string-replace "The TCL programmer endured daily ridicule."
                "another miserable perl drone" 4 7 8 22)
    => "The miserable perl programmer endured daily ridicule."

(string-replace "It's easy to code it up in Scheme." "lots of fun" 5 9)
    => "It's lots of fun to code it up in Scheme."

(define (string-insert s i t)
  (string-replace s t i i))

(string-insert "It's easy to code it up in Scheme." 5 "really ")
    => "It's really easy to code it up in Scheme."

(define (string-set s i c)
  (string-replace s (string c) i (+ i 1)))

(string-set "String-ref runs in O(n) time." 21 #\1)
    => "String-ref runs in O(1) time."
string=? string1 string2 string3 ... → boolean (R5RS)
Returns #t if all the strings have the same length and contain exactly the same characters in the same positions; otherwise returns #f.
string<? string1 string2 string3 ... → boolean (R5RS)
string>? string1 string2 string3 ... → boolean (R5RS)
string<=? string1 string2 string3 ... → boolean (R5RS)
string>=? string1 string2 string3 ... → boolean (R5RS)
These procedures return #t if their arguments are (respectively): monotonically increasing, monotonically decreasing, monotonically non-decreasing, or monotonically non-increasing.
These comparison predicates are required to be transitive.
These procedures compare strings in an implementation-defined way.
One approach is to make them the lexicographic extensions to strings
of the corresponding orderings on characters. In that case,
string<? would be the lexicographic ordering on
strings induced by the ordering
char<? on characters,
and if two strings differ in length but are the same up to the length
of the shorter string, the shorter string would be considered to be
lexicographically less than the longer string.
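However, implementations are also allowed to use more sophisticated
locale-specific orderings.
In all cases, a pair of strings must satisfy exactly one of
string<?, string=?, and string>?, must satisfy
string<=? if and only if
they do not satisfy string>?, and must satisfy
string>=? if and only if
they do not satisfy string<?.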
string-ci=? string1 string2 string3 ... → boolean (R5RS)
Returns #t if, after calling
string-foldcase on each of the arguments, all of the case-folded strings would have the same length and contain the same characters in the same positions; otherwise returns #f.
string-ci<? string1 string2 string3 ... → boolean (R5RS)
string-ci>? string1 string2 string3 ... → boolean (R5RS)
string-ci<=? string1 string2 string3 ... → boolean (R5RS)
string-ci>=? string1 string2 string3 ... → boolean (R5RS)
These procedures behave as though they had called
string-foldcase on their arguments before applying the corresponding procedures without "-ci".
string-prefix-length string1 string2 [start1 end1 start2 end2] → integer
string-suffix-length string1 string2 [start1 end1 start2 end2] → integer
The optional start/end indexes restrict the comparison to the indicated substrings of string1 and string2.
string-prefix? string1 string2 [start1 end1 start2 end2] → boolean
string-suffix? string1 string2 [start1 end1 start2 end2] → boolean
The optional start/end indexes restrict the comparison to the indicated substrings of string1 and string2.
string-index string pred [start end] → idx-or-false
string-index-right string pred [start end] → idx-or-false
string-skip string pred [start end] → idx-or-false
string-skip-right string pred [start end] → idx-or-false
string-index searches through the given substring from the left, returning the index of the leftmost character satisfying the predicate pred.
string-index-right searches from the right, returning the index of the rightmost character satisfying the predicate pred. If no match is found, these procedures return #f.
The start and end arguments specify the
beginning and end of the search; the valid indexes relevant to
the search include start but exclude end.
Beware of "fencepost" errors: when searching right-to-left,
the first index considered is
(- end 1),
whereas when searching left-to-right, the first index considered is start.
That is, the start/end indexes describe the same half-open interval
[start,end) in these procedures that they do
in all other procedures specified by this SRFI.
The skip functions are similar, but use the complement of the criterion: they search for the first char that doesn't satisfy pred. To skip over initial whitespace, for example, say
(substring string (or (string-skip string char-whitespace?) (string-length string)) (string-length string))
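The fencepost caveat can be made concrete with a small R7RS-small sketch. The names sketch-string-index and sketch-string-index-right are hypothetical, and start and end are required arguments here; both procedures examine exactly the indexes of the half-open interval [start,end), only in opposite directions.

```scheme
(import (scheme base))

;; Hypothetical sketch: scan [start,end) left to right, beginning at start.
(define (sketch-string-index s pred start end)
  (let loop ((i start))
    (cond ((>= i end) #f)
          ((pred (string-ref s i)) i)
          (else (loop (+ i 1))))))

;; Hypothetical sketch: scan [start,end) right to left, beginning at end - 1.
(define (sketch-string-index-right s pred start end)
  (let loop ((i (- end 1)))
    (cond ((< i start) #f)
          ((pred (string-ref s i)) i)
          (else (loop (- i 1))))))
```

The skip variants would follow the same shape with the predicate complemented.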
string-contains string1 string2 [start1 end1 start2 end2] → idx-or-false
string-contains-right string1 string2 [start1 end1 start2 end2] → idx-or-false
Returns #f if there is no match.
If start2 = end2,
string-contains returns start1 but
string-contains-right returns end1.
Otherwise returns the index in string1
for the first character of the first/last match;
that index lies within the half-open interval
[start1,end1),
and the match lies entirely within the
[start1,end1) range of string1.
(string-contains "eek -- what a geek." "ee" 12 18) ; Searches "a geek" => 15
Note: The names of these procedures do not end with a question mark. This indicates a useful value is returned when there is a match.
string-take-while string pred [start end] → string
string-take-while-right string pred [start end] → string
string-drop-while string pred [start end] → string
string-drop-while-right string pred [start end] → string
These are the same as
but with a different order of arguments. (Not SRFI 13 procedures.)
string-span string pred [start end] → [string string]
string-break string pred [start end] → [string string]
String-span splits the substring of string specified by start and end into the longest initial prefix whose elements all satisfy pred, and the remaining tail.
String-break inverts the sense of the predicate: the tail commences with the first element of the input string that satisfies the predicate. (Not SRFI 13 procedures.)
In other words:
span finds the initial span of elements satisfying pred, and
break breaks the string at the first element satisfying pred.
String-span is equivalent to
(values (string-take-while string pred) (string-drop-while string pred))
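As an illustration of the two-valued result, here is a minimal sketch in R7RS-small. The names sketch-string-span and sketch-string-break are hypothetical, and the optional start and end arguments are omitted.

```scheme
(import (scheme base))

;; Hypothetical sketch: return the longest prefix whose characters all
;; satisfy pred, and the remaining tail, as two values.
(define (sketch-string-span s pred)
  (let ((n (string-length s)))
    (let loop ((i 0))
      (if (and (< i n) (pred (string-ref s i)))
          (loop (+ i 1))
          (values (string-copy s 0 i) (string-copy s i n))))))

;; Hypothetical sketch: the same split with the predicate complemented.
(define (sketch-string-break s pred)
  (sketch-string-span s (lambda (c) (not (pred c)))))
```

A caller would typically receive both parts with call-with-values or let-values, for example to split "key=value" at the first #\= character.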
string-append string ... → string (R5RS)
string-concatenate string-list → string
Some implementations of Scheme
limit the number of arguments that may be passed to an n-ary procedure,
so the (apply string-append string-list) idiom,
which is otherwise equivalent to using this procedure, is not as
portable.
string-concatenate-reverse string-list [final-string end] → string
(string-concatenate (reverse string-list))
If the optional argument final-string is specified,
it is effectively consed
onto the beginning of string-list
before the list is reversed and concatenated.
If the optional argument end is given, only the characters up to but not including end in final-string are added to the result, thus producing
(string-concatenate (reverse (cons (substring final-string 0 end) string-list)))
For example:
(string-concatenate-reverse '(" must be" "Hello, I") " going.XXXX" 7) => "Hello, I must be going."
This procedure is useful when constructing procedures that
accumulate character data into lists of string buffers, and wish to
convert the accumulated data into a single string when done.
The optional end argument accommodates that use case
by allowing the final buffer to be only partially full without
having to copy it a second time, as
Note that reversing a string simply reverses the sequence of code points it contains. Caution should be taken if a grapheme cluster is divided between two string arguments.
string-join string-list [delimiter grammar] → string
string-list is a list of strings.
delimiter is a string.
The grammar argument is a symbol that determines
how the delimiter is
used, and defaults to 'infix.
It is an error for grammar to be any symbol other
than these four:
'infix means an infix or separator grammar: insert the delimiter between list elements. An empty list will produce an empty string.
'strict-infix means the same as
'infix if the string-list is non-empty, but will signal an error if given an empty list. (This avoids an ambiguity shown in the examples below.)
'suffix means a suffix or terminator grammar: insert the delimiter after every list element.
'prefix means a prefix grammar: insert the delimiter before every list element.
The delimiter is the string used to delimit elements; it defaults to a single space " ".
(string-join '("foo" "bar" "baz")) => "foo bar baz"
(string-join '("foo" "bar" "baz") "") => "foobarbaz"
(string-join '("foo" "bar" "baz") ":") => "foo:bar:baz"
(string-join '("foo" "bar" "baz") ":" 'suffix) => "foo:bar:baz:"

;; Infix grammar is ambiguous wrt empty list vs. empty string:
(string-join '() ":") => ""
(string-join '("") ":") => ""

;; Suffix and prefix grammars are not:
(string-join '() ":" 'suffix) => ""
(string-join '("") ":" 'suffix) => ":"
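The four grammars can be sketched in R7RS-small as follows. This is an illustrative sketch only: sketch-string-join is a hypothetical name, and both delimiter and grammar are required arguments here rather than optional ones.

```scheme
(import (scheme base))

;; Hypothetical sketch of the four string-join grammars.
(define (sketch-string-join strings delimiter grammar)
  (let ((out (open-output-string)))
    ;; Write each item surrounded by the given before/after strings.
    (define (emit-all before after items)
      (for-each (lambda (s)
                  (write-string before out)
                  (write-string s out)
                  (write-string after out))
                items))
    (case grammar
      ((prefix) (emit-all delimiter "" strings))
      ((suffix) (emit-all "" delimiter strings))
      ((infix strict-infix)
       (cond ((pair? strings)
              (write-string (car strings) out)
              (emit-all delimiter "" (cdr strings)))
             ((eq? grammar 'strict-infix)
              (error "strict-infix grammar requires a non-empty list"))))
      (else (error "unknown grammar" grammar)))
    (get-output-string out)))
```

Accumulating into a string port sidesteps any implementation limit on the number of arguments to string-append.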
string-fold kons knil string [start end] → value
string-fold-right kons knil string [start end] → value
The string-fold procedure maps the kons procedure
across the given string from left to right:
(... (kons string[2] (kons string[1] (kons string[0] knil))))
In other words,
string-fold obeys the (tail) recursion
(string-fold kons knil string start end) = (string-fold kons (kons string[start] knil) string start+1 end)
The string-fold-right procedure maps kons across the
given string from right to left:
(kons string[0] (... (kons string[end-3] (kons string[end-2] (kons string[end-1] knil)))))
obeying the (tail) recursion
(string-fold-right kons knil string start end) = (string-fold-right kons (kons string[end-1] knil) string start end-1)
;;; Convert a string to a list of chars.
(string-fold-right cons '() string)

;;; Count the number of lower-case characters in a string.
(string-fold (lambda (c count)
               (if (char-lower-case? c)
                   (+ count 1)
                   count))
             0
             string)
The string-fold-right combinator is sometimes called a "catamorphism."
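Both folds can be written as simple index loops. The sketch below, in R7RS-small, uses the hypothetical names sketch-string-fold and sketch-string-fold-right and takes start and end as required arguments.

```scheme
(import (scheme base))

;; Hypothetical sketch: kons is applied to the characters at
;; start, start+1, ..., end-1, in that order.
(define (sketch-string-fold kons knil s start end)
  (let loop ((i start) (acc knil))
    (if (>= i end)
        acc
        (loop (+ i 1) (kons (string-ref s i) acc)))))

;; Hypothetical sketch: kons is applied to the characters at
;; end-1, end-2, ..., start, in that order.
(define (sketch-string-fold-right kons knil s start end)
  (let loop ((i (- end 1)) (acc knil))
    (if (< i start)
        acc
        (loop (- i 1) (kons (string-ref s i) acc)))))
```

With cons as kons, the right fold yields the characters in their original order, while the left fold yields them reversed.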
string-map proc string1 string2 ... → string (R7RS-small)
It is an error if proc does not accept as many arguments as the number of string arguments passed to
string-map, does not accept characters as arguments, or returns a value that is not a character or string.
The string-map procedure applies proc element-wise
to the characters of the string arguments, converts each value
returned by proc to a string, and returns the concatenation of
those strings.
If more than one string argument is given and not all have
the same length, then
string-map terminates when the shortest
string argument runs out.
The dynamic order in which proc is called on the characters
of the string arguments is unspecified, as is the dynamic
order in which the coercions are performed. If any strings returned
by proc are mutated after they have been returned and before
the call to
string-map has returned, then
string-map returns a string with unspecified contents; the
string-map procedure itself does not mutate those strings.
(string-map (lambda (c0 c1 c2)
              (case c0
                ((#\1) c1)
                ((#\2) (string c2))
                ((#\-) (string #\- c1))))
            "1222-1111-2222"
            "Hi There!"
            "Dear John")
    => "Hear-here!"
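string-for-each proc string1 string2 ... → unspecified (R7RS-small)
It is an error if proc does not accept as many arguments as the number of string arguments passed to
string-for-each or does not accept characters as arguments.
The string-for-each procedure applies proc element-wise
to the characters of the string arguments, going from left
to right.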
If more than one string argument is given and not all have
the same length, then
string-for-each terminates when the
shortest string argument runs out.
string-count string pred [start end] → integer
string-filter pred string [start end] → string
string-remove pred string [start end] → string
In SRFI 13,
string-remove is called
string-delete.
This is inconsistent with SRFI 1 and other SRFIs.
string-replicate string from to [start end] → string
string is a string;
start and end are optional arguments that specify
a substring of string,
defaulting to 0 and the length of string.
This substring is conceptually replicated both up and down the index space,
in both the positive and negative directions.
For example, if string is "abcdefg",
start is 3,
and end is 6,
then we have the conceptual bidirectionally-infinite string
...  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d ...
    -9 -8 -7 -6 -5 -4 -3 -2 -1  0 +1 +2 +3 +4 +5 +6 +7 +8 +9
string-replicate returns the substring of this string
beginning at index from,
and ending at to.
It is an error if from is greater than to.
You can use
string-replicate to perform a variety of tasks:
(string-replicate "abcdef" 2 8) => "cdefab"
(string-replicate "abcdef" -2 4) => "efabcd"
(string-replicate "abc" 0 7) => "abcabca"
It is an error if start=end, unless from=to, which is allowed as a special case.
In SRFI 13, this procedure is called xsubstring.
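The conceptual infinite string amounts to indexing modulo the length of the replicated substring. A minimal R7RS-small sketch, assuming the whole string is replicated (the optional start and end are omitted) and using the hypothetical name sketch-string-replicate:

```scheme
(import (scheme base))

;; Hypothetical sketch: character i of the conceptual infinite string is
;; the character at index (i mod n) of s, where n is the length of s.
;; floor-remainder keeps the result non-negative for negative i.
(define (sketch-string-replicate s from to)
  (let ((n (string-length s))
        (out (make-string (- to from))))
    (do ((i from (+ i 1)))
        ((= i to) out)
      (string-set! out (- i from)
                   (string-ref s (floor-remainder i n))))))
```

When from = to the loop body never runs, so an empty result is produced even for an empty s, matching the special case noted above.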
string-segment string k → list
string-split string delimiter [grammar limit start end] → list
"\r\n". The returned list will have one more item than the number of non-overlapping occurrences of the delimiter in the string. If delimiter is an empty string, then the returned list consists of strings that each contain a single character. (Not a SRFI 13 procedure; replaces string-tokenize.)
The grammar is a symbol with the same meaning as
in the string-join procedure.
If it is
infix, which is the default,
processing is done as described above, except
an empty string produces the empty list;
if grammar is
strict-infix,
then an empty string signals an error.
The values prefix and suffix
cause a leading/trailing empty string in the result to be suppressed.
If limit is a non-negative exact integer, at most that
many splits occur, and the remainder of string
is returned as the final element of the list
(so the result will have at most limit+1 elements).
If limit is not specified or is #f,
as many splits as possible are made.
It is an error if limit is any other value.
To split on a regular expression,
use SRFI 115's regexp-split procedure.
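The default infix behavior can be sketched in R7RS-small as follows. This is a simplified illustration under stated assumptions: sketch-string-split is a hypothetical name, the delimiter must be non-empty, and the grammar, limit, start, and end arguments are not handled.

```scheme
(import (scheme base))

;; Hypothetical sketch of infix splitting on a non-empty delimiter.
(define (sketch-string-split s delim)
  (let ((n (string-length s))
        (d (string-length delim)))
    ;; Does the delimiter occur at index i?
    (define (match-at? i)
      (and (<= (+ i d) n)
           (string=? (substring s i (+ i d)) delim)))
    (if (= n 0)
        '()                              ; infix: empty string gives empty list
        (let loop ((i 0) (start 0) (acc '()))
          (cond ((> (+ i d) n)           ; no room left for another delimiter
                 (reverse (cons (substring s start n) acc)))
                ((match-at? i)           ; close the current field, skip delimiter
                 (loop (+ i d) (+ i d) (cons (substring s start i) acc)))
                (else (loop (+ i 1) start acc)))))))
```

Adjacent delimiters yield empty strings in the result, so the list always has one more element than the number of non-overlapping delimiter occurrences.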
read-string k [port] → string (R7RS-small)
write-string string [port start end] → unspecified (R7RS-small)
string-set! string k char → unspecified (R5RS)
string-fill! string fill [start end] → unspecified (R5RS+)
string-copy! to at from [start end] → unspecified (R7RS-small)
The sample implementations of this SRFI are in the SRFI repository. The main implementation is portable but inefficient; since efficiency is not a design goal (use texts for that!), it should be satisfactory.
There are two modules for Chicken. One works on Chicken's
native 8-bit strings; the other leverages the
to provide a UTF-8 facade over those same strings. This means that
there is no reliable way to tell by inspection whether a string is
8-bit or UTF-8, and one must take precautions to avoid mixing them.
The Chicken modules
srfi-13 utf8 utf8-srfi-13 utf8-case-map
shouldn't be imported together into the same module or program
as they are inherently incompatible.
However, it is possible to import
utf8-srfi-152 and then
cherry-pick non-conflicting identifiers from
(import (only utf8 read-char write-char print ...)).
There is no problem with the
When importing any of the
scheme chicken data-structures extras
modules along with
be sure to do it as follows to avoid conflicts:
(import (except scheme
                make-string string string-length string-ref string-set!
                substring string->list list->string string-fill!))
(import (except chicken reverse-list->string))
(import (except data-structures string-split substring-index))
(import (except extras read-string write-string))
When using the
srfi-152 module instead, import the
module as follows:
(import (except scheme string->list string-fill!))
The other modules, if imported, must be restricted in the same way as shown above.
The R7RS library assumes the presence of all R7RS-small procedures and does not require excluding any of them, as this SRFI is inherently compatible with R7RS-small.
I acknowledge the participants in the SRFI 152 mailing list, and everyone acknowledged in SRFI 135 (which acknowledges everyone acknowledged in SRFI 130 (which acknowledges everyone acknowledged in SRFI 13)). Particularly important are Olin Shivers, the author of SRFI 13, and Will Clinger, the author of SRFI 135.
As Olin said, we should not assume any of those individuals endorse this SRFI.
Copyright (C) John Cowan (2017).
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.