
A suggestion: string-split, string->integer



Hello!

	If I may, I'd like to propose two more functions,
string->integer and string-split.

	string-split is similar to string-tokenize, but it supports a
'delimiting' rather than an 'inclusion' grammar. While a token-set
specifies the characters that make up tokens, string-split's argument
specifies a set of characters that delimit tokens. Some problems
are more elegantly and efficiently expressed in terms of inclusion,
others in terms of delimiting. I found, for example, that in
Perl and Python split() is a rather often-used function. Furthermore,
string-split ought to accept an optional LIMIT argument (MAXSPLIT in
Appendix A) to limit the number of splits performed. The specification
in Appendix A below is what I implemented and wrote an extensive set
of tests for. You will probably generalize the CHARSET argument
(SRFI-14 didn't exist when App. A was written). Besides, I defer to
you as to how to mold string-split into SRFI-13 should this proposal
get accepted.


	R5RS procedure string->number is far more generic than the
proposed string->integer -- and this may be a problem IMHO.  For
example, string->number will try to read strings like "1/2" "1S2"
"1.34" and even "1/0" (the latter causing a zero-divide error). Note
that to Gambit's string->number, "1S2" is a valid representation of an
_inexact_ integer (100 to be precise).  Oftentimes we want to be more
restrictive about what we consider a number; we want merely to read an
integral label.

	-- procedure+: string->integer STR START END

Checks that the substring of STR from START (inclusive) to END
(exclusive) is a representation of a non-negative integer in decimal
notation. If so, that integer is returned. Otherwise (when the
substring contains non-decimal characters, or when the range from
START to END is not within STR), the result is #f.
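
To make the description above concrete, here is a minimal sketch in
portable Scheme. It is an illustration only, not necessarily the code in
util.scm. Two points are assumptions the description does not settle: an
empty range (START = END) yields #f, and the digit characters are taken
to have consecutive character codes, as they do in ASCII and Unicode.

```scheme
;; Sketch only: returns the non-negative decimal integer spelled by the
;; characters of STR in [START, END), or #f if that is not possible.
;; Assumption: an empty range (START = END) yields #f.
;; Assumption: #\0 .. #\9 have consecutive character codes.
(define (string->integer str start end)
  (and (<= 0 start)
       (< start end)
       (<= end (string-length str))
       (let loop ((pos start) (acc 0))
         (if (>= pos end)
             acc                        ; every character was a digit
             (let ((c (string-ref str pos)))
               (and (char<=? #\0 c)
                    (char<=? c #\9)
                    (loop (+ pos 1)
                          (+ (* 10 acc)
                             (- (char->integer c)
                                (char->integer #\0))))))))))
```

With this sketch:

(string->integer "id42" 2 4) ==> 42
(string->integer "1/2" 0 3) ==> #f
(string->integer "1S2" 0 3) ==> #f
(string->integer "42" 0 5) ==> #f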


> [SRFI-13]
> string-concatenate string-list -> string
>     Append the elements of STRING-LIST together into a single _list_.
>     Guaranteed to return a freshly allocated _list_.

Did you mean to say a 'string' (instead of a _list_)?


SRFI-13 mentions that string-unfold is also called an "anamorphism".
Do you want to point out that a foldr combinator (e.g.,
string-fold-right) is also called a "catamorphism"?


	Thank you for the trouble you took putting together SRFI-13!

	Oleg


Appendix A. string-split (a very rough draft proposal)

 
-- procedure+: string-split STRING
-- procedure+: string-split STRING '()
-- procedure+: string-split STRING '() MAXSPLIT

Returns a list of whitespace-delimited words in STRING.  If STRING is
empty or contains only whitespace, then the empty list is
returned. Leading and trailing whitespace is trimmed.  If MAXSPLIT
is specified and positive, the resulting list will contain at most
MAXSPLIT elements, the last of which is the string remaining after
(MAXSPLIT - 1) splits. If MAXSPLIT is specified and non-positive, the
empty list is returned. "In time critical applications it behooves you
not to split into more fields than you really need."

-- procedure+: string-split STRING CHARSET
-- procedure+: string-split STRING CHARSET MAXSPLIT

Returns a list of the words in STRING that are delimited by the
characters in CHARSET. CHARSET is a list of characters that are
treated as delimiters.

Leading or trailing delimiters are NOT trimmed. That is, the resulting
list will have as many initial empty string elements as there are
leading delimiters in STRING.

If MAXSPLIT is specified and positive, the resulting list will contain
at most MAXSPLIT elements, the last of which is the string remaining
after (MAXSPLIT - 1) splits. If MAXSPLIT is specified and
non-positive, the empty list is returned. "In time critical
applications it behooves you not to split into more fields than you
really need."

This is based on the split function in Python/Perl.

(string-split " abc d e f  ") ==> ("abc" "d" "e" "f")
(string-split " abc d e f  " '() 1) ==> ("abc d e f  ")
(string-split " abc d e f  " '() 0) ==> ()
(string-split ":abc:d:e::f:" '(#\:)) ==> ("" "abc" "d" "e" "" "f" "")
(string-split ":" '(#\:)) ==> ("" "")
(string-split "root:x:0:0:Lord" '(#\:) 2) ==> ("root" "x:0:0:Lord")
(string-split "/usr/local/bin:/usr/bin:/usr/ucb/bin" '(#\:))
 ==> ("/usr/local/bin" "/usr/bin" "/usr/ucb/bin")
(string-split "/usr/local/bin" '(#\/)) ==> ("" "usr" "local" "bin")
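
For illustration, the behaviour specified above can be sketched in
portable Scheme roughly as follows. This is a minimal sketch written to
agree with the examples above, not the code in util.scm. The helper
names (scan, delim?, not-delim?, trim?) are made up for the sketch, and
the optional arguments are simply taken from a rest list. The result for
an empty STRING with a non-empty CHARSET is not settled by the
specification; the sketch happens to return a list of one empty string.

```scheme
;; Sketch only. (string-split STR [CHARSET [MAXSPLIT]])
;; CHARSET is a list of delimiter characters; '() or no CHARSET means
;; "split on whitespace and trim".
(define (string-split str . rest)
  (let* ((len (string-length str))
         (charset (if (pair? rest) (car rest) '()))
         ;; Without MAXSPLIT, allow more elements than STR can ever yield.
         (maxsplit (if (and (pair? rest) (pair? (cdr rest)))
                       (cadr rest)
                       (+ len 1)))
         (trim? (null? charset))
         (delim? (if trim?
                     char-whitespace?
                     (lambda (c) (memv c charset)))))
    ;; Index of the first character at or after POS satisfying PRED,
    ;; or LEN if there is none.
    (define (scan pos pred)
      (cond ((>= pos len) len)
            ((pred (string-ref str pos)) pos)
            (else (scan (+ pos 1) pred))))
    (define (not-delim? c) (not (delim? c)))
    ;; LEFT is the number of elements that may still be produced.
    (let loop ((pos 0) (left maxsplit))
      ;; In whitespace mode, skip the delimiters in front of the next word.
      (let ((start (if trim? (scan pos not-delim?) pos)))
        (cond ((<= left 0) '())
              ((and trim? (>= start len)) '())
              ;; Last permitted element: the rest of the string, unsplit.
              ((= left 1) (list (substring str start len)))
              (else
               (let ((stop (scan start delim?)))
                 (cons (substring str start stop)
                       (if (>= stop len)
                           '()
                           (loop (+ stop 1) (- left 1)))))))))))
```

A delimiter found at position STOP always starts a new element at
STOP + 1, which is what produces the empty strings for leading,
trailing, and adjacent delimiters when a CHARSET is given.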

Implementation:
	http://pobox.com/~oleg/ftp/Scheme/util.scm
A regression test suite:
	http://pobox.com/~oleg/ftp/Scheme/vinput-parse.scm