[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Naming conventions, mismatch indices, CHAR/CHAR-SET/PRED,...

This page is part of the web mail archives of SRFI 13 from before July 7th, 2015. The new archives for SRFI 13 contain all messages, not just those from before July 7th, 2015.



This library is the best one I have seen so far
(thanks, Olin!), but I still have some comments 
and suggestions. 

;;; rationale:

First, I only partially agree with the rationale.
Yes, RNRS's set of string operations is poor, but 
there's nothing wrong with naming conventions (the 
choice between string= and string=? is mainly a matter 
of taste; the former is compatible with = (=? variants
existed in R2RS to make everybody happy), but the 
latter is compatible with char=? and other predicates,
and it is both familiar and standard).

Shared substrings are practically unknown to
the Scheme community and, even if they are
implemented somewhere, I see no reason to
make incompatible changes to the semantics
of SUBSTRING (adding SUBSTRING/SHARED is
definitely a better idea, but why put normal
and /SHARED versions in one SRFI?).

I also oppose "dropping" of standard Scheme
procedures. Does this mean that to claim 
support for SRFI-13 (or SRFI-1) my implementation 
should make them unavailable? If not, then why
mention it at all? I believe the whole idea belongs 
to a separate "Scheme Request For Non-implementation"
process (SRFN-0: drop TRANSCRIPT-ON and TRANSCRIPT-OFF).

I believe that <> convention for "not equal" is not
the best choice: it doesn't look right when applied
to domains with no natural order (i.e. MY-RECORD<>).
Other possible choices are ~=, /=, ^=, and != (all
of them are subpar, but if forced to make a choice,
I would pick ~=).


;;; my votes on the Issues:

Add STRING-REDUCE & STRING-REDUCE-RIGHT:  No
STRING-APPEND accepts chars:              No
N-ary comparison functions:               No 
Add STRING-TOKENIZE:                      No


;;; CHAR/CHAR-SET/PRED parameters:

In my opinion, this is an example of ad-hoc genericity:
the choice of variants is more or less arbitrary (why
STRING or CHAR-LIST are missing? How can I specify 
-ci search for a char?) The whole idea does not fail 
only because strings cannot contain char-sets or 
procedures (this trick doesn't work with lists or
vectors). I agree that something should be done 
to stop the namespace pollution, but there are other
ways: regular higher-order procedures. Besides,
the CHAR/CHAR-SET/PRED approach is another slippery slope: 
why don't we just define generic sequence procedures?


;;; Mismatch index in string-comparison procedures:

Why do we need this mismatch index in each and every comparison
procedure? The fact that it can be returned doesn't necessarily
mean that it should; if somebody's procedure ends in (string= s1 s2)
and I have to rewrite it, say, using lists instead of strings,
I don't want to scan all the code looking for call sites to
check if the return value is just used as a boolean (imitating
mismatch index in my new code may be problematic). I will
have all this imaginary trouble because INTENTIONS of the
original author were not clear. This, however, didn't stop
the designers of MEM*, ASS* and dozens of other functions,
but in many cases I can see some advantages in returning
non-#t value. But, judging from my modest experience, I beleive
that in regular lexicographical string comparison the
mismatch index is NEVER needed. It is definitely less 
useful in practice than n-ary string comparison predicates;
and the unfortunate fact is that STRING{>...} returning a
mismatch index cannot be generalized to n-ary case!


;;; small stuff:

SUBSTRING-COMPARE{-CI}: is mismatch index (!?) relative or absolute?
STRING-ITER: why not STRING-ITERATE?

{SUB}STRING-{PRE,SUF}FIX-COUNT{-CI}: why not -LENGTH- instead of -COUNT-? 
In both CommonLisp and SRFI-1, COUNT is associated with selective counting,
not measuring the length of contiguous subsequences.

SUBSTRING{-CI}? : why do these names end in question mark? They
behave more like SUBSTRING-INDEX{-CI} and other MEMQ-like procedures.


;;; proposed additions (actually these were defined in R^2RS):

(substring-move-left! string1 start1 end1 string2 start2)
(substring-move-right! string1 start1 end1 string2 start2)

String1 and string2 must be a strings, and start1, start2 and
end1 must be exact integers satisfying

 0 <= start1 <= end1 <= (string-length string1)
 0 <= start2 <= end1-start1+start2 <= (string-length string2).

Substring-move-left! and substring-move-right! store characters of
string1 beginning with index start1 (inclusive) and ending with
index end1 (exclusive) into string2 beginning with index start2
(inclusive).

Substring-move-left! stores characters in time order of increasing
indices.  Substring-move-right! stores characters in time order of
decreasing indices.


-- Sergei