
Re: Suggestion: include test suite in SRFI



   Date: Tue, 10 Nov 1998 13:34:25 -0500
   From: William D Clinger <will@ccs.neu.edu>

   Test suites can show that an implementation does not conform
   to some specification, but they can't (usually) show that an
   implementation does conform.  The only things you can
   realistically hope to get out of an executable test procedure
   are the following:

       1.  A true result means the implementation might conform,
           because it either passed all of the tests or bombed
           so badly as to return a bogus result.
       2.  A false result means the implementation does not conform,
           because it failed one or more tests.
       3.  No result at all means the implementation blew up during
           one of the tests.
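
(Concretely, a runner embodying those three outcomes might look like
the following rough sketch; the names are invented for illustration,
not taken from any actual SRFI.  Each test is a thunk that should
return #t.  The runner returns #t, meaning "might conform"; returns
#f, meaning "does not conform"; or never returns at all, if some
test blows up.)

    ;; Each test is a thunk; any true result counts as a pass, which
    ;; matches item one's caveat that a bogus true result still looks
    ;; like conformance.
    (define (run-tests tests)
      (let loop ((ts tests) (ok #t))
        (if (null? ts)
            ok
            (let ((result ((car ts))))  ; may never return if it blows up
              (loop (cdr ts)
                    (and ok (if result #t #f)))))))

    ;; e.g., two toy tests against the host implementation:
    (define sample-tests
      (list (lambda () (= (+ 2 2) 4))
            (lambda () (eq? (car '(a b)) 'a))))
    ;; (run-tests sample-tests) => #t if both pass, #f otherwise.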

It's funny that you put it this way, and I don't even mean to
disagree, but my vague recollection is that for Ada test suites,
passing the test was considered "conforming" and failing the test was
considered "a reason for you to write an explanation of your
non-conformance for consideration by whoever was judging".  I think
if you looked more carefully, you'd find the idea was to avoid a
potential legal headache, one that most standards organizations,
having been burnt before, are very savvy about: if you say someone
doesn't conform, you put them in the position of either paying a lot
to fix it or of selling something you have labeled as inadequate.
And in the latter case, you're open to a lawsuit for having unfairly
defamed their product.  Because of such legal risks, it was
recommended to us in the ANSI process that we not opine about whether
someone's implementation fails to conform.  Better to let the
marketplace sort that out.  I'm not saying the Scheme community has
to take the same approach, but it should take whatever approach it
does take mindful of the fact that the legal risk is as non-zero as
the risk of error in your item one there.  Put another way:

 1. A failure to be sued might conform to the law,
    because either the person did something legal or
    no one thought to sue.
 2. A lawsuit means that either you went too far out of
    bounds or someone is playing legal poker with you.
 3. A failure to create a test suite means someone 
    realized that one and two were both pretty darned
    risky, or else just too much work, and so they blew
    it off.

In any event, I also observe the possibility of tests whose function
is not to "prove conformance" but merely to "demonstrate correct
behavior on a known set of tests".  By such word games, you relax
what the test seems to "prove", so that a correct answer can honestly
mean "passed" rather than "might have passed", since completing the
suite no longer claims anything beyond the cases actually run.  That
might be a better way to go.
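
For instance, here is a rough sketch of such a weaker harness, with
invented names: each case pairs a known computation with its expected
answer, and the report claims only that the implementation matched or
diverged on each known case, nothing more.

    ;; Each case is a list: (name thunk expected).
    (define (demonstrate cases)
      (for-each
       (lambda (c)
         (let ((name     (car c))
               (actual   ((cadr c)))
               (expected (caddr c)))
           (display name)
           (display (if (equal? actual expected)
                        ": matched the known case"
                        ": diverged on the known case"))
           (newline)))
       cases))

    ;; e.g.
    (define known-cases
      (list (list 'addition  (lambda () (+ 2 2)) 4)
            (list 'list-head (lambda () (car '(a b))) 'a)))
    ;; (demonstrate known-cases) prints one line per known case.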

Also, I do think you should create an appeal procedure, because a
failure to conform may sometimes be due to lack of foresight on the
part of the designers, and (as with a disputed item on a credit card
bill) an implementor might wish for the failure not to be counted as
a black mark while some sort of mediation was under way.

However, all that said, I think that exactly for the reasons cited
above you are best off staying out of the business of tightly
coupling tests with specifications, unless you make the tests
definitional; and at that point you might as well just supply a
reference implementation instead.
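
(By "definitional" I mean something like the following rough sketch,
again with invented names: the reference implementation is itself the
specification of the hypothetical operation FOO, and conformance on
whatever inputs anyone cares to try just means agreement with it, so
there is no separate test suite to couple to the spec.)

    ;; REFERENCE-FOO is the definitional version of a hypothetical
    ;; operation; a candidate agrees, on the inputs tried, iff it
    ;; returns the same answer as the reference on all of them.
    (define (reference-foo x) (* x x))

    (define (agrees-with-reference? candidate inputs)
      (let loop ((ins inputs))
        (cond ((null? ins) #t)
              ((equal? (candidate (car ins))
                       (reference-foo (car ins)))
               (loop (cdr ins)))
              (else #f))))

    ;; e.g. (agrees-with-reference? (lambda (x) (expt x 2)) '(0 1 2 -3))
    ;;      => #t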

I don't like the idea of screening who can be in this srfi system and
who can't, nor do I like the idea of saying who complies and who
doesn't.  I think it should simply be a place where anyone who wants
to can register something, and that compliance should be judged by
the market and/or by one OR MORE independent (not uniquely
determined) agencies, since I may not trust any one set of tests or
testers, and I'd rather let the marketplace sort out who the good
testers are than build that into the process.  There's no obvious
reason to believe that just because someone can write a good spec,
they can also write a good test.  I'm worried about other good tests
not being given the same status, and I'm worried about bad tests
attached to good specs getting undue weight.

(It's also funny to have me be the one saying I'd like to see less
mechanism in the process, while you guys are the ones proposing what
I see as excess red tape.  Turnabout is fair play, I guess.)