Mauve wishlist

Hi,

Anthony Balkissoon wrote:

>On Fri, 2006-03-17 at 11:32 -0500, Thomas Fitzsimmons wrote:
>>Hi,
>>
>>Anthony Balkissoon has expressed interest in improving Mauve so we'd
>>like to know what would be the best things to work on.
>>
>
>Another suggestion that Tom Fitzsimmons had was to change the way we
>count the number of tests.  Counting each invocation of the test()
>method rather than each call to harness.check() has two benefits:
I think that would be a backward step - I like the detail that Mauve 
provides, especially when running subsets of the tests while developing 
on GNU Classpath.

On the other hand, you can achieve this result without losing the 
current detail.  For example, my recent JUnit patch (not committed yet) 
effectively gives a pass/fail per test() call when you run via JUnit, 
without losing the ability to run in the usual Mauve way (counting 
check() results).
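The idea can be sketched roughly as follows - this is an illustrative 
sketch only, with invented names (CheckRecorder etc.), not the actual 
patch.  Per-check() results are still recorded, and a single pass/fail 
per test() invocation is derived from them:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recorder: keeps every fine-grained check() result,
// and also offers a JUnit-style single verdict per test() call.
class CheckRecorder {
    private final List<Boolean> checks = new ArrayList<Boolean>();

    // Analogous to TestHarness.check(boolean): record one check result.
    void check(boolean result) {
        checks.add(result);
    }

    // JUnit-style view: the whole test() invocation passes only if
    // every individual check passed - so no detail is lost underneath.
    boolean testPassed() {
        for (boolean b : checks) {
            if (!b) return false;
        }
        return true;
    }

    // The usual Mauve-style counts remain available.
    int checkCount() {
        return checks.size();
    }

    int failureCount() {
        int fails = 0;
        for (boolean b : checks) {
            if (!b) fails++;
        }
        return fails;
    }
}
```

A JUnit runner would call testPassed() once per Testlet, while the 
normal Mauve harness would keep reporting checkCount()/failureCount().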

>1) constant number of tests, regardless of exceptions being thrown or
>which if-else branch is taken
Mauve does have a design flaw: it can be tricky to assign a unique 
identifier to each check() automatically, which makes it hard to 
compare two Mauve runs (say, the latest Classpath CVS vs. the last 
release, or Classpath vs. JDK 1.5 - both of which would be 
interesting).

We can work around that by ensuring that all the tests run linearly 
(no if-else branches).  I've written a large number of tests this way 
and not found it to be a limitation, but I don't know what lurks in 
the depths of the older Mauve tests.

There is still the problem that an exception being thrown during a test 
means some checks don't get run, but a new Mauve comparison report (not 
yet developed, although I've done a little experimenting with it) could 
highlight those.

>2) more realistic number of tests, to accurately reflect the extent of
>our testing
I think the absolute number is meaningless however you count the tests, 
so I don't see this as an advantage.  Test coverage reports are what we 
need to get some insight into the extent of our testing.

>For point 1) this will help us see if we are making progress.  Right now
>a Mauve run might say we have 113 fails out of 13200 tests and then a
>later run could say 200 fails out of 34000 tests.  Is this an
>improvement?  Hard to say.  
>
I have done a little bit of work on a comparison report to show the 
differences between two runs of the same set of Mauve tests, classifying 
them as follows:

Type 1 (Normal):       passes on run A and run B;
Type 2 (Regression):   passes on run A, fails on run B;
Type 3 (Improvement):  fails on run A, passes on run B;
Type 4 (Bad):          fails on run A and run B.

In a comparison of JDK 1.5 vs. Classpath, a Type 4 result (failing on 
both) hints that the check itself is buggy.  This is a work in 
progress, and I don't have any code to show anyone yet, but I think 
the approach can be made to work.
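The classification above is simple enough to sketch directly - again, 
the class and method names here are invented for illustration, not 
taken from the (unwritten) report code:

```java
// Illustrative sketch: classify one check's outcome across two Mauve
// runs into the four types listed above.
class RunComparison {
    static String classify(boolean passA, boolean passB) {
        if (passA && passB)  return "Type 1 (Normal)";
        if (passA && !passB) return "Type 2 (Regression)";
        if (!passA && passB) return "Type 3 (Improvement)";
        return "Type 4 (Bad)";      // fails on both runs
    }
}
```

A comparison report would apply this per uniquely-identified check and 
then list the Type 2 and Type 4 entries for attention.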

To make it work, each check has to be uniquely identified.  I did this 
using the checkpoint plus the check index within a test(), so it is 
important that if-else branches in the tests can't result in checks 
being skipped.  This is the case for most of the javax.swing.* tests, 
but I can't speak for some of the older Mauve tests.
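A minimal sketch of that identification scheme (CheckId and its 
methods are invented names; only the checkpoint-plus-index idea comes 
from the scheme described above):

```java
// Illustrative sketch: build a stable identifier for each check from
// the current checkpoint name plus a running check index within one
// test() invocation.  The IDs only stay stable across runs if no
// check is conditionally skipped (hence the no-if-else rule).
class CheckId {
    private String checkpoint = "";
    private int index = 0;

    // Analogous to TestHarness.checkPoint(String).
    void checkpoint(String name) {
        checkpoint = name;
    }

    // Called once per check(); the index keeps counting across
    // checkpoints, so it is unique within the whole test() call.
    String nextId() {
        return checkpoint + ":" + (index++);
    }
}
```

Two runs of the same (linear) test then produce the same ID sequence, 
so per-check results can be matched up for the comparison report.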

>But if we count each call to test() as a
>test, and also detect hanging tests, then we should have a constant
>number of tests in each run and will be able to say if changes made have
>a positive impact on Mauve test results.  
>
You'll lose the ability to distinguish between an existing failure 
where (say) 1 out of 72 checks fails and a regression where, after 
some clever patch, 43 out of 72 checks fail - the new system would 
report both as a single test failure.

Regards,

Dave

