Re: Performance Translators' Stability and Usefulness - Regression test outline

Mickey Mazarick <mic@xxxxxxxxxxxxxxxxxx> · Tue, 07 Jul 2009 17:08:41 -0400

Wow, you really hit my biggest fear, the one thing I try to test for... 
data corruption. 
That's what I wake up afraid of at night...

I'm doing a simplified version of the first set of tests you mentioned,
but nothing as detailed. Really it's just creating a random file and doing
an md5 check on it, but now that you mention all the possibilities of files
moving in from the back end, I'm really doing nothing to test the dht or
namespace distribution at all...
I would add a few but I haven't had the time to google how to do the 
following without writing a C prog:
check flock()   
check mmap writing
Also, I have yet to get this to work all the time, but starting a large
write and then losing a brick under afr (this usually terminates the write).
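
For the flock() check, one possibility that avoids a C program is the flock(1) utility from util-linux. The sketch below is only an illustration and not something posted in this thread: check_flock and the lock file name are made-up names, and it assumes flock(1) is installed. It takes an exclusive lock on one file descriptor and then confirms that a second, independent open of the same file is refused the lock. It does not cover the mmap case.

```shell
#!/bin/bash
# check_flock DIR   (hypothetical helper name)
# Returns 0 if an exclusive flock() in DIR excludes a second locker,
# 1 if the lock was granted twice, 2 on a setup problem.
check_flock() {
  local lockfile="$1/flock_test.$$"
  local rc
  : > "$lockfile" || return 2
  (
    # Hold an exclusive lock on fd 9 for the life of this subshell.
    flock -x -n 9 || exit 2
    # A fresh open of the same file must NOT be able to lock it.
    if flock -x -n "$lockfile" true; then
      exit 1
    fi
    exit 0
  ) 9>"$lockfile"
  rc=$?
  rm -f "$lockfile"
  return "$rc"
}
```

Something like `check_flock /mnt/glusterfs` (the mount path is just an example) would then fit the "same args for every test" idea.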

I'm thinking a simple collection of bash or perl scripts would work for 
a first pass at this. Do you have any suggestions on a good collab site
for scripting?  If we came up with a basic format we could create and 
then mix and match them as we saw fit. We just need them all to be 
called with the same args, then have a master run that executes all of 
them in a tests dir.  It would also be nice if there was a sort of 
standard output both for giving to devels as well as rolling up nicely 
if we get 1E3 of these things.
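
A minimal sketch of what such a master run could look like, assuming every test is an executable file in one directory that takes the mount point as its only argument and signals failure through its exit status. The function name run_all, the log directory and the PASS/FAIL/SUMMARY line format are all assumptions made for illustration, not an agreed format.

```shell
#!/bin/bash
# run_all TESTS_DIR MOUNT_POINT [LOG_DIR]   (hypothetical helper name)
# Runs every executable in TESTS_DIR with MOUNT_POINT as its argument,
# prints one PASS/FAIL line per test plus a summary line.
run_all() {
  local tests_dir=$1 mount=$2 log_dir=${3:-./test-logs}
  local t name rc pass=0 fail=0
  mkdir -p "$log_dir" || return 2
  for t in "$tests_dir"/*; do
    [ -f "$t" ] && [ -x "$t" ] || continue
    name=$(basename "$t")
    # Keep each test's full output for the devs; only the verdict is printed.
    "$t" "$mount" > "$log_dir/$name.log" 2>&1
    rc=$?
    if [ "$rc" -eq 0 ]; then
      echo "PASS $name"
      pass=$((pass + 1))
    else
      echo "FAIL $name (exit $rc, see $log_dir/$name.log)"
      fail=$((fail + 1))
    fi
  done
  echo "SUMMARY pass=$pass fail=$fail"
  [ "$fail" -eq 0 ]
}
```

One line per test keeps the output easy to grep and count when there are a lot of tests, while the per-test logs hold the detail.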

-Mic

Geoff Kassel wrote:
Hi Mickey,

Thanks I am well versed in unit testing but probably disagree on level
of use in a development cycle. Instead of writing a long email back
about testing theory, nondeterministic problems, highly connected
dependent systems blah blah

Sorry, I was just trying to make sure we were all on the same page - define 
some common terminology, etc for anyone else who wanted to join in.

I'm well aware of the limits of testing, having most of a PhD in related 
formal methods topics and having taught Uni subjects in this area. (But 
consider me optimistic anyway :)

It's just about improving confidence, after all. Not about achieving some 
nebulous notion of perfection.

I'll just say that most of the problems that 
have plagued me have been because of interactions between translators,
kernel mods etc which unit testing doesn't really approach.

That's the focus of integration testing, not unit tests... I did mention 
integration testing.

Since I'm running my setup as a storage farm it just doesn't matter to
me if there's a memory leak or if a server daemon crashes, I have cron
jobs that restart it and I barely take notice.

You're very lucky that a crash doesn't cause you much annoyance. My annoyances 
in this area are well documented in the list, so I won't repeat them again :)

I would rather encourage the dev team to add hot
upgrade and hot-add features. These things would keep my cluster going
even if there were catastrophic problems.

These are good features to have, yes. However, I'd like to make sure there's 
something uncorrupted to recover first.

If a feature freeze was necessary to get a proper QA framework put in place 
and working towards avoiding more data corruption bugs, then I would vote for 
the feature freeze over more features, no matter how useful.

What I'm saying is that a good top down testing system is something we
can discuss here, spec out and perhaps create independently of the
development team. I think what most people want is a more stable product
and I think a top down approach will get it there faster than trying to
implement a given UT system from the bottom up. It will definitely answer
the question "should I upgrade to this release?"

Alright. We'll let the devs concentrate on bottom up testing (they know the 
code better anyway), and we in the wider community can look at top down 
testing.

You mentioned that you had outlined some integration and function tests
previously, perhaps you could paste some into this thread so that we
could expand on them.

Okay. The test I outlined was for checking for data corruption bugs in AFR
and Unify using cryptographic hashes. The idea actually expands into a class
of test cases. I'll flesh those out a bit more now.

Generate a number of files of varying length (zero size, single byte, transfer 
block size - 1, transfer block size, transfer block size + 1, multiple meg, 
multiple gig etc) in a directory tree of varying depths. Take the 
cryptographic hash of each file.
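
A rough sketch of the generation step, assuming GNU coreutils and findutils are available. The name make_test_tree, the a/b/c directory layout and the choice of sha256 as the hash are placeholders, and the multi-gigabyte case is left out to keep the example quick.

```shell
#!/bin/bash
# make_test_tree DIR BLOCK_SIZE   (hypothetical helper name)
# Fills DIR with random files at boundary sizes, spread over three
# directory depths, and prints a sha256 manifest (paths relative to DIR).
make_test_tree() {
  local dir=$1 bs=$2 size sub n=0
  mkdir -p "$dir/a/b/c" || return 1
  for size in 0 1 $((bs - 1)) "$bs" $((bs + 1)) $((4 * 1024 * 1024)); do
    # Rotate through the three depths so the tree is not flat.
    case $((n % 3)) in
      0) sub=. ;;
      1) sub=a ;;
      *) sub=a/b/c ;;
    esac
    head -c "$size" /dev/urandom > "$dir/$sub/file_$size" || return 1
    n=$((n + 1))
  done
  # Relative paths let the same manifest be checked against any copy.
  ( cd "$dir" && find . -type f -print0 | sort -z | xargs -0 sha256sum )
}
```

For example, `make_test_tree /tmp/source 131072 > /tmp/manifest` would build the reference set, where 131072 is only a guess at a transfer block size.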

One test can be starting with an empty set of GlusterFS back end data blocks. 
Insert the files and directories through the client - check the hashes of the 
files stored on the back ends, and as read back through each of the 
client(s). If the hashes mismatch the original computed hashes at any point, 
the test has failed.
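
The comparison step could be as small as the sketch below. It assumes the original hashes were saved as a sha256sum manifest with paths relative to the top of the tree; verify_tree is a made-up name, and GNU sha256sum and readlink are assumed.

```shell
#!/bin/bash
# verify_tree MANIFEST DIR   (hypothetical helper name)
# Recomputes the hashes of the files listed in MANIFEST, relative to DIR.
# Returns non-zero if any file is missing or its hash differs.
verify_tree() {
  local manifest
  manifest=$(readlink -f "$1") || return 2
  # --quiet prints only the files that fail verification.
  ( cd "$2" && sha256sum --quiet -c "$manifest" )
}
```

The same call would be repeated for each client mount and for each back end export directory. For a replicated (AFR) setup every back end should hold every file; with distribution the back ends would have to be checked as a union, which this sketch does not attempt. Files that exist but are not in the manifest are not detected either.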

Another test can be starting with the files already on the back end. (But 
without having had Gluster assign metadata attributes yet.) Start the server, 
read the files through each of the client(s) and directly from the back end. 
As before, if the hashes mismatch at any point - failure.

A third test - start another set of back ends with a partially populated back 
end. Start the server, read the existing files off, compare hashes. Add the 
remaining files. Compare the hashes of all files through the client(s), and 
as they end up on the back end.

I don't know if 2.0.x Gluster supports this any more, but you used to be able 
to have one back end populated and the other empty, so long as a namespace 
block on all servers had zero-length file entries for all of the replicated 
files. (This being how you could add a node to your cluster originally.) 
Start back ends in this one populated, others empty configuration - read all 
the files through from a client connected only to a server with an empty back 
end. Check the hashes read through the client, and the hashes of the files 
that end up 'healed' onto the formerly empty back ends.

Then there's a multitude of overwrite tests that could be done in this vein, 
as well as concurrent read and write tests to check atomicity etc.

All these tests could be done under different performance translators, with 
different numbers of servers and clients. All just a matter of different 
configuration files given, and different scripts to set up different test 
environments.

All of these functional tests can be automated, can be done on a single system 
with some clever configuration files, or performed across a network to try to 
detect issues caused by networking.

(I believe there are open source network simulation tools that might be able 
to be used to simulate lag, noise, congestion etc, and so reduce this network 
testing to being run on a single machine. Network simulation is not an area 
of expertise for me, so I don't know how effective or comparable this is to 
the real thing.)

If the files in the tests are algorithmically generated (say, sourced from a 
pseudo random number generator, or the various patterns favoured by memory 
testers), the back end test data sets can be quite small in size.

(Hopefully this will all be small enough to add to the repository without 
adding much bulk to a check out.)
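
As one way of getting reproducible test data without storing it, the sketch below expands a short seed string into a byte stream by hashing the seed together with a counter. This is an illustrative stand-in for a proper pseudo random number generator, not a recommendation of a particular generator; gen_pattern is a made-up name, and the output is hex text rather than raw binary.

```shell
#!/bin/bash
# gen_pattern SEED NBYTES   (hypothetical helper name)
# Writes NBYTES of deterministic data to stdout: the same SEED always
# gives the same bytes, so only the seed needs to be kept in the repository.
gen_pattern() {
  local seed=$1 nbytes=$2 i
  # Each sha256 line is 64 hex digits plus a newline = 65 bytes.
  local lines=$(( nbytes / 65 + 1 ))
  for ((i = 0; i < lines; i++)); do
    printf '%s:%d' "$seed" "$i" | sha256sum | cut -d' ' -f1
  done | head -c "$nbytes"
}
```

For example, `gen_pattern case42 1048576 > /mnt/glusterfs/f` (example path) writes a file whose expected hash can be recomputed on any machine with `gen_pattern case42 1048576 | sha256sum`. It is slow for very large files, so the multi-gigabyte cases would want something faster.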

What do you think?

Geoff.

On Wed, 8 Jul 2009, Mickey Mazarick wrote:

Geoff,
Thanks I am well versed in unit testing but probably disagree on level
of use in a development cycle. Instead of writing a long email back
about testing theory, nondeterministic problems, highly connected
dependent systems blah blah I'll just say that most of the problems that
have plagued me have been because of interactions between translators,
kernel mods etc which unit testing doesn't really approach.

Since I'm running my setup as a storage farm it just doesn't matter to
me if there's a memory leak or if a server daemon crashes, I have cron
jobs that restart it and I barely take notice. True, regression testing
would get rid of the memory leak you hate but if they have to start from
the ground up I would rather encourage the dev team to add hot
upgrade and hot-add features. These things would keep my cluster going
even if there were catastrophic problems.

What I'm saying is that a good top down testing system is something we
can discuss here, spec out and perhaps create independently of the
development team. I think what most people want is a more stable product
and I think a top down approach will get it there faster than trying to
implement a given UT system from the bottom up. It will definitely answer
the question "should I upgrade to this release?"

You mentioned that you had outlined some integration and function tests
previously, perhaps you could paste some into this thread so that we
could expand on them.

Thanks!
-Mickey Mazarick

Geoff Kassel wrote:

Hi Mickey,
   Just so that we're all on the same page here - a regression test suite
at its most basic just has to include test cases (i.e. a set of inputs)
that can trigger a previously known fault in the code if that fault is
present. (i.e. it can see if the code has 'regressed' into a condition
where a fault is present.)

   What it's also taken to mean (and typically includes) is a set of
test cases covering corner cases and normal modes of operation, as
expressed in a set of inputs to code paired with a set of expected
outputs that may or may not include error messages.

   Test cases aimed at particular levels of the code have specific
terminology associated with those levels. At the lowest level, the method
level, they're called unit tests. At the module/API level - integration
tests. At the system/user interface level - system aka function aka
functional aka functionality tests.

   When new functionality is introduced or a bug is patched, the
regression test suite (which in the case of unit tests is typically fully
automated) is run to see whether the expected behaviour occurs, and none
of the old faults recur.

   A lot of the tests you've described fall into the category of function
tests - and from my background in automated testing, I know we need a bit
more than that to get the stability and reliability results we want.
(Simply because you cannot test every corner case within a project the
size and complexity of GlusterFS reliably from the command line.)

   Basically, what GlusterFS needs is a fairly even coverage of test
cases at all the levels I've just mentioned.

   What I want to see particularly - and what the devs stated nearly a
year ago was already in existence - is unit tests. Particularly the kind
that can be run automatically.

   This is so that developers (inside the GlusterFS team or otherwise)
can hack on a piece of code to fix a bug or implement new functionality,
then run the unit tests to see that they (most likely) haven't caused a
regression with their new code.

   (It's somewhat difficult for outsiders to write unit and integration
tests, because typically only the original developers have the in-depth
knowledge of the expected behaviour of the code in the low level detail
required.)

   Perhaps developed in parallel should be integration and function
tests. Tests like these (I've outlined elsewhere specifically what kind)
would have quite likely picked up the data corruption bugs before they
made their way into the first 2.0.x releases.

   (Pretty much anyone familiar with the goal of the project can write
function tests, documenting in live code their expectations for how the
system should work.)

   Long running stability and load tests like you've proposed are also
kinds of function tests, but without the narrowly defined inputs and
outputs of specific test cases. They're basically the equivalent of mine
shaft canaries - they signal the presence of race conditions, memory
leaks, design flaws, and other subtle issues, but often without specifics
as to what 'killed' the canary. Once the cause is found though, a new,
more specific test case can be added at the appropriate level.

   (Useful, yes, but mostly as a starting point for more intensive QA
efforts.)

   The POSIX compliance tests you mentioned are more traditional function
level tests - but I think the GlusterFS devs have wandered a little away
from full POSIX compliance on some points, so these tests may not be 100%
relevant.

   (This is not necessarily a bad thing - the POSIX standard is
apparently ambiguous at times, and there is some wider community feeling
that improvements to the standard are overdue. And I'm not sure the POSIX
standard was ever written with massively scalable, pluggable, distributed
file systems in mind, either :)

   I hope my extremely long winded rant here :) has explained adequately
what I feel GlusterFS needs to have in a regression testing system.

Geoff.

On Tue, 7 Jul 2009, Mickey Mazarick wrote:

What kind of requirements does everyone see as necessary for a
regression test system?
Ultimately the best testing system would use the tracing translator and
be able to run tests and generate traces for any problems that occur,
giving us something very concrete to provide the developers. That's a
few steps ahead, however; initially we should start to outline some
must-haves in terms of how a test setup is run. Obviously we want something
we can run for many hours or days to test long-term stability, and it
would be nice if there was some central way to spin up new clients to
test reliability under load.
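
A very small sketch of the load side of this, run from a single client: several background writers each create files and read them straight back to compare hashes. The name stress_writers, the 64 KiB file size and the output line are placeholders. A read-back on the same client may well be served from cache, so checking from a second client would be more telling.

```shell
#!/bin/bash
# stress_writers DIR WRITERS ROUNDS   (hypothetical helper name)
# Starts WRITERS background jobs; each writes ROUNDS random files into DIR
# and re-reads every file to compare hashes. Non-zero return on any failure.
stress_writers() {
  local dir=$1 writers=$2 rounds=$3
  local w pid failed=0 pids=()
  for ((w = 0; w < writers; w++)); do
    (
      for ((r = 0; r < rounds; r++)); do
        f="$dir/stress_${w}_${r}"
        # tee stores the data while sha256sum hashes exactly what was sent.
        want=$(head -c 65536 /dev/urandom | tee "$f" | sha256sum | cut -d' ' -f1)
        got=$(sha256sum < "$f" | cut -d' ' -f1)
        if [ "$want" != "$got" ]; then
          echo "MISMATCH $f" >&2
          exit 1
        fi
        rm -f "$f"
      done
    ) &
    pids+=("$!")
  done
  # Collect every writer's exit status.
  for pid in "${pids[@]}"; do
    wait "$pid" || failed=$((failed + 1))
  done
  echo "writers=$writers rounds=$rounds failed=$failed"
  [ "$failed" -eq 0 ]
}
```

Raising ROUNDS turns the same function into a crude long-running stability check.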

For basic file operation tests I use the below:
An initial look would be to use some tools like
http://www.ntfs-3g.org/pjd-fstest.html
I've seen it mentioned before, but it's a good start for testing anything
POSIX. Here's a simple script that will download and build it if it's
missing, and run the tests on a given mount point.

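#!/bin/bash
# Download and build pjd-fstest if it is missing, then run it on a mount point.
if [ "$#" -lt 1 ]
then
  echo "usage: $0 gluster_mount" >&2
  exit 65
fi
GLUSTER_MOUNT=$1
INSTALL_DIR="/usr"
if [ ! -d "$INSTALL_DIR/fstest" ]; then
  cd "$INSTALL_DIR" || exit 1
  wget http://www.ntfs-3g.org/sw/qa/pjd-fstest-20080816.tgz || exit 1
  tar -xzf pjd-fstest-20080816.tgz || exit 1
  mv pjd-fstest-20080816 fstest || exit 1
  cd fstest || exit 1
  make || exit 1
  # Interactive step: review the test configuration before the first run.
  vi tests/conf
fi
# The tests act on the current directory, so run them from inside the mount.
cd "$GLUSTER_MOUNT" || exit 1
prove -r "$INSTALL_DIR/fstest/"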

Jacques Mattheij wrote:

hello Anand, Geoff & others,

This pretty much parallels my interaction with the team about a
year ago, lots of really good intentions but no actual follow up.

We agreed that an automated test suite was a must and that a
whole bunch of other things would have to be done to get
glusterfs out of the experimental stage and into production
grade.

It's a real pity because I still feel that glusterfs is one of the
major contenders to become *the* cluster file system.

A lot of community goodwill has been lost, I've kept myself
subscribed to this mailing list because I hoped that at some
point we'd move past this endless cat and mouse game with
stability issues but for some reason that never happened.

Anand, you have a very capable team of developers, and you have
a once-in-a-lifetime opportunity to make this happen. Please
take Geoff's comments to heart and get serious about QA and
community support, because that is the key to any successful
FOSS project. Fan that fire and you can't go wrong; lose the
community support and your project might as well be dead.

I realize this may come across as harsh but it is intended to
make it painfully obvious that the most staunch supporters
of glusterfs are getting discouraged and that is a loss no
serious project can afford.

 Jacques

Geoff Kassel wrote:

Hi Anand,
   If you look back through the list archives, no one other than me
replied to the original QA thread where I first posted my patches.
Nor to the Savannah patch tracker thread where I also posted my
patches. (Interesting how those trackers have been disabled now...)

   It took me pressing the issue after discovering yet another bug
that we even started talking about my patches. So yes, my patches
were effectively ignored.

   At the time, you did mention that the code the patches were to be
applied against was being reworked, in addition to your comments
about my code comments.

   I explained the comments as being necessary to avoid the automated
tool flagging potential issues again on reuse of that tool - other
comments for future QA work. There was no follow up on that from you,
nor suggestion on how I might improve these comments to your
standards.

   I continued to supply patches in the Savannah tracker against the
latest stable 1.3 branch - which included some refactoring for your
reworked code, IIRC - for some time after that discussion. All of my
patches were in sync with the code from the publicly available 1.3
branch repository within days of a new TLA patchset.

   None of these were adopted either.

   I simply ran out of spare time to maintain this patchset, and I
got tired of pressing an issue (QA) that you and the dev team clearly
weren't interested in.

   I don't have the kind of spare time needed to do the sort of
in-depth re-audit of your code from scratch (as would be needed) in the
manner that I did back then. So I can't meet your request at this
time, sorry.

   As I've suggested elsewhere, now that you apparently have the
resources for a stand-alone QA team - this team might want to at
least use the tools I've used to generate these patches - RATS and
FlawFinder.

   That way you can generate the kind of QA work I was producing with
the kind of comment style you prefer.

   The only way I can conceive of being able to help now is in
patching individual issues. However, I can really only feasibly do
that with my time constraints if I've got regression tests to make
sure I'm not inadvertently breaking other functionality.

   Hence my continued requests for these.

Geoff.

On Tue, 7 Jul 2009, Anand Avati wrote:

  I've also gone one better than just advice - I've given up significant
portions of my limited spare time to audit and patch a not-insignificant
portion of the GlusterFS code, in order to deal with the stability issues
I and others were encountering. My patches were ignored, on the grounds
that it contained otherwise unobtrusive comments which were quite
necessary to the audit.

Geoff, we really appreciate your efforts, both on the fronts of your
patch submissions and for voicing your opinions freely. We also
acknowledge the positive intentions behind this thread. As far as
your patch submissions are concerned, there is probably a
misunderstanding. Your patches were not ignored. We do value your
efforts. The patches which you submitted, even at the time of your
submission were not applicable to the codebase.

Patch 1 (in glusterfsd.c) -- this file was reworked and almost
rewritten from scratch to work as both client and server.

Patch 2 (glusterfs-fuse/src/glusterfs.c) -- this module was
reimplemented as a new translator (since a separate client was no
more needed).

Patch 3 (protocol.c) -- with the introduction of non blocking IO and
binary protocol, nothing of this file remained.

What I am hoping to convey is that the reason your patches did not
make it to the repository was that they needed significant reworking
to even apply. I did indeed comment about code comments of the style
/* FlawFinder: */ but then, that definitely was _not_ the reason they
weren't included. Please understand that nothing was ignored
intentionally.

This being said, I can totally understand the effort which you have
been putting in to maintain patchsets by yourself and keep them up to
date with the repository. I request you to resubmit them (with git
format-patch) against the HEAD of the repository.

Thanks,
Avati

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
http://lists.nongnu.org/mailman/listinfo/gluster-devel
