Re: Performance testing

On Thu, 2014-09-18 at 10:36 +1000, Dave Chinner wrote:
> On Wed, Sep 17, 2014 at 10:48:44AM +0200, Jan Tulak wrote:
> > Hi,
> > 
> > I have begun to work on a set of performance tests. I think it would
> > be useful to have a standard set, because as far as I know, there is
> > very little performance testing, and each of the few tests people run
> > is unique. I want to propose my ideas before I really start writing
> > them, to catch possible complications early.
> 
> Great idea, but I'm missing some context about your ultimate goal
> here: how is this performance testing going to be used?
> 
> My focus for xfstests is mainly for it to be useful to filesystem
> developers who are developing new features and fixing bugs, so my
> comments come from the point of view of "will it make my life as a
> filesystem developer easier?" rather than a "we need a performance
> test suite" perspective.

I think there is a place for two kinds of these tests. One is a quick
suite that should not run for more than a few minutes and can be used
whenever sending a patch (or on a daily basis...) for a quick check, to
catch bad things as soon as possible.

The other set would be deeper and more complex, and would be used
between versions rather than between single patches. This one could
perhaps be independent, but for consistency I think it is better to
keep both sets together.

> 
> > Mixing performance with regression tests wouldn't be a good idea, so I
> > thought about creating another category at the top level of tests
> > (something like xfstests/tests/performance). Or would it be better to
> > put it into an entirely new directory, like xfstests/performance?
> 
> That depends. What infrastructure do you actually need from the
> xfstests harness? How much commonality are you expecting to use
> here? If there's no commonality (i.e. it's a completely separate set
> of infrastructure that only shares SCRATCH_DEV/SCRATCH_MNT) then I'd
> have to question whether xfstests is the right place for this
> functionality.
> 
> However, if it leverages all the same test template and execution
> methods, then having it as just another test subgroup (i.e. in
> tests/performance) would be the right way to approach this.

Yes, that is my intention: use the test template and the existing
infrastructure, so it would look like any other test, except that it
measures performance.
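
To make it concrete, here is a rough sketch of what I imagine such a
test could look like, reusing the usual preamble from common/rc (the
helper names and the result-reporting convention are only my
assumptions, nothing of this exists yet):

#! /bin/bash
# FS QA Test No. performance/001 (hypothetical)
#
# Micro-benchmark: single-threaded buffered sequential write.
#
seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"

here=`pwd`
tmp=/tmp/$$
status=1        # failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15

_cleanup()
{
    cd /
    rm -f $tmp.*
}

# get standard environment, filters and checks
. ./common/rc
. ./common/filter

_supported_fs generic
_supported_os Linux
_require_scratch

_scratch_mkfs >> $seqres.full 2>&1
_scratch_mount

# write 1GiB and record the elapsed time; the measured number goes
# into $seqres.full so the golden output stays deterministic
/usr/bin/time -f "seq_write_1g_seconds %e" -o $tmp.time \
    dd if=/dev/zero of=$SCRATCH_MNT/testfile bs=1M count=1024 conv=fsync \
    >> $seqres.full 2>&1
cat $tmp.time >> $seqres.full

echo "Silence is golden"
status=0
exit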

> > From the beginning there would be some basic test cases, like sync/async
> > read and write. Hopefully more realistic cases, like a database
> > server workload, would be added later.
> 
> IMO, if we do add performance tests to xfstests, then the focus
> would definitely need to be on performance regression tests, not
> "performance benchmark" (aka benchmarketing) tests. If you want
> "performance benchmarks" then openbenchmarking.org is probably a
> better place to start as that is what it is designed for and
> already has everything you've mentioned.
> 
> So I'll focus on performance regression testing. Performance
> regression testing involves a lot more than just "run benchmark,
> save and compare results". It's once the "compare results" phase
> says "regression found" that the functionality of the test really
> matters to the filesystem developer. i.e. the tests need to be useful
> for *analysis of the regression*.
> 
> Hence things like "database server benchmark" don't really belong in
> a performance regression test suite because they can't be used to
> isolate regressions. Further, they tend to be susceptible to changes
> in performance being caused by changes outside filesystem and
> storage layers. Hence they lead to wild goose chases more often than
> they point to a real filesystem or IO regression.
> 
This is not needed in the quick suite, but testing only simple
read/write will not find regressions that appear in more complicated
situations. If the only thing that changed is the filesystem, any big
difference in the results can be attributed to the filesystem change,
right?

I do not expect everyone to run this test suite every day, but it
could notify us about regressions between versions of a filesystem.

> 
> > For the IO testing, I want to use fio for the specific workloads and
> > possibly IOZone for the basic synthetic tests.
> 
> I think that the initial focus for performance regression tests
> would need to be more on simple micro-benchmarks (e.g. read, write,
> create, remove, etc).  I'd much prefer to see simple, targeted
> benchmarks that are easily understood just by looking at the
> xfstests code. e.g.  a patchset made unlink go fast, but slowed down
> file create. Or that we sped up single threaded creates, but
> destroyed multithreaded create scalability. Or that we sped up small
> directories at the expense of large directories.  These things
> can all be measured individually (and quickly) and because they
> tend to measure a single aspect of filesystem performance they
> can be used directly for regression analysis.
> 
> Many of these sorts of tests can be written into the existing
> xfstests infrastructure without needing significant external
> dependencies - fio and fsmark cover most of the microbenchmarks that
> would be necessary. I already have quite a few scripts that I use to
> run fsmark tests that could easily be wrapped with xfstests
> templates....

The initial focus should really be on this, I agree. Creating this
quick and small suite should not take long. If you have something, it
could be useful once I create some kind of template for performance
tests.
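
For instance, I imagine the create/unlink cases could be as simple as
wrapping fs_mark and rm in that template, along these lines (the
fs_mark flags are from memory, so they may need adjusting):

# zero-length file creates, 4 threads, 100000 files per thread,
# no syncing, and keep the files around for the unlink pass
fs_mark -S 0 -k -t 4 -n 100000 -s 0 -d $SCRATCH_MNT/fsmark \
    >> $seqres.full 2>&1

# now time the bulk unlink of everything we just created
/usr/bin/time -f "bulk_unlink_seconds %e" rm -rf $SCRATCH_MNT/fsmark \
    >> $seqres.full 2>&1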

> 
> As for IOZone, well, I'd suggest you don't bother with IOZone(*)
> because we can do far better with bash, dd and fio....

Thanks for the info. I have done only some brief experiments with
IOZone so far, so these things eluded me.
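
A couple of one-liners along those lines would probably already cover
what I wanted IOZone for, e.g. (the fio options are from memory, so
treat them as a sketch):

# sequential buffered write, bandwidth reported by dd itself
dd if=/dev/zero of=$SCRATCH_MNT/seqfile bs=1M count=1024 conv=fsync

# 30 seconds of 4k random direct-IO writes; --minimal gives terse,
# easily parseable output for later comparison
fio --name=randwrite --directory=$SCRATCH_MNT --rw=randwrite --bs=4k \
    --size=1g --ioengine=libaio --direct=1 --iodepth=16 \
    --runtime=30 --time_based --minimal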

> 
> > What I'm not sure about is how a comparison between different versions
> > could be done, because I don't see any infrastructure within fstests
> > for cross-version comparison. (What would it do with regression tests
> > anyway...) So I wonder if it should be done in this set at all. Perhaps
> > the set would only print the measured values, and some other tool
> > (which could also be included, but would not be directly part of the
> > performance test set) could then be used to compare and/or plot graphs.
> 
> I don't think that storing results long term or comparing results is
> something xfstests should directly care about. It is architected to
> defer that to some external tool for post processing.  i.e. xfstests
> is used to run the tests and generate results, not do long term
> storage or analysis of those results.
> 
> I see no issues with including scripts to do result processing
> across multiple RESULT_DIRs within xfstests itself, but the
> infrastructure still has to be architected so it can be externally
> controllable and usable by external test infrastructure.

I expected something like this, so it shouldn't be much trouble. What I
see as a good way forward: first create some small tests. Then, once
they work as intended, I can work on the external tool for managing the
results, rather than creating the tool first. That will also give me
more time to find a good solution. (From what I can see, there is
already some work on autotest running xfstests, so maybe it will need
just a little work to add the new tests.)
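
As a first cut, that external tool could be nothing more than a small
script that pairs up the numbers from two RESULT_DIRs, e.g. something
like this (entirely hypothetical; it assumes each test appends
"<metric>_seconds <value>" lines to its $seqres.full, as in the
sketches above):

#!/bin/bash
# compare-perf.sh OLD_RESULTS_DIR NEW_RESULTS_DIR
old=$1
new=$2

for f in "$old"/performance/*.full; do
    test=$(basename "$f" .full)
    # pick the "metric value" lines out of both runs and join them on
    # the metric name
    join <(grep '_seconds ' "$f" | sort) \
         <(grep '_seconds ' "$new/performance/$test.full" | sort) |
    awk -v t="$test" '{
        # positive delta means the new run took longer (i.e. got slower)
        delta = ($3 - $2) / $2 * 100
        printf "%-16s %-24s old=%ss new=%ss (%+.1f%%)\n", t, $1, $2, $3, delta
    }'
done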

I hope I answered everything. :-)

> 
> Cheers,
> 
> Dave.
> 
> (*) IOZone is pretty much useless for performance regression
> *detection*, let alone useful for analysis of regressions.
> Run-to-run variation of +/-10% is not uncommon or unexpected - it
> has very low precision.
> 
> It requires extremely stable clocks for its timing to be accurate,
> which means you have to be very careful about the hardware you use.
> This also rules out testing in VMs as the timing is simply too
> variable to be useful for accurate measurement. It also means that
> it's very difficult to reproduce the same results across multiple
> machines.
> 
> Worse is the fact that it is also extremely sensitive to userspace
> and kernel CPU cache footprint changes. Hence a change that affects
> the CPU cache residency of the IOZone data buffer will have far more
> effect on the result than the actual algorithmic change to the
> filesystem or IO subsystem that led to the CPU cache footprint
> change. Hence the same test on two different machines that only
> differ by CPU can give very different results - one might say
> "faster", the other can say "slower".
> 
> It's just not a reliable tool for IO performance measurement, which
> is kinda sad because that is its sole purpose in life.....


Cheers,
Jan
