On Sat, Oct 15, 2016 at 8:04 PM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> On Sat, Oct 15, 2016 at 06:13:29PM +0300, Amir Goldstein wrote:
>> I can't say that I have made a statistical analysis of the effect of
>> the flag on xfstests runtime, but for the -g quick group on a small
>> SSD partition, I did not observe any noticeable difference in runtime.
>>
>> I will try to run some micro benchmarks or look for specific tests
>> that do many file opens and little io, to get more performance numbers.
>

Here goes.

I ran a simple micro benchmark: 'xfs_io -c quit' 1000 times, with and
without the -M flag. The -M flag adds 0.1sec to the total runtime
(pthread_create overhead, I suppose).
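To illustrate the pthread_create part: here is a minimal sketch of how
a flag like -M can flip the process into multi threaded mode. This is
illustrative only, not the actual xfs_io code, and start_idle_thread()
is a made-up helper name:

	#include <pthread.h>
	#include <unistd.h>

	/* Block forever; the thread only needs to exist. */
	static void *idle_thread(void *arg)
	{
		for (;;)
			pause();
		return NULL;
	}

	/*
	 * One extra thread is enough: pthread_create() clones with
	 * CLONE_FILES, so the file table refcount goes above 1 and the
	 * kernel can no longer take the single threaded fdget() fast
	 * path for any subsequent io from this process.
	 */
	static void start_idle_thread(void)
	{
		pthread_t tid;

		pthread_create(&tid, NULL, idle_thread, NULL);
	}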
I then looked for a test that runs xfs_io a lot and found generic/032,
which runs xfs_io 1700 times, mostly for pwrite. This is not a CPU
intensive test, but there is an average runtime difference of +0.2sec
with the -M flag (out of 8sec).

A look at the runtime differences across the entire -g quick run did
not show any obvious changes; all reported runtimes were within a
+/-1sec margin, and some of the differences were clearly noise, since
those tests do not run xfs_io at all.

Still, I looked closer for tests that do a lot of small reads/writes
and found generic/130, which does many small preads from only a few
xfs_io runs. This is a more CPU intensive test. There is an average
runtime difference of +0.3sec with the -M flag (out of 4sec).

So far so good, but then I looked at its sister test generic/132,
which is an even more CPU intensive test, also doing many small reads
and writes from a few xfs_io runs. It is not in the 'quick' group.
Here the runtime difference was significant: 17sec without -M and
20sec with the -M flag.

So without digging much deeper into the other non-quick tests, I think
that perhaps the best-value option is to turn on the -M flag for all
the quick tests.

What do you think?

> Yes, if there is no effect at least that's not a problem. I'd just want
> confirmation for that. In the end we probably don't use xfs_io heavily
> parallel on the same fd a lot.

So there is an effect on specific tests that end up calling fdget() a
lot relative to the amount of io they generate, but I don't think we
have to use xfs_io in parallel on the same fd to see the regression.
The fast path optimization for a single threaded process avoids the
rcu_read_lock() in __fget() altogether, whereas a multi threaded
process takes the rcu_read_lock() and does other work even though we
are the only user of this fd. This is just my speculation, as I did
not run a perf analysis on those fdget intensive tests.
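For reference, this is roughly what that fast path looks like; a
simplified paraphrase of __fget_light() from fs/file.c of that era,
not a verbatim copy:

	/* Simplified paraphrase of fs/file.c, not verbatim. */
	static unsigned long __fget_light(unsigned int fd, fmode_t mask)
	{
		struct files_struct *files = current->files;
		struct file *file;

		if (atomic_read(&files->count) == 1) {
			/*
			 * Single threaded: nobody else can change our
			 * file table, so no rcu_read_lock() and no
			 * refcount bump are needed.
			 */
			file = __fcheck_files(files, fd);
			if (!file || unlikely(file->f_mode & mask))
				return 0;
			return (unsigned long)file;
		} else {
			/*
			 * Multi threaded: take the full __fget() path,
			 * which does rcu_read_lock() and bumps
			 * file->f_count, even if we are in practice
			 * the only user of this fd.
			 */
			file = __fget(fd, mask);
			if (!file)
				return 0;
			return FDPUT_FPUT | (unsigned long)file;
		}
	}

Every fdget() in a multi threaded process takes the else branch, so it
pays for the rcu_read_lock() and the atomic f_count update on each
call, which would be consistent with the numbers above.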