[adding Eric to the cc list]

On Wed, Mar 27, 2013 at 11:10:11AM -0400, Theodore Ts'o wrote:
> On Wed, Mar 27, 2013 at 03:21:02PM +0800, Zheng Liu wrote:
> >
> > The key issue with adding test cases to xfstests is that we need to
> > handle some filesystem-specific features.  Just like we had discussed
> > with Dave, what is an extent?  IMHO xfstests now gets more complicated
> > because it needs to handle this problem, e.g. punching a hole in an
> > indirect-based file in ext4.
>
> Yes, that means among other things the test framework needs to keep
> track of which file system features were being used when we run a
> particular test, as well as the hardware configuration.
>
> I suspect that what this means is that we're better off trying to
> create a new test framework that does what we want, and automates as
> much of this as possible.

Yes, that means we need to create a new wheel to do this work.  That is
why I want to discuss it with other folks: this is not a small project.

> It would probably be a good idea to bring Eric Whitney into this
> discussion, since he has a huge amount of expertise about what sort of
> things need to be done in order to get good results.  He was doing a
> number of things by hand, including re-running the tests multiple
> times to make sure the results were stable.  I could imagine that if
> the framework could keep track of what the standard deviation was for
> a particular test, it could try to do this automatically, and then we
> could also throw up a flag if the average result hadn't changed, but
> the standard deviation had increased, since that might be an
> indication that some change had caused a lot more variability.

The average and standard deviation are very important data for a
performance test framework.  Some performance regressions have only a
very subtle impact, which means that we need to run a test case several
times and compute the average and standard deviation in addition to
throughput, IOPS, latency, etc.

> (Note by the way that one of the things that is going to be critically
> important for companies using ext4 for web backends is not just the
> average throughput, which is what FFSB mostly tests, but also 99.99%
> percentile latency.  And sometimes the best workloads which show this
> will only be mixed workloads, when under memory pressure.  For
> example, consider the recent "page eviction from the buddy cache"
> e-mail.  That's something which might result in only a slight increase
> for average throughput numbers, but could have a much more profound
> impact on 99.9% latency numbers, especially if while we are reading in
> a bitmap block, we are holding some lock or preventing a journal
> commit from closing.)

Definitely, latency is very important for us.  At Taobao, most
applications are latency-sensitive and expect the file system to
provide a stable latency.  They can accept a stable but high latency on
every write (e.g. 100ms, quite big :-)) because the application
designer will take this factor into account.  However, they hate a
small but unstable latency (e.g. 3ms on 99% of writes and 500ms on the
remaining 1%).

Regards,
                                                - Zheng
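
As a rough sketch of the bookkeeping described above, re-running a test
several times, reducing the samples to an average and a standard
deviation, and flagging the case where the average is stable but the
variability has grown, something like the following could work.  This
is a minimal illustration in Python; the run_test() hook, the baseline
values, and the 2%/1.5x thresholds are hypothetical, not part of
xfstests or any existing framework.

    import statistics

    def run_test(name):
        # Hypothetical hook: run one iteration of the named test and
        # return a single result (e.g. throughput in MB/s).  A real
        # framework would invoke ffsb, fio, or an xfstests case here.
        raise NotImplementedError

    def summarize(name, runs=5):
        # Re-run the test several times and reduce the samples to an
        # average and a standard deviation.
        samples = [run_test(name) for _ in range(runs)]
        return statistics.mean(samples), statistics.stdev(samples)

    def check_variability(name, baseline_mean, baseline_stdev, runs=5):
        mean, stdev = summarize(name, runs)
        # The case described above: the average result hasn't really
        # changed, but the spread between runs has increased noticeably.
        mean_stable = abs(mean - baseline_mean) <= 0.02 * baseline_mean
        if mean_stable and stdev > 1.5 * baseline_stdev:
            print("%s: average stable (%.1f) but stddev rose "
                  "%.2f -> %.2f; possible variability regression" %
                  (name, mean, baseline_stdev, stdev))
        return mean, stdev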
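
On the tail-latency point, a small worked example shows why the average
hides exactly the behaviour described above.  The numbers are the
hypothetical 3ms/500ms mix from the last paragraph, not measurements,
and the nearest-rank percentile helper is likewise just an
illustration.

    def percentile(samples, pct):
        # Nearest-rank percentile over a list of latency samples (ms).
        ordered = sorted(samples)
        rank = int(round(pct / 100.0 * len(ordered))) - 1
        return ordered[max(0, min(rank, len(ordered) - 1))]

    # Illustration only: 99% of writes complete in a steady 3ms,
    # 1% stall for 500ms.
    latencies = [3.0] * 990 + [500.0] * 10

    print("mean   = %.2f ms" % (sum(latencies) / len(latencies)))  # ~7.97 ms
    print("p99    = %.1f ms" % percentile(latencies, 99))          # 3.0 ms
    print("p99.9  = %.1f ms" % percentile(latencies, 99.9))        # 500.0 ms
    print("p99.99 = %.1f ms" % percentile(latencies, 99.99))       # 500.0 ms

The average looks perfectly acceptable, while the 99.9th and 99.99th
percentiles expose the 500ms stalls, which is why a framework that only
reports averages would miss this kind of regression.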