On Wed, Mar 16, 2022 at 08:38:40PM +0100, Jan Kara wrote:
> On Wed 16-03-22 11:09:34, Jan Kara wrote:
> > On Wed 16-03-22 18:44:59, Dave Chinner wrote:
> > > On Wed, Mar 16, 2022 at 12:06:27PM +1100, Dave Chinner wrote:
> > > > On Tue, Mar 15, 2022 at 01:49:43PM +0100, Jan Kara wrote:
> > > > > Hello,
> > > > >
> > > > > I was tracking down a regression in the dbench workload on XFS that we
> > > > > identified during our performance testing. These are results from one of
> > > > > our test machines (a server with 64GB of RAM, 48 CPUs, and a SATA SSD for
> > > > > the test disk):
> > > > >
> > > > >                        good                   bad
> > > > > Amean     1        64.29 (   0.00%)       73.11 * -13.70%*
> > > > > Amean     2        84.71 (   0.00%)       98.05 * -15.75%*
> > > > > Amean     4       146.97 (   0.00%)      148.29 *  -0.90%*
> > > > > Amean     8       252.94 (   0.00%)      254.91 *  -0.78%*
> > > > > Amean    16       454.79 (   0.00%)      456.70 *  -0.42%*
> > > > > Amean    32       858.84 (   0.00%)      857.74 (   0.13%)
> > > > > Amean    64      1828.72 (   0.00%)     1865.99 *  -2.04%*
> > > > >
> > > > > Note that the numbers are actually times to complete the workload, not
> > > > > traditional dbench throughput numbers, so lower is better.
> > > ....
> > > > > This should still submit it rather early to provide the latency
> > > > > advantage. Otherwise postpone the flush to the moment we know we are
> > > > > going to flush the iclog to save pointless flushes. But we would have
> > > > > to record whether the flush happened or not in the iclog and it would
> > > > > all get a bit hairy...
> > > >
> > > > I think we can just set the NEED_FLUSH flag appropriately.
> > > >
> > > > However, given all this, I'm wondering if the async cache flush was
> > > > really a case of premature optimisation. That is, we don't really
> > > > gain anything by reducing the flush latency of the first iclog write
> > > > when we are writing 100-1000 iclogs before the commit record, and it
> > > > can be harmful to some workloads by issuing more flushes than we
> > > > need to.
> > > >
> > > > So perhaps the right thing to do is just get rid of it and always
> > > > mark the first iclog in a checkpoint as NEED_FLUSH....
> > >
> > > So I've run some tests on code that does this, and the storage I've
> > > tested it on shows largely no difference in streaming CIL commit and
> > > fsync-heavy workloads when comparing sync vs async cache flushes. One
> > > set of tests was against high-speed NVMe SSDs, the other against old,
> > > slower SATA SSDs.
> > >
> > > Jan, can you run the patch below (against 5.17-rc8) and see what
> > > results you get on your modified dbench test?
> >
> > Sure, I'll run the test. I forgot to mention that in the vanilla upstream
> > kernel I could see the difference in the number of cache flushes caused
> > by the XFS changes, but not an actual change in the dbench numbers (they
> > were still comparable to the bad ones from my test). The XFS change made
> > a material difference to dbench performance only together with the
> > scheduler / cpuidle / frequency scaling fixes we have in our SLE kernel
> > (I didn't try to pin down which exactly - I guess I can try working
> > around that by using the performance cpufreq governor and disabling low
> > C-states so that I can test stock vanilla kernels). Thanks for the patch!
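To make the iclog flagging idea quoted above concrete, here is a minimal C
sketch of the approach Dave describes: instead of issuing an early
asynchronous cache flush when the CIL push starts, simply mark the first
iclog of the checkpoint so the flush travels with that iclog's write. All
identifiers below (sketch_iclog, SKETCH_NEED_FLUSH, and so on) are
hypothetical, not the actual XFS names; this is an illustration of the
technique, not the real patch.

```c
/*
 * Illustrative sketch only -- not the real XFS code. Models the idea of
 * "always mark the first iclog in a checkpoint as NEED_FLUSH" instead of
 * submitting a separate asynchronous cache flush up front.
 */
#include <stdbool.h>

#define SKETCH_NEED_FLUSH	(1 << 0)	/* flush device cache before this write */
#define SKETCH_NEED_FUA		(1 << 1)	/* write must be stable on completion */

struct sketch_iclog {
	unsigned int	flags;		/* per-iclog write behaviour flags */
	/* ... log record buffer, state, etc. ... */
};

struct sketch_cil_ctx {
	bool		started;	/* has the first iclog of this checkpoint gone out? */
};

/*
 * Called for every iclog written as part of a checkpoint. Only the first
 * iclog of the checkpoint needs to carry the cache flush; later iclogs in
 * the same checkpoint do not, which avoids issuing pointless flushes.
 */
void sketch_cil_write_iclog(struct sketch_cil_ctx *ctx,
			    struct sketch_iclog *iclog,
			    bool is_commit_record)
{
	if (!ctx->started) {
		/*
		 * No early async flush any more: flag the first iclog so the
		 * flush is issued together with the write itself.
		 */
		iclog->flags |= SKETCH_NEED_FLUSH;
		ctx->started = true;
	}

	if (is_commit_record) {
		/*
		 * In this sketch the commit record also flushes and is
		 * written for stable storage, so it cannot land ahead of the
		 * rest of the checkpoint.
		 */
		iclog->flags |= SKETCH_NEED_FLUSH | SKETCH_NEED_FUA;
	}

	/* hand the iclog to the (hypothetical) submission path here */
}
```

The point of the sketch is the trade-off discussed above: the first write of
a checkpoint pays the flush latency, but no extra flush commands are issued
for the hundreds of iclogs that may follow before the commit record.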
> 
> Yup, so with limiting C-states and the performance cpufreq governor I can
> see that your patch helps dbench performance significantly:
> 
>                 5.17-rc8-vanilla        5.17-rc8-patched
> Amean     1        71.22 (   0.00%)       64.94 *   8.81%*
> Amean     2        93.03 (   0.00%)       84.80 *   8.85%*
> Amean     4       150.54 (   0.00%)      137.51 *   8.66%*
> Amean     8       252.53 (   0.00%)      242.24 *   4.08%*
> Amean    16       454.13 (   0.00%)      439.08 *   3.31%*
> Amean    32       835.24 (   0.00%)      829.74 *   0.66%*
> Amean    64      1740.59 (   0.00%)     1686.73 *   3.09%*
> 
> The performance is restored to the values before commit bad77c375e8d ("xfs:
> CIL checkpoint flushes caches unconditionally"), as is the number of
> flushes.

OK, good to know, thanks for testing quickly. I'll spin this up into a
proper patch that removes the async flush functionality and its support
infrastructure.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
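For readers following the "removes the async flush functionality" plan, the
sketch below shows, again with hypothetical names rather than real kernel
identifiers, how the per-iclog flags from the earlier sketch would be turned
into request flags at write submission time. In the real kernel these would
correspond to REQ_PREFLUSH and REQ_FUA on the iclog's bio; local constants
are used here so the example stands alone and compiles as ordinary C.

```c
/*
 * Illustrative sketch only -- not the real XFS code. Shows why a
 * separately submitted (async) cache flush becomes unnecessary once the
 * flush requirement is carried by the iclog write itself.
 */
#include <stdio.h>

#define SKETCH_NEED_FLUSH	(1 << 0)
#define SKETCH_NEED_FUA		(1 << 1)

#define SKETCH_OP_PREFLUSH	(1 << 2)	/* ~ REQ_PREFLUSH: flush cache before the write */
#define SKETCH_OP_FUA		(1 << 3)	/* ~ REQ_FUA: write through to stable storage */

struct sketch_iclog {
	unsigned int flags;
};

/*
 * Translate the iclog's requirements into write operation flags. The flush
 * is issued as part of this write, so nothing has to be submitted (or
 * tracked for completion) ahead of time.
 */
unsigned int sketch_iclog_write_opflags(const struct sketch_iclog *iclog)
{
	unsigned int op = 0;

	if (iclog->flags & SKETCH_NEED_FLUSH)
		op |= SKETCH_OP_PREFLUSH;
	if (iclog->flags & SKETCH_NEED_FUA)
		op |= SKETCH_OP_FUA;
	return op;
}

int main(void)
{
	struct sketch_iclog first  = { .flags = SKETCH_NEED_FLUSH };
	struct sketch_iclog middle = { .flags = 0 };

	/* First iclog of the checkpoint carries the flush; later ones don't. */
	printf("first iclog opflags:  %#x\n", sketch_iclog_write_opflags(&first));
	printf("middle iclog opflags: %#x\n", sketch_iclog_write_opflags(&middle));
	return 0;
}
```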