On Thu 17-03-22 10:38:28, Dave Chinner wrote:
> On Wed, Mar 16, 2022 at 10:54:37AM +0100, Jan Kara wrote:
> > On Wed 16-03-22 12:06:27, Dave Chinner wrote:
> > > When doing this work, I didn't count cache flushes. What I looked at
> > > was the number of log forces vs the number of sleeps waiting on log
> > > forces vs log writes vs the number of stalls waiting for log writes.
> > > These numbers showed improvements across the board, so any increase
> > > in overhead from physical cache flushes was not reflected in the
> > > throughput increases I was measuring at the "fsync drives log
> > > forces" level.
> > 
> > Thanks for the detailed explanation! I'd just note that e.g. for a
> > machine with 8 CPUs, 32 GB of RAM, and an Intel SSD behind a
> > megaraid_sas controller (it is some Dell PowerEdge server) we see even
> > larger regressions like:
> > 
> >                   good               bad
> > Amean 1        97.93 (  0.00%)   135.67 ( -38.54%)
> > Amean 2       147.69 (  0.00%)   194.82 ( -31.91%)
> > Amean 4       242.82 (  0.00%)   352.98 ( -45.36%)
> > Amean 8       375.36 (  0.00%)   591.03 ( -57.45%)
> > 
> > I didn't investigate on this machine (it was busy with other tests and
> > I had another machine in my hands which also showed some, although
> > smaller, regression) but now, reading your explanation, I'm curious why
> > the regression grows with the number of threads on that machine. Maybe
> > the culprit is different there, or the dynamics just isn't as we
> > imagine it on that storage controller... I guess I'll borrow the
> > machine and check it.
> 
> That sounds more like a poor caching implementation in the hardware
> RAID controller than anything else.

Likely. I did a run with your patch on this machine now and the original
performance was restored.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
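[Editor's note: for readers unfamiliar with this reporting format, the bracketed percentages in the Amean table above are consistent with the change of the "bad" mean relative to the "good" baseline, i.e. (good - bad) / good. The snippet below is a minimal illustration of that arithmetic, not part of the original thread; the last decimal can differ slightly from the quoted figures because the printed means are themselves rounded.]

```python
# Recompute the bracketed deltas from the quoted Amean table.
# A negative percentage means the "bad" kernel is slower than baseline.
rows = {
    1: (97.93, 135.67),
    2: (147.69, 194.82),
    4: (242.82, 352.98),
    8: (375.36, 591.03),
}

for threads, (good, bad) in rows.items():
    delta = (good - bad) / good * 100.0
    print(f"Amean {threads}: {good:8.2f} (  0.00%) {bad:8.2f} ({delta:7.2f}%)")
```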