On Thu 17-03-22 10:38:28, Dave Chinner wrote:
> On Wed, Mar 16, 2022 at 10:54:37AM +0100, Jan Kara wrote:
> > On Wed 16-03-22 12:06:27, Dave Chinner wrote:
> > > When doing this work, I didn't count cache flushes. What I looked at
> > > was the number of log forces vs the number of sleeps waiting on log
> > > forces vs log writes vs the number of stalls waiting for log writes.
> > > These numbers showed improvements across the board, so any increase
> > > in overhead from physical cache flushes was not reflected in the
> > > throughput increases I was measuring at the "fsync drives log
> > > forces" level.
> > 
> > Thanks for the detailed explanation! I'd just note that e.g. for a
> > machine with 8 CPUs, 32 GB of RAM, and an Intel SSD behind a
> > megaraid_sas controller (it is some Dell PowerEdge server) we see even
> > larger regressions like:
> > 
> >                   good               bad
> > Amean 1        97.93 (  0.00%)   135.67 ( -38.54%)
> > Amean 2       147.69 (  0.00%)   194.82 ( -31.91%)
> > Amean 4       242.82 (  0.00%)   352.98 ( -45.36%)
> > Amean 8       375.36 (  0.00%)   591.03 ( -57.45%)
> > 
> > I didn't investigate on this machine (it was busy with other tests and
> > I had another machine in my hands which also showed some, although
> > smaller, regression) but now, reading your explanation, I'm curious why
> > the regression grows with the number of threads on that machine. Maybe
> > the culprit is different there, or the dynamics just isn't as we
> > imagine it on that storage controller... I guess I'll borrow the
> > machine and check it.
> 
> That sounds more like a poor caching implementation in the hardware
> RAID controller than anything else.

Likely. I did a run with your patch on this machine now and the original
performance was restored.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
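[Editor's note: for readers unfamiliar with this reporting format, the bracketed percentages in the Amean table above are consistent with the change of the "bad" mean relative to the "good" baseline, i.e. (good - bad) / good. The snippet below is a minimal illustration of that arithmetic, not part of the original thread; the last decimal can differ slightly from the quoted figures because the printed means are themselves rounded.]

```python
# Recompute the bracketed deltas from the quoted Amean table.
# A negative percentage means the "bad" kernel is slower than baseline.
rows = {
    1: (97.93, 135.67),
    2: (147.69, 194.82),
    4: (242.82, 352.98),
    8: (375.36, 591.03),
}

for threads, (good, bad) in rows.items():
    delta = (good - bad) / good * 100.0
    print(f"Amean {threads}: {good:8.2f} (  0.00%) {bad:8.2f} ({delta:7.2f}%)")
```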