On Mon, Feb 01, 2021 at 12:39:43PM +0000, Christoph Hellwig wrote:
> On Thu, Jan 28, 2021 at 03:41:49PM +1100, Dave Chinner wrote:
> > Hi folks,
> >
> > Quick patch dump for y'all. A couple of minor cleanups to the
> > log behaviour, a fix for the CIL throttle hang and a couple of
> > patches to rework the cache flushing that journal IO does to reduce
> > the number of cache flushes by a couple of orders of magnitude.
> >
> > All passes fstests with no regressions, no performance regressions
> > from fsmark, dbench and various fio workloads, some big gains even
> > on fast storage.
>
> Can you elaborate on the big gains?

See the commit messages. dbench simulates fileserver behaviour with
extremely frequent fsync/->commit_metadata flush points, and that
shows gains at high client counts when logbsize=32k. fsmark is a
highly concurrent metadata modification workload designed to push
the journal to its performance and scalability limits, etc, and that
shows 25% gains on logbsize=32k, bringing it up to the same
performance as logbsize=256k on the test machine.

> Workloads for one, but also
> what kind of storage. For less FUA/flush to matter the device needs
> to have a write cache, which none of the really fast SSDs even has.

The gains are occurring on devices that have volatile caches. But
that doesn't mean devices with volatile caches are slow, just that
they can be faster with a better cache flushing strategy. And yes,
as you would expect, I don't see any change in behaviour on data
center SSDs that have no volatile caches, because the block layer
elides cache flushes for them anyway.

But, really, device performance improvements aren't the motivation
for this. The real motivation is removing orders of magnitude of
flush points from the software layers below the filesystem. Stuff
like software RAID, thin provisioning and other functionality that
must obey the flush/fua IOs they receive regardless of whether the
underlying hardware needs them or not.

Avoiding flush/fua for the journal IO means that RAID5/6 can cache
partial stripe writes from the XFS journal rather than having to
flush the partial stripe update for every journal IO. dm-thin
doesn't need to commit open transactions and flush all the dirty
data over newly allocated regions on every journal IO to a device
pool (i.e. cache flushes from one thinp device in a pool cause all
other thinp devices in the pool to stall new allocations until the
flush/fua is done). And so on.

There's no question at all that reducing the number of flush/fua
triggers is a good thing to be doing, regardless of the storage or
workloads that I've done validation testing on. The fact that a
decent performance SSD (120k randr IOPS, 60k randw IOPS) shows a
25% increase in performance for a journal IO bound workload
indicates just how much default configurations can be bound by the
journal cache flushes...

> So I'd only really expect gains from that on consumer grade SSDs and
> hard drives.

Sure, but those are exactly the devices we have always optimised
cache flushing for....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
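
[For readers not familiar with what "flush/fua for the journal IO"
means at the block layer, here is a rough sketch, using the ~5.x-era
bio API. It is illustrative only - not the actual fs/xfs/xfs_log.c
code; the function name, need_flush parameter and one-page payload
are made up for the example.]

#include <linux/bio.h>
#include <linux/blkdev.h>

static void submit_log_write(struct block_device *bdev, struct page *page,
			     sector_t sector, bool need_flush)
{
	/* Single-segment write of one log buffer page. */
	struct bio *bio = bio_alloc(GFP_NOIO, 1);

	bio_set_dev(bio, bdev);
	bio->bi_iter.bi_sector = sector;
	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC;
	bio_add_page(bio, page, PAGE_SIZE, 0);

	if (need_flush) {
		/*
		 * REQ_PREFLUSH: flush the device's volatile write cache
		 * before this write, so previously completed writes are
		 * stable first.
		 * REQ_FUA: this write itself must be on stable media
		 * before the bio completes.
		 */
		bio->bi_opf |= REQ_PREFLUSH | REQ_FUA;
	}

	/*
	 * The block layer strips PREFLUSH/FUA when the queue reports no
	 * volatile write cache, so cacheless devices see a plain write
	 * either way. Stacked drivers (md RAID5/6, dm-thin) do see the
	 * flags and have to honour them, which is where the cost of a
	 * flush point per journal IO shows up.
	 */
	submit_bio(bio);
}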