On Mon, Feb 01, 2021 at 12:39:43PM +0000, Christoph Hellwig wrote:
> On Thu, Jan 28, 2021 at 03:41:49PM +1100, Dave Chinner wrote:
> > Hi folks,
> >
> > Quick patch dump for y'all. A couple of minor cleanups to the
> > log behaviour, a fix for the CIL throttle hang and a couple of
> > patches to rework the cache flushing that journal IO does to reduce
> > the number of cache flushes by a couple of orders of magnitude.
> >
> > All passes fstests with no regressions, no performance regressions
> > from fsmark, dbench and various fio workloads, some big gains even
> > on fast storage.
>
> Can you elaborate on the big gains?

See the commit messages. dbench simulates fileserver behaviour with
extremely frequent fsync/->commit_metadata flush points, and that
shows gains at high client counts when logbsize=32k. fsmark is a
highly concurrent metadata modification workload designed to push
the journal to its performance and scalability limits, etc, and that
shows 25% gains on logbsize=32k, bringing it up to the same
performance as logbsize=256k on the test machine.

> Workloads for one, but also
> what kind of storage. For less FUA/flush to matter the device needs
> to have a write cache, which none of the really fast SSDs even has.

The gains are occurring on devices that have volatile caches. But
that doesn't mean devices with volatile caches are slow, just that
they can be faster with a better cache flushing strategy. And yes,
as you would expect, I don't see any change in behaviour on data
center SSDs that have no volatile caches, because the block layer
elides cache flushes for them anyway.

But, really, device performance improvements aren't the motivation
for this. The real motivation is removing orders of magnitude of
flush points from the software layers below the filesystem. Stuff
like software RAID, thin provisioning and other functionality that
must obey the flush/fua IOs they receive regardless of whether the
underlying hardware needs them or not.

Avoiding flush/fua for the journal IO means that RAID5/6 can cache
partial stripe writes from the XFS journal rather than having to
flush the partial stripe update for every journal IO. dm-thin
doesn't need to commit open transactions and flush all the dirty
data over newly allocated regions on every journal IO to a device
pool (i.e. cache flushes from one thinp device in a pool cause all
other thinp devices in the pool to stall new allocations until the
flush/fua is done). And so on.

There's no question at all that reducing the number of flush/fua
triggers is a good thing to be doing, regardless of the storage or
workloads that I've done validation testing on. The fact that a
decent performance SSD (120k randr IOPS, 60k randw IOPS) shows a
25% increase in performance for a journal IO bound workload
indicates just how much default configurations can be bound by the
journal cache flushes...

> So I'd only really expect gains from that on consumer grade SSDs and
> hard drives.

Sure, but those are exactly the devices we have always optimised
cache flushing for....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
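
[For readers not familiar with what "flush/fua for the journal IO"
means at the block layer, here is a rough sketch, using the ~5.x-era
bio API. It is illustrative only - not the actual fs/xfs/xfs_log.c
code; the function name, need_flush parameter and one-page payload
are made up for the example.]

#include <linux/bio.h>
#include <linux/blkdev.h>

static void submit_log_write(struct block_device *bdev, struct page *page,
			     sector_t sector, bool need_flush)
{
	/* Single-segment write of one log buffer page. */
	struct bio *bio = bio_alloc(GFP_NOIO, 1);

	bio_set_dev(bio, bdev);
	bio->bi_iter.bi_sector = sector;
	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC;
	bio_add_page(bio, page, PAGE_SIZE, 0);

	if (need_flush) {
		/*
		 * REQ_PREFLUSH: flush the device's volatile write cache
		 * before this write, so previously completed writes are
		 * stable first.
		 * REQ_FUA: this write itself must be on stable media
		 * before the bio completes.
		 */
		bio->bi_opf |= REQ_PREFLUSH | REQ_FUA;
	}

	/*
	 * The block layer strips PREFLUSH/FUA when the queue reports no
	 * volatile write cache, so cacheless devices see a plain write
	 * either way. Stacked drivers (md RAID5/6, dm-thin) do see the
	 * flags and have to honour them, which is where the cost of a
	 * flush point per journal IO shows up.
	 */
	submit_bio(bio);
}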