On Fri, Oct 25, 2019 at 09:10:28AM +0200, Gionatan Danti wrote:
> On 24/10/19 23:50, Dave Chinner wrote:
> > On Wed, Oct 23, 2019 at 11:40:33AM +0200, Gionatan Danti wrote:
> > Defaults are for best compatibility and general behaviour, not
> > best performance. A log stripe unit of 32kB allows the user to
> > configure a logbsize appropriate for their workload, as it supports
> > logbsize of 32kB, 64kB, 128kB and 256kB. If we chose 256kB as the
> > default log stripe unit, then you have no opportunity to set the
> > logbsize appropriately for your workload.
> >
> > Remember, LSU determines how much padding is added to every non-full
> > log write - 32kB pads out to 32kB, 256kB pads out to 256kB. Hence if
> > you have a workload that frequently writes non-full iclogs (e.g.
> > regular fsyncs) then a small LSU results in much better performance
> > as there is less padding that needs to be initialised and the IOs
> > are much smaller.
> >
> > Hence for the general case (i.e. what the defaults are aimed at), a
> > small LSU is a much better choice. You can still use a large
> > logbsize mount option and it will perform identically to a large LSU
> > filesystem on full iclog workloads (like the above fsmark workload
> > that doesn't use fsync). However, a small LSU is likely to perform
> > better over a wider range of workloads and storage than a large LSU,
> > and so small LSU is a better choice for the default...
>
> Hi Dave, thank you for your explanation. The observed behavior of a large
> LSU surely matches what you described - less-than-optimal fsync perf.
>
> That said, I was wondering why *logbsize* (rather than LSU) has a low
> default of 32k (or, better, why its default is to match the LSU size).

The default is to match the LSU size; otherwise, if the LSU is < 32kB
(e.g. not set), it will use 32kB. If you try to set a logbsize smaller
than the LSU at mount time, it should throw an error.
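The padding and default-logbsize rules described above can be modelled in a few lines. This is a simplified sketch of the behaviour as stated in this thread, not code taken from XFS: it assumes a non-full iclog write is rounded up to the next multiple of the LSU, models an unset LSU as 0, and limits valid logbsize values to the 32kB to 256kB set Dave lists. The function names are hypothetical.

```python
KIB = 1024

# Simplified model of the rules described in this thread; not the XFS source.
SUPPORTED_LOGBSIZES = (32 * KIB, 64 * KIB, 128 * KIB, 256 * KIB)
MIN_DEFAULT_LOGBSIZE = 32 * KIB


def padded_log_write(payload: int, lsu: int) -> int:
    """Round a non-full iclog write up to the next multiple of the log stripe unit."""
    if payload <= 0 or lsu <= 0:
        raise ValueError("payload and lsu must be positive")
    return -(-payload // lsu) * lsu  # ceiling division


def default_logbsize(lsu: int) -> int:
    """Default iclog size: match the LSU, but never below 32kB (an unset LSU is modelled as 0)."""
    return max(lsu, MIN_DEFAULT_LOGBSIZE)


def check_logbsize(logbsize: int, lsu: int) -> int:
    """Reject mount-time logbsize values that are unsupported or smaller than the LSU."""
    if logbsize not in SUPPORTED_LOGBSIZES:
        raise ValueError(f"unsupported logbsize {logbsize}")
    if logbsize < lsu:
        raise ValueError(f"logbsize {logbsize} is smaller than LSU {lsu}")
    return logbsize


if __name__ == "__main__":
    # An fsync that flushes only 4kB of log data, under a small and a large LSU.
    for lsu in (32 * KIB, 256 * KIB):
        print(lsu // KIB, "kB LSU ->", padded_log_write(4 * KIB, lsu) // KIB, "kB IO")
```

Under this model a 4kB fsync costs a 32kB IO with a 32kB LSU but a 256kB IO with a 256kB LSU. That is the padding overhead that makes a small LSU the safer default.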
> If I > understand it correctly, a large logbsize (eg: 256k) on top of a small LSU > (32k) would give high performance on both full-log-writes and > partial-log-writes (eg: frequent fsync). Again, it's a trade-off. 256kB iclogs mean that a crash can leave an unrecoverable 2MB hole in the journal, while 32kB iclogs means it's only 256kB. 256kB iclogs mean 2MB of memory usage per filesystem, 32kB is only 256kB. We have users with hundreds of individual XFS filesystems mounted on single machines, and so 256kB iclogs is a lot of wasted memory... On small logs and filesystems, 256kB iclogs doesn't provide any real benefit because throughput is limited by log tail pushing (metadata writeback), not async transaction throughput. It's not uncommon for modern disks to have best throughput and/or lowest latency at IO sizes of 128kB or smaller. If you have lots of NVRAM in front of your spinning disks, then log IO sizes mostly don't matter - they end up bandwidth limited before the iclog size is an issue. Testing on a pristine filesystem doesn't show what happens as the filesystem ages over years of constant use, and so what provides "best performance on empty filesystem" often doesn't provide best long term production performance. And so on. Storage is complex, filesystems are complex, and no one setting is right for everyone. The defaults are intended to be "good enough" in the majority of typical user configs. > Is my understanding correct? For you're specific storage setup, yes. > If you, do you suggest to always set logbsize > to the maximum supported value? No. I recommend that people use the defaults, and only if there are performance issues with their -actual production workload- should they consider changing anything. Benchmarks rarely match the behaviour of production workloads - tuning for benchmarks can actively harm production performance, especially over the long term... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx