On Mon, Mar 18, 2024 at 05:24:10PM -0700, Christoph Hellwig wrote:
> On Tue, Mar 19, 2024 at 09:45:51AM +1100, Dave Chinner wrote:
> > Apart from those small complexities that are resolved by the end of
> > the patchset, the conversion and enhancement is relatively straight
> > forward. It passes fstests on both 512 and 4096 byte sector size
> > storage (512 byte sectors exercise the XBF_KMEM path which has
> > non-zero bp->b_offset values) and doesn't appear to cause any
> > problems with large 64kB directory buffers on 4kB page machines.
>
> Just curious, do you have any benchmark numbers to see if this actually
> improves performance?

I have run some fsmark scalability tests on 64kB directory block sizes
to check that nothing fails and that the numbers are in the expected
ballpark, but I haven't done any specific back-to-back performance
regression testing. The reason for that is two-fold:

1. Scalability on 64kB directory buffer workloads is limited by buffer
   lock latency and journal size. i.e. even a 2GB journal is too small
   for high concurrency, so we see significant amounts of tail pushing,
   with directory modifications getting stuck waiting on writeback of
   the directory buffers being pushed out of the tail of the log.

2. Relogging 64kB directory blocks is -expensive-. Compared to a 4kB
   block size, large directory blocks are relogged much more
   frequently, and the memcpy() in each relog costs *much* more than
   relogging a 4kB directory block. It also hits xlog_kvmalloc() really
   hard, and that's now where we hit vmalloc scalability issues on
   large directory block size workloads.

The result of these things is that there hasn't been any significant
change in performance one way or the other - what we gain in buffer
access efficiency, we give back in increased lock contention and tail
pushing latency...

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
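
P.S. To put a rough number on the relog cost difference in point 2,
here's a minimal userspace sketch - not XFS code. The malloc()+memcpy()
loop is only a stand-in for the per-relog xlog_kvmalloc() and log
vector copy, and the block sizes and iteration count are arbitrary:

/*
 * Rough userspace approximation of per-relog copy cost for 4kB vs
 * 64kB directory blocks. The malloc/memcpy/free loop stands in for
 * allocating a log vector buffer and copying the block into it.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

static double relog_cost_ns(size_t blocksize, int iterations)
{
	char *dirblock = malloc(blocksize);
	struct timespec start, end;
	volatile char sink = 0;

	memset(dirblock, 0xa5, blocksize);

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (int i = 0; i < iterations; i++) {
		/* stand-in for the per-relog allocation + copy */
		char *logvec = malloc(blocksize);

		memcpy(logvec, dirblock, blocksize);
		sink += logvec[blocksize - 1]; /* don't optimise the copy away */
		free(logvec);
	}
	clock_gettime(CLOCK_MONOTONIC, &end);

	free(dirblock);
	return (end.tv_sec - start.tv_sec) * 1e9 +
	       (end.tv_nsec - start.tv_nsec);
}

int main(void)
{
	int iterations = 100000;
	double t4k = relog_cost_ns(4096, iterations);
	double t64k = relog_cost_ns(65536, iterations);

	printf("4kB  block: %.0f ns/relog\n", t4k / iterations);
	printf("64kB block: %.0f ns/relog\n", t64k / iterations);
	printf("cost ratio: %.1fx\n", t64k / t4k);
	return 0;
}

Whatever the exact numbers on a given machine, each relog of a 64kB
block moves 16x the data of a 4kB block, and the large block workloads
also relog each block more often, which is where the memcpy() and
xlog_kvmalloc() overhead piles up.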