On Mon, Aug 12, 2024 at 08:11:53PM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@xxxxxxxxxx>
>
> Changes since v1:
>  - Patch 5 fix a stale data exposure problem pointed out by Willy, drop
>    the setting of uptodate bits after zeroing out unaligned range.
>  - As Dave suggested, in order to prevent increasing the complexity of
>    maintain the state_lock, don't just drop all the state_lock in the
>    buffered write path, patch 6 introduce a new helper to set uptodate
>    bit and dirty bits together under the state_lock, reduce one time of
>    locking per write, the benefits of performance optimization do not
>    change too much.

It's helpful to provide a lore link to the previous version so that
reviewers don't have to go looking for it themselves to remind them
of what was discussed last time.

https://lore.kernel.org/linux-xfs/20240731091305.2896873-1-yi.zhang@xxxxxxxxxxxxxxx/T/

> This series contains some minor non-critical fixes and performance
> improvements on the filesystem with block size < folio size.
>
> The first 4 patches fix the handling of setting and clearing folio ifs
> dirty bits when mark the folio dirty and when invalidat the folio.
> Although none of these code mistakes caused a real problem now, it's
> still deserve a fix to correct the behavior.
>
> The second 2 patches drop the unnecessary state_lock in ifs when setting
> and clearing dirty/uptodate bits in the buffered write path, it could
> improve some (~8% on my machine) buffer write performance. I tested it
> through UnixBench on my x86_64 (Xeon Gold 6151) and arm64 (Kunpeng-920)
> virtual machine with 50GB ramdisk and xfs filesystem, the results shows
> below.
>
> UnixBench test cmd:
>  ./Run -i 1 -c 1 fstime-w
>
> Before:
> x86    File Write 1024 bufsize 2000 maxblocks       524708.0 KBps
> arm64  File Write 1024 bufsize 2000 maxblocks       801965.0 KBps
>
> After:
> x86    File Write 1024 bufsize 2000 maxblocks       569218.0 KBps
> arm64  File Write 1024 bufsize 2000 maxblocks       871605.0 KBps

Those are the same performance numbers as you posted for the
previous version of the patch. How does this new version perform
given that it's a complete rework of the optimisation? It's
important to know if the changes made actually provided the benefit
we expected them to make....

i.e. this is the sort of table of results I'd like to see provided:

platform    base        v1          v2
x86         524708.0    569218.0    ????
arm64       801965.0    871605.0    ????

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx