On 2024/8/14 10:47, Dave Chinner wrote:
> On Wed, Aug 14, 2024 at 10:14:01AM +0800, Zhang Yi wrote:
>> On 2024/8/14 9:49, Dave Chinner wrote:
>>> important to know if the changes made actually provided the benefit
>>> we expected them to make....
>>>
>>> i.e. this is the sort of table of results I'd like to see provided:
>>>
>>> platform     base            v1              v2
>>> x86          524708.0        569218.0        ????
>>> arm64        801965.0        871605.0        ????
>>>
>>
>> platform      base            v1              v2
>> x86           524708.0        571315.0        569218.0
>> arm64         801965.0        876077.0        871605.0
>
> So avoiding the lock cycle in iomap_write_begin() (in patch 5) in
> this partial block write workload made no difference to performance
> at all, and removing a lock cycle in iomap_write_end() provided all
> that gain?

Yes.

> Is this an overwrite workload or a file extending workload? The
> result implies that iomap_block_needs_zeroing() is returning false,
> hence it's an overwrite workload and it's reading partial blocks
> from disk. i.e. it is doing synchronous RMW cycles from the ramdisk
> and so still calling the uptodate bitmap update function rather than
> hitting the zeroing case and skipping it.
>
> Hence I'm just trying to understand what the test is doing because
> that tells me what the result should be...
>

I forgot to mention that I ran this test on xfs with a 1K block size;
it is a simple block size < folio size case that I can drive directly
with UnixBench. The test first does a buffered append write with
bs=1K, count=2000 in the first round, and then repeatedly overwrites
the file from the start position with the same parameters for 30
seconds. All of the writes are block size aligned, so
iomap_write_begin() just continues after iomap_adjust_read_range()
and never called iomap_set_range_uptodate() to mark the range
uptodate in the first place; hence this test case shows no difference
with or without patch 5.
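Roughly, the test body is equivalent to the sketch below (illustrative
only, not the actual UnixBench source; the file path, fill pattern and
error handling are mine):

	/*
	 * Rough userspace approximation of the workload described above.
	 *
	 * Phase 1: buffered append write, bs=1K, count=2000.
	 * Phase 2: repeatedly overwrite the same range from offset 0
	 *          for ~30 seconds.
	 *
	 * Every write covers whole 1K blocks, so with a 1K block size
	 * filesystem no partial-block uptodate marking is needed in
	 * iomap_write_begin().
	 */
	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <time.h>
	#include <unistd.h>

	#define BS	1024		/* matches the 1K fs block size */
	#define COUNT	2000
	#define SECS	30

	int main(void)
	{
		char buf[BS];
		time_t end;
		int fd, i;

		memset(buf, 0x5a, sizeof(buf));

		/* Phase 1: extend the file with block-aligned appends. */
		fd = open("/mnt/test/file", O_WRONLY | O_CREAT | O_TRUNC, 0644);
		if (fd < 0) {
			perror("open");
			return 1;
		}
		for (i = 0; i < COUNT; i++) {
			if (write(fd, buf, BS) != BS) {
				perror("append write");
				return 1;
			}
		}

		/* Phase 2: block-aligned overwrites from offset 0, ~30s. */
		end = time(NULL) + SECS;
		while (time(NULL) < end) {
			if (lseek(fd, 0, SEEK_SET) < 0) {
				perror("lseek");
				return 1;
			}
			for (i = 0; i < COUNT; i++) {
				if (write(fd, buf, BS) != BS) {
					perror("overwrite");
					return 1;
				}
			}
		}
		close(fd);
		return 0;
	}

Thanks,
Yi.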