On Wed, Aug 14, 2024 at 10:14:01AM +0800, Zhang Yi wrote:
> On 2024/8/14 9:49, Dave Chinner wrote:
> > important to know if the changes made actually provided the benefit
> > we expected them to make....
> >
> > i.e. this is the sort of table of results I'd like to see provided:
> >
> > platform	base		v1		v2
> > x86		524708.0	569218.0	????
> > arm64	801965.0	871605.0	????
> >
>
> platform	base		v1		v2
> x86		524708.0	571315.0	569218.0
> arm64	801965.0	876077.0	871605.0

So avoiding the lock cycle in iomap_write_begin() (in patch 5) in
this partial block write workload made no difference to performance
at all, and removing a lock cycle in iomap_write_end provided all
that gain?

Is this an overwrite workload or a file extending workload? The
result implies that iomap_block_needs_zeroing() is returning false,
hence it's an overwrite workload and it's reading partial blocks
from disk. i.e. it is doing synchronous RMW cycles from the ramdisk
and so still calling the uptodate bitmap update function rather than
hitting the zeroing case and skipping it.

Hence I'm just trying to understand what the test is doing because
that tells me what the result should be...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx