On 2024/8/14 13:16, Dave Chinner wrote: > On Wed, Aug 14, 2024 at 11:57:03AM +0800, Zhang Yi wrote: >> On 2024/8/14 10:47, Dave Chinner wrote: >>> On Wed, Aug 14, 2024 at 10:14:01AM +0800, Zhang Yi wrote: >>>> On 2024/8/14 9:49, Dave Chinner wrote: >>>>> important to know if the changes made actually provided the benefit >>>>> we expected them to make.... >>>>> >>>>> i.e. this is the sort of table of results I'd like to see provided: >>>>> >>>>> platform base v1 v2 >>>>> x86 524708.0 569218.0 ???? >>>>> arm64 801965.0 871605.0 ???? >>>>> >>>> >>>> platform base v1 v2 >>>> x86 524708.0 571315.0 569218.0 >>>> arm64 801965.0 876077.0 871605.0 >>> >>> So avoiding the lock cycle in iomap_write_begin() (in patch 5) in >>> this partial block write workload made no difference to performance >>> at all, and removing a lock cycle in iomap_write_end provided all >>> that gain? >> >> Yes. >> >>> >>> Is this an overwrite workload or a file extending workload? The >>> result implies that iomap_block_needs_zeroing() is returning false, >>> hence it's an overwrite workload and it's reading partial blocks >>> from disk. i.e. it is doing synchronous RMW cycles from the ramdisk >>> and so still calling the uptodate bitmap update function rather than >>> hitting the zeroing case and skipping it. >>> >>> Hence I'm just trying to understand what the test is doing because >>> that tells me what the result should be... >>> >> >> I forgot to mentioned that I test this on xfs with 1K block size, this >> is a simple case of block size < folio size that I can direct use >> UnixBench. > > OK. So it's an even more highly contrived microbenchmark than I > thought. :/ > > What is the impact on a 4kB block size filesystem running that same > 1kB write test? That's going to be a far more common thing to occur > in production machines for such small IO, Yeah, I agree with you, the original test case I want to test is buffered overwrite with bs=4K to the 4KB filesystem which has existing larger size folios (> 4KB), this is one kind of common case of block size < folio size after large folio is enabled. But I don't find a benchmark tool can do this test easily, so I use the above tests parameters to simulate this case. > let's make sure that we > haven't regressed that case in optimising for this one. Sure, I will test this case either. > >> This test first do buffered append write with bs=1K,count=2000 in the >> first round, and then do overwrite from the start position with the same >> parameters repetitively in 30 seconds. All the write operations are >> block size aligned, so iomap_write_begin() just continue after >> iomap_adjust_read_range(), don't call iomap_set_range_uptodate() to set >> range uptodate originally, hence there is no difference whether with or >> without patch 5 in this test case. > > Ok, so you really need to come up with an equivalent test that > exercises the paths that patch 5 modifies, because right now we have > no real idea of what the impact of that change will be... > Sure. Thanks, Yi.