From: Zhang Yi <yi.zhang@xxxxxxxxxx>

Changes since v1:
 - Patch 5 fixes a stale data exposure problem pointed out by Willy: drop
   the setting of uptodate bits after zeroing out an unaligned range.
 - As Dave suggested, to avoid making the state_lock harder to maintain,
   don't drop all of the state_lock usage in the buffered write path.
   Instead, patch 6 introduces a new helper that sets the uptodate and
   dirty bits together under the state_lock, which saves one lock
   acquisition per write. The performance benefit is largely unchanged.

This series contains some minor non-critical fixes and performance
improvements for filesystems with block size < folio size.

The first 4 patches fix the handling of setting and clearing the folio
ifs dirty bits when marking the folio dirty and when invalidating the
folio. Although none of these mistakes causes a real problem now, they
still deserve a fix to correct the behavior.

The last 2 patches cut unnecessary state_lock usage in ifs when setting
and clearing the dirty/uptodate bits in the buffered write path, which
improves buffered write performance by about 8% on my machines. I tested
this with UnixBench on my x86_64 (Xeon Gold 6151) and arm64 (Kunpeng-920)
virtual machines with a 50GB ramdisk and an xfs filesystem. The results
are shown below.

UnixBench test cmd:
 ./Run -i 1 -c 1 fstime-w

Before:
 x86    File Write 1024 bufsize 2000 maxblocks       524708.0 KBps
 arm64  File Write 1024 bufsize 2000 maxblocks       801965.0 KBps

After:
 x86    File Write 1024 bufsize 2000 maxblocks       569218.0 KBps
 arm64  File Write 1024 bufsize 2000 maxblocks       871605.0 KBps

Thanks,
Yi.

Zhang Yi (6):
  iomap: correct the range of a partial dirty clear
  iomap: support invalidating partial folios
  iomap: advance the ifs allocation if we have more than one blocks per
    folio
  iomap: correct the dirty length in page mkwrite
  iomap: don't mark blocks uptodate after partial zeroing
  iomap: reduce unnecessary state_lock when setting ifs uptodate and
    dirty bits

 fs/iomap/buffered-io.c | 73 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 60 insertions(+), 13 deletions(-)

-- 
2.39.2
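
For readers who want a concrete picture of the locking change described for patch 6, here is a rough userspace sketch. It is not the kernel code. Every name in it (struct demo_ifs, demo_set_uptodate(), demo_set_dirty(), demo_set_uptodate_and_dirty()) is hypothetical, the lock is a toy C11 spinlock standing in for state_lock, and the per-block state is assumed to fit in two 64-bit words. It only illustrates why a combined helper takes the lock once per write where two separate helpers take it twice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for per-folio block state; not the kernel struct. */
struct demo_ifs {
	atomic_flag state_lock;   /* toy spinlock standing in for state_lock */
	uint64_t uptodate;        /* one bit per block */
	uint64_t dirty;           /* one bit per block */
	unsigned int lock_count;  /* lock acquisitions, for illustration only */
};

void demo_ifs_init(struct demo_ifs *ifs)
{
	atomic_flag_clear(&ifs->state_lock);
	ifs->uptodate = 0;
	ifs->dirty = 0;
	ifs->lock_count = 0;
}

void demo_lock(struct demo_ifs *ifs)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&ifs->state_lock,
						 memory_order_acquire))
		;
	ifs->lock_count++;
}

void demo_unlock(struct demo_ifs *ifs)
{
	atomic_flag_clear_explicit(&ifs->state_lock, memory_order_release);
}

/* Bit mask covering blocks [first, first + count). */
uint64_t demo_range_mask(unsigned int first, unsigned int count)
{
	assert(count > 0 && count <= 64 && first <= 64 - count);
	uint64_t bits = (count == 64) ? UINT64_MAX
				      : ((UINT64_C(1) << count) - 1);
	return bits << first;
}

/* Two separate helpers: each one takes the lock on its own. */
void demo_set_uptodate(struct demo_ifs *ifs, unsigned int first,
		       unsigned int count)
{
	demo_lock(ifs);
	ifs->uptodate |= demo_range_mask(first, count);
	demo_unlock(ifs);
}

void demo_set_dirty(struct demo_ifs *ifs, unsigned int first,
		    unsigned int count)
{
	demo_lock(ifs);
	ifs->dirty |= demo_range_mask(first, count);
	demo_unlock(ifs);
}

/* Combined helper: both bitmaps are updated under a single acquisition. */
void demo_set_uptodate_and_dirty(struct demo_ifs *ifs, unsigned int first,
				 unsigned int count)
{
	uint64_t mask = demo_range_mask(first, count);

	demo_lock(ifs);
	ifs->uptodate |= mask;
	ifs->dirty |= mask;
	demo_unlock(ifs);
}
```

Calling demo_set_uptodate() followed by demo_set_dirty() on a range leaves the same bits set as one call to demo_set_uptodate_and_dirty(), but lock_count ends at 2 in the first case and 1 in the second.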