The core of this patchset is the new folio_end_read() call which
filesystems can use when finishing a page cache read instead of separate
calls to mark the folio uptodate and unlock it. As an illustration of
its use, I converted ext4, iomap & mpage; more can be converted.

I think that's useful by itself, but the interesting optimisation is
that we can implement that with a single XOR instruction that sets the
uptodate bit, clears the lock bit, tests the waiter bit and provides a
write memory barrier. That removes one memory barrier and one atomic
instruction from each page read, which seems worth doing. That's in
patch 15.

The last two patches could be a separate series, but basically we can do
the same thing with the writeback flag that we do with the unlock flag;
clear it and test the waiters bit at the same time.

I don't have any performance numbers; I'm hoping Nick might provide some
since PPC seems particularly unhappy with write-after-write hazards.

Matthew Wilcox (Oracle) (17):
  iomap: Hold state_lock over call to ifs_set_range_uptodate()
  iomap: Protect read_bytes_pending with the state_lock
  mm: Add folio_end_read()
  ext4: Use folio_end_read()
  buffer: Use folio_end_read()
  iomap: Use folio_end_read()
  bitops: Add xor_unlock_is_negative_byte()
  alpha: Implement xor_unlock_is_negative_byte
  m68k: Implement xor_unlock_is_negative_byte
  mips: Implement xor_unlock_is_negative_byte
  powerpc: Implement arch_xor_unlock_is_negative_byte on 32-bit
  riscv: Implement xor_unlock_is_negative_byte
  s390: Implement arch_xor_unlock_is_negative_byte
  mm: Delete checks for xor_unlock_is_negative_byte()
  mm: Add folio_xor_flags_has_waiters()
  mm: Make __end_folio_writeback() return void
  mm: Use folio_xor_flags_has_waiters() in folio_end_writeback()

 arch/alpha/include/asm/bitops.h            | 20 +++++
 arch/m68k/include/asm/bitops.h             | 13 ++++
 arch/mips/include/asm/bitops.h             | 25 +++++-
 arch/mips/lib/bitops.c                     | 14 ++++
 arch/powerpc/include/asm/bitops.h          | 21 ++---
 arch/riscv/include/asm/bitops.h            | 12 +++
 arch/s390/include/asm/bitops.h             | 10 +++
 arch/x86/include/asm/bitops.h              | 11 ++-
 fs/buffer.c                                | 16 +---
 fs/ext4/readpage.c                         | 14 +---
 fs/iomap/buffered-io.c                     | 55 ++++++++-----
 .../asm-generic/bitops/instrumented-lock.h | 28 ++++---
 include/asm-generic/bitops/lock.h          | 20 +----
 include/linux/page-flags.h                 | 19 +++++
 include/linux/pagemap.h                    |  1 +
 kernel/kcsan/kcsan_test.c                  |  9 +--
 kernel/kcsan/selftest.c                    |  9 +--
 mm/filemap.c                               | 77 ++++++++++---------
 mm/kasan/kasan_test.c                      |  8 +-
 mm/page-writeback.c                        | 35 ++++-----
 20 files changed, 248 insertions(+), 169 deletions(-)

--
2.40.1
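
For readers who want to see the XOR trick from the cover letter in isolation, here is a rough userspace sketch using C11 atomics. It is not the kernel implementation: the FLAG_* names, their bit positions (other than the waiters flag being the top bit of the low byte, which the "is_negative_byte" name implies) and both function names and signatures are assumptions made up for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit layout; only "waiters is bit 7" matters for the trick. */
#define FLAG_LOCKED   (1UL << 0)
#define FLAG_UPTODATE (1UL << 3)
#define FLAG_WAITERS  (1UL << 7)

/*
 * XOR 'mask' into '*flags' with release ordering and report whether
 * bit 7 (the sign bit of the low byte) was set in the old value.
 * One atomic read-modify-write does all of the work.
 */
static bool sketch_xor_unlock_is_negative_byte(unsigned long mask,
                                               atomic_ulong *flags)
{
    unsigned long old = atomic_fetch_xor_explicit(flags, mask,
                                                  memory_order_release);
    return (old & FLAG_WAITERS) != 0;
}

/*
 * Finish a read: on success set uptodate, and in all cases drop the lock.
 * Returns true if somebody is waiting and needs a wakeup.
 *
 * XOR only works as "set one bit, clear another" because the caller
 * guarantees the starting state: locked is set and uptodate is clear.
 */
static bool sketch_end_read(atomic_ulong *flags, bool success)
{
    unsigned long cur = atomic_load_explicit(flags, memory_order_relaxed);
    unsigned long mask = FLAG_LOCKED;

    assert(cur & FLAG_LOCKED);          /* we must hold the lock */
    if (success) {
        assert(!(cur & FLAG_UPTODATE)); /* XOR would clear it otherwise */
        mask |= FLAG_UPTODATE;
    }
    return sketch_xor_unlock_is_negative_byte(mask, flags);
}
```

With flags starting as locked plus waiters, sketch_end_read(&flags, true) leaves uptodate plus waiters set and returns true, so the caller knows to wake the waiters without issuing a second atomic operation.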