On Thu, Aug 25, 2022 at 05:03:40PM -0400, Mikulas Patocka wrote:
> Here I reworked your patch, so that test_bit_acquire is defined just like
> test_bit. There's some code duplication (in
> include/asm-generic/bitops/generic-non-atomic.h and in
> arch/x86/include/asm/bitops.h), but that duplication exists in the
> test_bit function too.
>
> I tested it on x86-64 and arm64. On x86-64 it generates the "bt"
> instruction for variable-bit test and "shr; and $1" for constant bit test.
> On arm64 it generates the "ldar" instruction for both constant and
> variable bit test.
>
> For me, the kernel 6.0-rc2 doesn't boot in an arm64 virtual machine at all
> (with or without this patch), so I only compile-tested it on arm64. I have
> to bisect it.

It's working fine for me and I haven't had any other reports that it's not
booting. Please could you share some more details about your setup so we
can try to reproduce the problem?

> From: Mikulas Patocka <mpatocka@xxxxxxxxxx>
>
> There are several places in the kernel where wait_on_bit is not followed
> by a memory barrier (for example, in drivers/md/dm-bufio.c:new_read). On
> architectures with weak memory ordering, it may happen that memory
> accesses that follow wait_on_bit are reordered before wait_on_bit and they
> may return invalid data.
>
> Fix this class of bugs by introducing a new function "test_bit_acquire"
> that works like test_bit, but has acquire memory ordering semantics.
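A rough userspace sketch of the acquire semantics described above, using C11 atomics in place of the kernel's smp_load_acquire(). The names sketch_test_bit_acquire(), sketch_wait_on_bit() and SKETCH_BITS_PER_LONG are hypothetical and this is not the patch's implementation; it only illustrates that the bit is read with a single acquire load, so accesses that follow the wait cannot be satisfied before the bit is observed clear.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's BITS_PER_LONG. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical userspace analogue of test_bit_acquire(): load the word
 * holding bit nr with acquire ordering, then extract the bit. Memory
 * accesses after this call cannot be reordered before the load.
 */
static bool sketch_test_bit_acquire(unsigned long nr, _Atomic unsigned long *addr)
{
	unsigned long word = atomic_load_explicit(&addr[nr / SKETCH_BITS_PER_LONG],
						  memory_order_acquire);

	return (word >> (nr % SKETCH_BITS_PER_LONG)) & 1UL;
}

/*
 * Hypothetical reader side of the wait_on_bit() pattern: wait until the
 * bit is clear. The real wait_on_bit() sleeps on a wait queue rather than
 * spinning; only the ordering of the final bit test matters here.
 */
static void sketch_wait_on_bit(unsigned long nr, _Atomic unsigned long *addr)
{
	while (sketch_test_bit_acquire(nr, addr))
		; /* busy-wait until the owner clears the bit */
}
```

With a plain (relaxed) load in place of the acquire load, a weakly ordered CPU could satisfy reads that follow sketch_wait_on_bit() before the bit test itself, which is the class of bug the commit message describes.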
>
> Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
>
>  arch/x86/include/asm/bitops.h                            |   13 +++++++++++++
>  include/asm-generic/bitops/generic-non-atomic.h          |   14 ++++++++++++++
>  include/asm-generic/bitops/instrumented-non-atomic.h     |   12 ++++++++++++
>  include/asm-generic/bitops/non-atomic.h                  |    1 +
>  include/asm-generic/bitops/non-instrumented-non-atomic.h |    1 +
>  include/linux/bitops.h                                   |    1 +
>  include/linux/buffer_head.h                              |    2 +-
>  include/linux/wait_bit.h                                 |    8 ++++----
>  kernel/sched/wait_bit.c                                  |    2 +-
>  9 files changed, 48 insertions(+), 6 deletions(-)

This looks good to me, thanks for doing it! Just one thing that jumped out
at me:

> Index: linux-2.6/include/linux/buffer_head.h
> ===================================================================
> --- linux-2.6.orig/include/linux/buffer_head.h
> +++ linux-2.6/include/linux/buffer_head.h
> @@ -156,7 +156,7 @@ static __always_inline int buffer_uptoda
>  	 * make it consistent with folio_test_uptodate
>  	 * pairs with smp_mb__before_atomic in set_buffer_uptodate
>  	 */
> -	return (smp_load_acquire(&bh->b_state) & (1UL << BH_Uptodate)) != 0;
> +	return test_bit_acquire(BH_Uptodate, &bh->b_state);

Do you think it would be worth adding set_bit_release() and then relaxing
set_buffer_uptodate() to use that rather than the smp_mb__before_atomic()?

Will
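The suggestion above pairs an acquire bit test on the reader side with a release bit set on the writer side. A minimal C11 sketch of that pairing follows; sketch_set_bit_release(), sketch_test_bit_acquire(), struct sketch_buf and SKETCH_UPTODATE are all hypothetical stand-ins (for a possible set_bit_release(), for test_bit_acquire(), for buffer_head and for BH_Uptodate), not kernel APIs.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_UPTODATE 0UL /* hypothetical stand-in for BH_Uptodate */

/* Hypothetical stand-in for buffer_head: a state word plus a payload. */
struct sketch_buf {
	_Atomic unsigned long state;
	int data;
};

/*
 * Hypothetical set_bit_release(): atomic OR with release ordering, so all
 * loads and stores before the call are ordered before the bit becomes set.
 */
static void sketch_set_bit_release(unsigned long nr, _Atomic unsigned long *addr)
{
	unsigned long mask = 1UL << (nr % SKETCH_BITS_PER_LONG);

	atomic_fetch_or_explicit(&addr[nr / SKETCH_BITS_PER_LONG], mask,
				 memory_order_release);
}

/* Hypothetical test_bit_acquire(): a single acquire load of the word. */
static bool sketch_test_bit_acquire(unsigned long nr, _Atomic unsigned long *addr)
{
	unsigned long word = atomic_load_explicit(&addr[nr / SKETCH_BITS_PER_LONG],
						  memory_order_acquire);

	return (word >> (nr % SKETCH_BITS_PER_LONG)) & 1UL;
}

/* Writer: fill the payload, then publish it (like set_buffer_uptodate()). */
static void sketch_publish(struct sketch_buf *b, int value)
{
	b->data = value;                                      /* plain store */
	sketch_set_bit_release(SKETCH_UPTODATE, &b->state);   /* release */
}

/* Reader: only trust the payload once the bit is seen (like buffer_uptodate()). */
static bool sketch_read(struct sketch_buf *b, int *out)
{
	if (!sketch_test_bit_acquire(SKETCH_UPTODATE, &b->state))
		return false;
	*out = b->data; /* ordered after the acquire load above */
	return true;
}
```

Roughly, the trade-off is that a release RMW only orders the writer's earlier accesses before the bit update, which is all the acquire reader relies on, whereas smp_mb__before_atomic() in front of set_bit() amounts to a full barrier.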