On Mon 18-09-23 07:59:03, Yury Norov wrote:
> On Mon, Sep 18, 2023 at 02:46:02PM +0200, Mirsad Todorovac wrote:
> > --------------------------------------------------------
> >  lib/find_bit.c | 33 +++++++++++++++++----------------
> >  1 file changed, 17 insertions(+), 16 deletions(-)
> >
> > diff --git a/lib/find_bit.c b/lib/find_bit.c
> > index 32f99e9a670e..56244e4f744e 100644
> > --- a/lib/find_bit.c
> > +++ b/lib/find_bit.c
> > @@ -18,6 +18,7 @@
> >  #include <linux/math.h>
> >  #include <linux/minmax.h>
> >  #include <linux/swab.h>
> > +#include <asm/rwonce.h>
> >
> >  /*
> >   * Common helper for find_bit() function family
> > @@ -98,7 +99,7 @@ out: \
> >   */
> >  unsigned long _find_first_bit(const unsigned long *addr, unsigned long size)
> >  {
> > -	return FIND_FIRST_BIT(addr[idx], /* nop */, size);
> > +	return FIND_FIRST_BIT(READ_ONCE(addr[idx]), /* nop */, size);
> >  }
> >  EXPORT_SYMBOL(_find_first_bit);
> >  #endif
>
> ...
>
> That doesn't look correct. READ_ONCE() implies that there's another
> thread modifying the bitmap concurrently. This is not the true for
> vast majority of bitmap API users, and I expect that forcing
> READ_ONCE() would affect performance for them.
>
> Bitmap functions, with a few rare exceptions like set_bit(), are not
> thread-safe and require users to perform locking/synchronization where
> needed.

Well, for xarray the write side is synchronized with a spinlock but the read
side is not (only RCU protected).

> If you really need READ_ONCE, I think it's better to implement a new
> flavor of the function(s) separately, like:
>         find_first_bit_read_once()

So yes, xarray really needs READ_ONCE(). And I don't think READ_ONCE()
imposes any real performance overhead in this particular case because for
any sane compiler the generated assembly with & without READ_ONCE() will be
exactly the same. For example I've checked disassembly of _find_next_bit()
using READ_ONCE().
The main loop is:

   0xffffffff815a2b6d <+77>:	inc    %r8
   0xffffffff815a2b70 <+80>:	add    $0x8,%rdx
   0xffffffff815a2b74 <+84>:	mov    %r8,%rcx
   0xffffffff815a2b77 <+87>:	shl    $0x6,%rcx
   0xffffffff815a2b7b <+91>:	cmp    %rcx,%rax
   0xffffffff815a2b7e <+94>:	jbe    0xffffffff815a2b9b <_find_next_bit+123>
   0xffffffff815a2b80 <+96>:	mov    (%rdx),%rcx
   0xffffffff815a2b83 <+99>:	test   %rcx,%rcx
   0xffffffff815a2b86 <+102>:	je     0xffffffff815a2b6d <_find_next_bit+77>
   0xffffffff815a2b88 <+104>:	shl    $0x6,%r8
   0xffffffff815a2b8c <+108>:	tzcnt  %rcx,%rcx

So you can see the value we work with is copied from the address (rdx) into
a register (rcx), and the test and __ffs() happen on the register value, so
READ_ONCE() has no practical effect on the generated code. It just prevents
the compiler from doing some stupid de-optimization.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
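
For illustration only, here is a minimal userspace sketch of the point made
above. It is not the lib/find_bit.c code. READ_ONCE_SKETCH(),
BITS_PER_LONG_SKETCH and find_first_bit_once() are hypothetical names made
up for this example, and the sketch assumes GCC or Clang (__typeof__,
__builtin_ctzl). The volatile access forces each word to be loaded exactly
once into a local, so the zero test and the bit scan both see the same
value even if another thread modifies the bitmap concurrently.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel's READ_ONCE(): reading through a
 * volatile-qualified pointer makes the compiler emit exactly one load.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

#define BITS_PER_LONG_SKETCH ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Return the index of the first set bit in the bitmap at addr, or size if
 * no bit below size is set.
 */
static unsigned long find_first_bit_once(const unsigned long *addr,
					 unsigned long size)
{
	unsigned long idx, val, bit;

	assert(addr != NULL);

	for (idx = 0; idx * BITS_PER_LONG_SKETCH < size; idx++) {
		/* Single load; test and bit scan both use this snapshot. */
		val = READ_ONCE_SKETCH(addr[idx]);
		if (val) {
			bit = idx * BITS_PER_LONG_SKETCH +
			      (unsigned long)__builtin_ctzl(val);
			/* Clamp: the set bit may lie beyond size. */
			return bit < size ? bit : size;
		}
	}
	return size;
}
```

Without the volatile access the compiler would in principle be allowed to
load addr[idx] once for the zero test and again for the bit scan. That is
the kind of de-optimization the marked access rules out, while the loop body
otherwise stays the same.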