On Mon, Sep 18, 2023 at 06:28:07PM +0200, Mirsad Todorovac wrote:
>
>
> On 9/18/23 17:54, Jan Kara wrote:
> > On Mon 18-09-23 07:59:03, Yury Norov wrote:
> > > On Mon, Sep 18, 2023 at 02:46:02PM +0200, Mirsad Todorovac wrote:
> > > > --------------------------------------------------------
> > > >  lib/find_bit.c | 33 +++++++++++++++++----------------
> > > >  1 file changed, 17 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/lib/find_bit.c b/lib/find_bit.c
> > > > index 32f99e9a670e..56244e4f744e 100644
> > > > --- a/lib/find_bit.c
> > > > +++ b/lib/find_bit.c
> > > > @@ -18,6 +18,7 @@
> > > >  #include <linux/math.h>
> > > >  #include <linux/minmax.h>
> > > >  #include <linux/swab.h>
> > > > +#include <asm/rwonce.h>
> > > >  /*
> > > >   * Common helper for find_bit() function family
> > > > @@ -98,7 +99,7 @@ out: \
> > > >   */
> > > >  unsigned long _find_first_bit(const unsigned long *addr, unsigned long size)
> > > >  {
> > > > -	return FIND_FIRST_BIT(addr[idx], /* nop */, size);
> > > > +	return FIND_FIRST_BIT(READ_ONCE(addr[idx]), /* nop */, size);
> > > >  }
> > > >  EXPORT_SYMBOL(_find_first_bit);
> > > >  #endif
> > >
> > > ...
> > >
> > > That doesn't look correct. READ_ONCE() implies that there's another
> > > thread modifying the bitmap concurrently. This is not true for the
> > > vast majority of bitmap API users, and I expect that forcing
> > > READ_ONCE() would affect performance for them.
> > >
> > > Bitmap functions, with a few rare exceptions like set_bit(), are not
> > > thread-safe and require users to perform locking/synchronization where
> > > needed.
> >
> > Well, for xarray the write side is synchronized with a spinlock but the read
> > side is not (only RCU protected).
> >
> > > If you really need READ_ONCE, I think it's better to implement a new
> > > flavor of the function(s) separately, like:
> > >         find_first_bit_read_once()
> >
> > So yes, xarray really needs READ_ONCE().
> > And I don't think READ_ONCE()
> > imposes any real performance overhead in this particular case because for
> > any sane compiler the generated assembly with & without READ_ONCE() will be
> > exactly the same. For example I've checked disassembly of _find_next_bit()
> > using READ_ONCE(). The main loop is:
> >
> >    0xffffffff815a2b6d <+77>:	inc    %r8
> >    0xffffffff815a2b70 <+80>:	add    $0x8,%rdx
> >    0xffffffff815a2b74 <+84>:	mov    %r8,%rcx
> >    0xffffffff815a2b77 <+87>:	shl    $0x6,%rcx
> >    0xffffffff815a2b7b <+91>:	cmp    %rcx,%rax
> >    0xffffffff815a2b7e <+94>:	jbe    0xffffffff815a2b9b <_find_next_bit+123>
> >    0xffffffff815a2b80 <+96>:	mov    (%rdx),%rcx
> >    0xffffffff815a2b83 <+99>:	test   %rcx,%rcx
> >    0xffffffff815a2b86 <+102>:	je     0xffffffff815a2b6d <_find_next_bit+77>
> >    0xffffffff815a2b88 <+104>:	shl    $0x6,%r8
> >    0xffffffff815a2b8c <+108>:	tzcnt  %rcx,%rcx
> >
> > So you can see the value we work with is copied from the address (rdx) into
> > a register (rcx) and the test and __ffs() happens on a register value and
> > thus READ_ONCE() has no practical effect. It just prevents the compiler
> > from doing some stupid de-optimization.
> >
> > 								Honza
>
> If I may also add, centralised READ_ONCE() version had fixed a couple of hundred of
> the instances of KCSAN data-races in dmesg.
>
> _find_*_bit() functions and/or macros cause quite a number of KCSAN BUG warnings:
>
>      95 _find_first_and_bit (lib/find_bit.c:114 (discriminator 10))
>      31 _find_first_zero_bit (lib/find_bit.c:125 (discriminator 10))
>     173 _find_next_and_bit (lib/find_bit.c:171 (discriminator 2))
>     655 _find_next_bit (lib/find_bit.c:133 (discriminator 2))
>       5 _find_next_zero_bit
>
> Finding each one find_bit_*() function and replacing it with find_bit_*_read_once()
> could be time-consuming and challenging.
>
> However, I will do both versions so you could compare, if you'd like.
>
> Note, in the PoC version I have only implemented find_next_bit_read_once() ATM to see if
> this works.
>
> Regards,
> Mirsad

Guys, I lost track of the conversation. In the other email Mirsad said:
        Which was the basic reason in the first place for all this, because something changed
        data from underneath our fingers ..

It sounds clear to me that this is a bug in xarray, *revealed* by the
find_next_bit() function. But later in the discussion you're trying to 'fix'
find_*_bit(), as if find_bit() corrupted the bitmap, but it doesn't.

In the previous email Jan said:
        for any sane compiler the generated assembly with & without READ_ONCE()
        will be exactly the same.

If the code generated with and without READ_ONCE() is the same, the
behavior would be the same, right? If you see a difference in behavior,
the code should differ.

You say that READ_ONCE() in find_bit() 'fixes' 200 KCSAN BUG warnings. To
me it sounds like hiding the problems instead of fixing them. If there's a
race between writing and reading bitmaps, it should be fixed properly by
adding an appropriate serialization mechanism. Shutting KCSAN up with
READ_ONCE() here and there is exactly the opposite of the right direction.

Every READ_ONCE() must be paired with WRITE_ONCE(), just like atomic
reads/writes or spin locks/unlocks. With that in mind, adding
READ_ONCE() in find_bit() requires adding it to every bitmap function
out there. And this, as I said before, would be an overhead for most
users.
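
To make the "separate flavor" suggestion and the pairing argument concrete,
here is a rough userspace sketch. It is an illustration under assumptions,
not kernel code: READ_ONCE() and WRITE_ONCE() are simplified volatile-access
stand-ins for the kernel macros, find_first_bit_read_once() is the name
proposed earlier in this thread and is written here from scratch, and
set_bit_write_once() is a hypothetical writer-side helper that assumes
writers are already serialized by a lock. It relies on GCC/Clang extensions
(__typeof__, __builtin_ctzl).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Simplified userspace stand-ins for the kernel macros (assumption):
 * a volatile access makes the compiler emit exactly one load or store.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/*
 * Opt-in flavor for lockless readers. Each word is loaded once into a
 * local, and both the zero test and the bit search use that local copy.
 * Returns the index of the first set bit, or size if none is set.
 */
static unsigned long find_first_bit_read_once(const unsigned long *addr,
					      unsigned long size)
{
	unsigned long idx;

	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
		unsigned long val = READ_ONCE(addr[idx]);

		if (val) {
			unsigned long bit = idx * BITS_PER_LONG +
					    (unsigned long)__builtin_ctzl(val);

			/* Bits at or beyond size do not count. */
			return bit < size ? bit : size;
		}
	}
	return size;
}

/*
 * Hypothetical writer-side counterpart. The plain read of *word is only
 * safe because writers are assumed to hold a lock; the marked store is
 * what pairs with READ_ONCE() in the lockless reader above.
 */
static void set_bit_write_once(unsigned long nr, unsigned long *addr)
{
	unsigned long *word = &addr[nr / BITS_PER_LONG];

	WRITE_ONCE(*word, *word | (1UL << (nr % BITS_PER_LONG)));
}
```

With this split, only callers that really do lockless reads (the xarray
case discussed above) would use the _read_once flavor, and everyone else
keeps the plain find_first_bit(). Marking the accesses does not order
anything by itself, so the caller still has to provide whatever
serialization it needs.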