Re: [PATCH v1 1/1] xarray: fix the data-race in xas_find_chunk() by using READ_ONCE()

Jan Kara <jack@xxxxxxx> · Mon, 18 Sep 2023 17:54:03 +0200

On Mon 18-09-23 07:59:03, Yury Norov wrote:
> On Mon, Sep 18, 2023 at 02:46:02PM +0200, Mirsad Todorovac wrote:
> > --------------------------------------------------------
> >  lib/find_bit.c | 33 +++++++++++++++++----------------
> >  1 file changed, 17 insertions(+), 16 deletions(-)
> > 
> > diff --git a/lib/find_bit.c b/lib/find_bit.c
> > index 32f99e9a670e..56244e4f744e 100644
> > --- a/lib/find_bit.c
> > +++ b/lib/find_bit.c
> > @@ -18,6 +18,7 @@
> >  #include <linux/math.h>
> >  #include <linux/minmax.h>
> >  #include <linux/swab.h>
> > +#include <asm/rwonce.h>
> >  /*
> >   * Common helper for find_bit() function family
> > @@ -98,7 +99,7 @@ out:                                                                          \
> >   */
> >  unsigned long _find_first_bit(const unsigned long *addr, unsigned long size)
> >  {
> > -       return FIND_FIRST_BIT(addr[idx], /* nop */, size);
> > +       return FIND_FIRST_BIT(READ_ONCE(addr[idx]), /* nop */, size);
> >  }
> >  EXPORT_SYMBOL(_find_first_bit);
> >  #endif
> 
> ...
> 
> That doesn't look correct. READ_ONCE() implies that there's another
> thread modifying the bitmap concurrently. This is not true for the
> vast majority of bitmap API users, and I expect that forcing
> READ_ONCE() would affect performance for them.
> 
> Bitmap functions, with a few rare exceptions like set_bit(), are not
> thread-safe and require users to perform locking/synchronization where
> needed.

Well, for xarray the write side is synchronized with a spinlock, but the read
side is not (it is only RCU-protected).
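A minimal userspace sketch of that pattern follows. It assumes a single word of per-slot marks, and the struct and function names are hypothetical, not xarray's. Writers serialize on a spinlock (a C11 atomic_flag standing in for spinlock_t), while the reader takes no lock at all and therefore loads the shared word exactly once and works only on that snapshot. READ_ONCE()/WRITE_ONCE() are approximated with the classic volatile-cast definitions (relying on the GCC/Clang __typeof__ extension), using WRITE_ONCE() on the writer side is an assumption of the sketch, and RCU itself is not modelled.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace approximations of the kernel macros (assumption). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical structure: one word of per-slot marks plus a writer lock. */
struct marked_slots {
	atomic_flag lock;	/* stands in for the xarray spinlock */
	unsigned long marks;	/* bit n set => slot n is marked */
};

/* Hypothetical spinlock helpers built on atomic_flag. */
static void writer_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* busy-wait until the current holder releases it */
}

static void writer_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Write side: serialized against other writers by the lock.
 * slot must be smaller than the number of bits in unsigned long. */
static void mark_slot(struct marked_slots *s, unsigned int slot)
{
	writer_lock(&s->lock);
	WRITE_ONCE(s->marks, s->marks | (1UL << slot));
	writer_unlock(&s->lock);
}

/* Read side: no lock, so take one snapshot and never re-read the word. */
static bool slot_marked(struct marked_slots *s, unsigned int slot)
{
	unsigned long snap = READ_ONCE(s->marks);

	return (snap & (1UL << slot)) != 0;
}
```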

> If you really need READ_ONCE, I think it's better to implement a new
> flavor of the function(s) separately, like:
>         find_first_bit_read_once()

So yes, xarray really needs READ_ONCE(). And I don't think READ_ONCE()
imposes any real performance overhead in this particular case, because with
any sane compiler the generated assembly with and without READ_ONCE() will
be exactly the same. For example, I've checked the disassembly of
_find_next_bit() using READ_ONCE(). The main loop is:

   0xffffffff815a2b6d <+77>:	inc    %r8
   0xffffffff815a2b70 <+80>:	add    $0x8,%rdx
   0xffffffff815a2b74 <+84>:	mov    %r8,%rcx
   0xffffffff815a2b77 <+87>:	shl    $0x6,%rcx
   0xffffffff815a2b7b <+91>:	cmp    %rcx,%rax
   0xffffffff815a2b7e <+94>:	jbe    0xffffffff815a2b9b <_find_next_bit+123>
   0xffffffff815a2b80 <+96>:	mov    (%rdx),%rcx
   0xffffffff815a2b83 <+99>:	test   %rcx,%rcx
   0xffffffff815a2b86 <+102>:	je     0xffffffff815a2b6d <_find_next_bit+77>
   0xffffffff815a2b88 <+104>:	shl    $0x6,%r8
   0xffffffff815a2b8c <+108>:	tzcnt  %rcx,%rcx

So you can see that the value we work with is loaded from the address in
%rdx into a register (%rcx), and both the zero test and __ffs() (the tzcnt)
operate on that register value. Thus READ_ONCE() has no practical effect
here. It just prevents the compiler from doing some stupid de-optimization,
such as re-reading the word from memory after it has been tested.
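To make that de-optimization concrete, here is a hypothetical pair of helpers (the names are illustrative only). The first writes out by hand the double load that a compiler is permitted to generate for a plain access, for example under register pressure: if a concurrent writer cleared the word between the two loads, the bit scan would be handed 0. The second is the READ_ONCE() form, where the test and the scan share one snapshot. READ_ONCE() is again approximated with a volatile cast, and __builtin_ctzl() stands in for __ffs(); both are undefined for 0.

```c
/* Userspace approximation of the kernel's READ_ONCE() (assumption). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Shape a compiler MAY produce from a single plain load (illustrative):
 * the word is loaded twice, so the value scanned can differ from the
 * value that was tested non-zero. */
static long first_bit_refetch(const unsigned long *word)
{
	if (*word == 0)			/* load #1 */
		return -1;
	return __builtin_ctzl(*word);	/* load #2: may observe 0 if raced */
}

/* READ_ONCE() form: one load, and both the test and the scan use it. */
static long first_bit_snapshot(const unsigned long *word)
{
	unsigned long val = READ_ONCE(*word);

	if (val == 0)
		return -1;
	return __builtin_ctzl(val);
}
```

Single-threaded, both helpers return the same result; they only differ when another CPU modifies the word between the two loads of the first one.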

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR