Re: [PATCH v1 1/1] xarray: fix the data-race in xas_find_chunk() by using READ_ONCE()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri 06-10-23 16:39:54, Mirsad Todorovac wrote:
> On 9/19/2023 6:20 AM, Matthew Wilcox wrote:
> > On Mon, Sep 18, 2023 at 11:56:36AM -0700, Yury Norov wrote:
> > > Guys, I lost the track of the conversation. In the other email Mirsad
> > > said:
> > >          Which was the basic reason in the first place for all this, because something changed
> > >          data from underneath our fingers ..
> > > 
> > > It sounds clearly to me that this is a bug in xarray, *revealed* by
> > > find_next_bit() function. But later in discussion you're trying to 'fix'
> > > find_*_bit(), like if find_bit() corrupted the bitmap, but it's not.
> > 
> > No, you're really confused.  That happens.
> > 
> > KCSAN is looking for concurrency bugs.  That is, does another thread
> > mutate the data "while" we're reading it.  It does that by reading
> > the data, delaying for a few instructions and reading it again.  If it
> > changed, clearly there's a race.  That does not mean there's a bug!
> > 
> > Some races are innocuous.  Many races are innocuous!  The problem is
> > that compilers sometimes get overly clever and don't do the obvious
> > thing you ask them to do.  READ_ONCE() serves two functions here;
> > one is that it tells the compiler not to try anything fancy, and
> > the other is that it tells KCSAN to not bother instrumenting this
> > load; no load-delay-reload.
> > 
> > > In previous email Jan said:
> > >          for any sane compiler the generated assembly with & without READ_ONCE()
> > >          will be exactly the same.
> > > 
> > > If the code generated with and without READ_ONCE() is the same, the
> > > behavior would be the same, right? If you see the difference, the code
> > > should differ.
> > 
> > Hopefully now you understand why this argument is wrong ...
> > 
> > > You say that READ_ONCE() in find_bit() 'fixes' 200 KCSAN BUG warnings. To
> > > me it sounds like hiding the problems instead of fixing. If there's a race
> > > between writing and reading bitmaps, it should be fixed properly by
> > > adding an appropriate serialization mechanism. Shutting Kcsan up with
> > > READ_ONCE() here and there is exactly the opposite path to the right direction.
> > 
> > Counterpoint: generally bitmaps are modified with set_bit() which
> > actually is atomic.  We define so many bitmap things as being atomic
> > already, it doesn't feel like making find_bit() "must be protected"
> > as a useful use of time.
> > 
> > But hey, maybe I'm wrong.  Mirsad, can you send Yury the bug reports
> > for find_bit and friends, and Yury can take the time to dig through them
> > and see if there are any real races in that mess?
> > 
> > > Every READ_ONCE must be paired with WRITE_ONCE, just like atomic
> > > reads/writes or spin locks/unlocks. Having that in mind, adding
> > > READ_ONCE() in find_bit() requires adding it to every bitmap function
> > > out there. And this is, as I said before, would be an overhead for
> > > most users.
> > 
> > I don't believe you.  Telling the compiler to stop trying to be clever
> > rarely results in a performance loss.
> 
> Hi Mr. Wilcox,
> 
> Do you think we should submit a formal patch for this data-race?

So I did some benchmarking with various GCC versions and the truth is that
READ_ONCE() does affect code generation a bit (although the original code
does not refetch the value from memory). As a result my benchmarks show the
bit searching functions are about 2% slower. This is not much but it is
stupid to cause a performance regression due to non-issue. I'm trying to
get some compiler guys look into this whether we can improve it somehow...

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux