Re: [PATCH v1 1/1] xarray: fix the data-race in xas_find_chunk() by using READ_ONCE()

Mirsad Todorovac <mirsad.todorovac@xxxxxxxxxxxx> · Thu, 12 Oct 2023 00:09:27 +0200

On 10/9/23 12:15, Jan Kara wrote:
On Fri 06-10-23 16:39:54, Mirsad Todorovac wrote:
On 9/19/2023 6:20 AM, Matthew Wilcox wrote:
On Mon, Sep 18, 2023 at 11:56:36AM -0700, Yury Norov wrote:
Guys, I lost the track of the conversation. In the other email Mirsad
said:
          Which was the basic reason in the first place for all this, because something changed
          data from underneath our fingers ..

It sounds clearly to me that this is a bug in xarray, *revealed* by
find_next_bit() function. But later in discussion you're trying to 'fix'
find_*_bit(), like if find_bit() corrupted the bitmap, but it's not.

No, you're really confused.  That happens.

KCSAN is looking for concurrency bugs.  That is, does another thread
mutate the data "while" we're reading it.  It does that by reading
the data, delaying for a few instructions and reading it again.  If it
changed, clearly there's a race.  That does not mean there's a bug!

Some races are innocuous.  Many races are innocuous!  The problem is
that compilers sometimes get overly clever and don't do the obvious
thing you ask them to do.  READ_ONCE() serves two functions here;
one is that it tells the compiler not to try anything fancy, and
the other is that it tells KCSAN to not bother instrumenting this
load; no load-delay-reload.

In previous email Jan said:
          for any sane compiler the generated assembly with & without READ_ONCE()
          will be exactly the same.

If the code generated with and without READ_ONCE() is the same, the
behavior would be the same, right? If you see the difference, the code
should differ.

Hopefully now you understand why this argument is wrong ...

You say that READ_ONCE() in find_bit() 'fixes' 200 KCSAN BUG warnings. To
me it sounds like hiding the problems instead of fixing. If there's a race
between writing and reading bitmaps, it should be fixed properly by
adding an appropriate serialization mechanism. Shutting Kcsan up with
READ_ONCE() here and there is exactly the opposite path to the right direction.

Counterpoint: generally bitmaps are modified with set_bit() which
actually is atomic.  We define so many bitmap things as being atomic
already, it doesn't feel like making find_bit() "must be protected"
as a useful use of time.

But hey, maybe I'm wrong.  Mirsad, can you send Yury the bug reports
for find_bit and friends, and Yury can take the time to dig through them
and see if there are any real races in that mess?

Every READ_ONCE must be paired with WRITE_ONCE, just like atomic
reads/writes or spin locks/unlocks. Having that in mind, adding
READ_ONCE() in find_bit() requires adding it to every bitmap function
out there. And this is, as I said before, would be an overhead for
most users.

I don't believe you.  Telling the compiler to stop trying to be clever
rarely results in a performance loss.

Hi Mr. Wilcox,

Do you think we should submit a formal patch for this data-race?

So I did some benchmarking with various GCC versions and the truth is that
READ_ONCE() does affect code generation a bit (although the original code
does not refetch the value from memory). As a result my benchmarks show the
bit searching functions are about 2% slower. This is not much but it is
stupid to cause a performance regression due to non-issue. I'm trying to
get some compiler guys look into this whether we can improve it somehow...

								Honza

Dear Jan,

First, I am not an expert or an authority on the subject, this is only
my opinion.

IMHO, 2% slower code is acceptable if it gives us data integrity. If a
16-core system manages to break and tear loads without READ_ONCE(), 2%
faster code gives us nothing if the other core changes half of the location
in the midst of the load, just because the optimiser did some "funny stuff".

If I had a pacemaker and it is running Linux kernel, I would probably choose
2% slower but race-free code.

Please allow me to assert that this is not a spin lock, memory bus lock,
or a memory barrier that would affect the other cores - it will only slightly
prevent some read reordering/tearing.

I think you are on the good track, and that this patch is a good thing.

Low-level functions have to be first safe, then fast.

A faster algorithm, like replacing spinlocks with RCU, can certainly more
than make up for that ...

Sorry for a digression.

Best regards,
Mirsad