Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle

Marco Elver <elver@xxxxxxxxxx> · Thu, 11 Mar 2021 16:54:43 +0100

+Cc Paul

On Thu, 11 Mar 2021 at 16:30, Theodore Ts'o <tytso@xxxxxxx> wrote:
>
> On Thu, Mar 11, 2021 at 04:08:30PM +0100, Marco Elver wrote:
> > If the outcome of the check does not affect correctness and the code is
> > entirely fault tolerant to the precise value being read, then a
> > data_race(!journal->j_running_transaction) marking here would be fine.
>
> So a very common coding pattern is to check a value w/o the lock, and
> if it looks like we might need to check *with* a lock, we'll grab the
> lock and recheck.  Does KCSAN understand that this sort of thing is
> safe automatically?

It doesn't -- these are concurrency patterns that go way beyond the
scope of a data race detector. The main problem, as always with such
patterns, is that the same pattern can be fine in one place and broken
in the next, and a detector cannot tell the two apart.

> In this particular case, it's a bit more complicated than that; we're
> checking a value, and then allocating memory, grabbing the spin lock,
> and then re-checking the value, so we don't have to drop the spinlock,
> allocate the memory, grab the lock again, and then recheck the
> value.  So even if KCSAN catches the simpler case as described above,
> we still might need to mark the data_race explicitly.

This seems like a variation of double-checked locking.
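
For illustration only, here is a rough userspace sketch of that shape:
peek without the lock, allocate if the peek suggests it's needed, then
take the lock and re-check. This is not the jbd2 code; struct journal,
struct txn and get_running() are made-up names, and a pthread mutex
plus C11 relaxed atomics stand in for the spinlock and the marked
accesses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the jbd2 data structures. */
struct txn { int id; };

struct journal {
    pthread_mutex_t lock;
    _Atomic(struct txn *) running;  /* only written with lock held */
};

/* Return the running transaction, creating one if there is none. */
static struct txn *get_running(struct journal *j)
{
    struct txn *spare = NULL;
    struct txn *cur;

    assert(j != NULL);
    for (;;) {
        /*
         * Unlocked peek. The value is only a hint for whether to
         * allocate; nothing below trusts it for correctness.
         */
        if (spare == NULL &&
            atomic_load_explicit(&j->running, memory_order_relaxed) == NULL) {
            spare = calloc(1, sizeof(*spare)); /* done without the lock */
            if (spare == NULL)
                return NULL;
        }

        pthread_mutex_lock(&j->lock);
        /* Authoritative re-check, now under the lock. */
        cur = atomic_load_explicit(&j->running, memory_order_relaxed);
        if (cur == NULL && spare != NULL) {
            atomic_store_explicit(&j->running, spare, memory_order_relaxed);
            cur = spare;
            spare = NULL;
        }
        pthread_mutex_unlock(&j->lock);

        if (cur != NULL)
            break;
        /* The hint was stale (non-NULL peek, NULL under lock): retry. */
    }

    free(spare);  /* someone else installed a transaction first */
    return cur;
}
```

The unlocked load can return a stale value in either direction, so the
code has to tolerate both an unneeded allocation and a missing one.
Whether that tolerance really holds is exactly what a tool cannot
infer by itself.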

> But the more we could have the compiler automatically figure out
> things without needing an explicit tag, it would seem to me that this
> would be better, since manual tagging is going to be more error-prone.

What you're alluding to here would go much further than a data race
detector ("data race" is still just defined by the memory model). The
wish that there was a static analysis tool that would automatically
understand the "concurrency semantics as intended by the developer" is
something that'd be nice to have, but just doesn't seem realistic.
Because how can a tool tell what the developer intended, without input
from that developer?

If there's a worry that marking accesses is error-prone, then that
might be a signal that the concurrency design is too complex (or that
the developer hasn't considered all cases).

For that reason, we need to mark accesses to tell the compiler and
tooling where to expect concurrency, so that 1) the compiler generates
correct code, and 2) tooling such as KCSAN can double-check that what the
developer intended is what's actually happening.
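
As a minimal sketch of what such marking can look like, the snippet
below uses simplified userspace stand-ins for the kernel's READ_ONCE(),
WRITE_ONCE() and data_race(). The macro bodies are approximations
(they assume GCC/Clang __typeof__), and struct stats and its helpers
are hypothetical. In the kernel, data_race() additionally tells KCSAN
not to report the access; here it only documents intent.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros of the same names. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))
/* Userspace approximation: just evaluates the expression. */
#define data_race(expr)  (expr)

/* Hypothetical example structure. */
struct stats {
    pthread_mutex_t lock;
    unsigned long nr_events;  /* written under lock, read locklessly */
};

static void stats_inc(struct stats *s)
{
    pthread_mutex_lock(&s->lock);
    /*
     * The read can stay plain because the lock excludes other
     * writers. The write is marked because lockless readers exist.
     */
    WRITE_ONCE(s->nr_events, s->nr_events + 1);
    pthread_mutex_unlock(&s->lock);
}

/* Lockless reader that uses the value: marked, concurrency expected. */
static unsigned long stats_read(struct stats *s)
{
    return READ_ONCE(s->nr_events);
}

/* Heuristic only: a stale answer is acceptable, so say so explicitly. */
static int stats_probably_idle(struct stats *s)
{
    return data_race(s->nr_events == 0);
}
```

The three access sites each state a different intent: exclusive under
the lock, intentionally concurrent, and racy but tolerated. That is the
input a tool needs in order to check the code against what was meant.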

Thanks,
-- Marco