Re: [syzbot] [mm?] KCSAN: data-race in mprotect_fixup / try_to_migrate_one

Marco Elver <elver@xxxxxxxxxx> · Wed, 5 Feb 2025 17:25:16 +0100

On Wed, 5 Feb 2025 at 16:51, Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> wrote:
[...]
> > [...]
> > > I hate that we have these landmines waiting for us. Be good to find a way
> > > to explicitly annotate this, or at least comment somehow.
> > >
> > > But agreed, probably adding a READ_ONCE()/WRITE_ONCE() is appropriate at
> > > least for the proximate thing.
> > >
> > > It's a wonder these things don't trigger more, except you need probably
> > > very precise timing to do it...
> >
> > They do trigger, but we don't send all of them to LKML.
> > When we first introduced KCSAN, the notion of "data race" was still
> > poorly understood. At the time we decided to pre-review a number of
> > them (but our time to do so has been going down :-/), or let willing
> > maintainers deal with them directly. A number of articles followed,
>
> We very much appreciate your efforts :)
>
> We are definitely willing to see these in mm, and as you can see from the
> discussion here, the interaction between the rmap locks and other locks is
> complicated (see also the docs I wrote on them at [0]).

Tangentially, I've been trying to work out how to bring this [1] Clang
feature to the kernel: it's more or less a simple "capability system"
[2] to express "acquire this before doing that / don't hold this thing
here / etc.". Locking rules are an obvious application. It's been on a
number of people's radar over the years, but nothing materialized.
Sparse's locking analysis is much weaker, and it is not easy (i.e.
quick) to use.

[1] https://clang.llvm.org/docs/ThreadSafetyAnalysis.html
[2] https://www.cs.cornell.edu/talc/papers/capabilities.pdf
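
For anyone who hasn't looked at [1] yet, here is a minimal userspace
sketch of what the annotations look like in plain C. It is only meant
to illustrate the idea: the TSA() macro, toy_lock, and the counter_*
helpers are made-up names for this example, not the macros or APIs
used in the kernel branch. The attributes themselves (capability,
guarded_by, acquire_capability, release_capability,
requires_capability, no_thread_safety_analysis) are the ones
documented in [1].

```c
#include <stdatomic.h>

/* Hypothetical wrapper: expands to a Clang attribute, or to nothing. */
#if defined(__clang__)
#define TSA(x) __attribute__((x))
#else
#define TSA(x)
#endif

/* A toy spinlock type that the analysis treats as a capability. */
struct TSA(capability("lock")) toy_lock {
	atomic_flag held;
};

/*
 * The prototypes carry the contracts. The bodies of the lock primitives
 * are opted out, because they implement the capability themselves.
 */
static void toy_lock_acquire(struct toy_lock *l)
	TSA(acquire_capability(l)) TSA(no_thread_safety_analysis);
static void toy_lock_release(struct toy_lock *l)
	TSA(release_capability(l)) TSA(no_thread_safety_analysis);

static struct toy_lock counter_lock = { ATOMIC_FLAG_INIT };

/* "Acquire this before doing that": counter needs counter_lock. */
static int counter TSA(guarded_by(&counter_lock));

/* Callers must already hold counter_lock. */
static void counter_inc_locked(void)
	TSA(requires_capability(&counter_lock));

static void toy_lock_acquire(struct toy_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held,
						 memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void toy_lock_release(struct toy_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static void counter_inc_locked(void)
{
	counter++;
}

/* Correct usage: take the lock, touch the data, drop the lock. */
int counter_inc(void)
{
	int v;

	toy_lock_acquire(&counter_lock);
	counter_inc_locked();
	v = counter;
	toy_lock_release(&counter_lock);
	return v;
}

/*
 * Reading or writing counter, or calling counter_inc_locked(), without
 * holding counter_lock is the kind of access the analysis is designed
 * to flag when built with clang -Wthread-safety.
 */
```

Other compilers simply see the annotations expand to nothing, so the
code still builds; the checks only happen with Clang and
-Wthread-safety enabled.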

The current work-in-progress is here:
https://git.kernel.org/pub/scm/linux/kernel/git/melver/linux.git/log/?h=cap-analysis
It lacks documentation and proper commit messages, but is otherwise
usable (see example enablements for kfence, kcov, and stackdepot and
lib/test_capability-analysis.c).
An official RFC will follow; the hard part, writing the
documentation, is in the works. ;-)

There are also other open questions, such as: can a subset of the
analysis be applied tree-wide (vs. the current selective enablement)?
That would help find more bugs faster.
However, the reality of it is that using this system would be opting
into a "dialect of C with capability analysis" with its own set of
restrictions, and I don't know if everyone is willing to pay this
cost.

What I'd be curious about is whether some of the complex rules you
mention above can be expressed so that Clang's "capability analysis"
can point out some bugs. I suspect not everything can be expressed,
but even if we get 50% of the way there, we could catch a large number
of bugs statically at compile time.

I'm letting this cat out of the bag because this thread seems like a
good way to get super-early high-level feedback. :-)
It'll be a while before the first RFC.

Thanks,
-- Marco