Re: [syzbot] [mm?] KCSAN: data-race in mprotect_fixup / try_to_migrate_one

"Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx> · Wed, 5 Feb 2025 13:56:39 -0500

* Marco Elver <elver@xxxxxxxxxx> [250205 11:29]:
> On Wed, 5 Feb 2025 at 16:51, Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> wrote:
> [...]
> > > [...]
> > > > I hate that we have these landmines waiting for us. Be good to find a way
> > > > to explicitly annotate this, or at least comment somehow.
> > > >
> > > > But agreed, probably adding a READ_ONCE()/WRITE_ONCE() is appropriate at
> > > > least for the proximate thing.
> > > >
> > > > It's a wonder these things don't trigger more, except you need probably
> > > > very precise timing to do it...
> > >
> > > They do trigger, but we don't send all of them to LKML.
> > > When we first introduced KCSAN, the notion of "data race" was still
> > > poorly understood. At the time we decided to pre-review a number of
> > > them (but our time to do so has been going down :-/), or let willing
> > > maintainers deal with them directly. A number of articles followed,
> >
> > We very much appreciate your efforts :)
> >
> > We are definitely willing to see these in mm, and as you can see from the
> > discussion here, the interaction between the rmap locks and other locks is
> > complicated (see also the docs I wrote on them at [0]).
> 
> Tangentially, I've been trying to work out how to bring this [1] Clang
> feature to the kernel: it's more or less a simple "capability system"
> [2] to express "acquire this before doing that / don't hold this thing
> here / etc.". Locking rules are an obvious application. It's been on a
> number of people's radar over the years, but nothing materialized.
> Sparse's locking analysis is much weaker, and it is not easy (i.e.
> quick) to use.
> 
> [1] https://clang.llvm.org/docs/ThreadSafetyAnalysis.html
> [2] https://www.cs.cornell.edu/talc/papers/capabilities.pdf
> 
> The current work-in-progress is here:
> https://git.kernel.org/pub/scm/linux/kernel/git/melver/linux.git/log/?h=cap-analysis
> It lacks documentation, and proper commit messages, but is otherwise
> usable (see example enablements for kfence, kcov, and stackdepot and
> lib/test_capability-analysis.c).
> An official RFC will follow, but the hard part of writing
> documentation is in the works. ;-)
> 
> There are also other questions, such as:  can a subset of the analysis
> be applied tree-wide (vs. current selective enablement), as it would
> help find more bugs faster.

The maple tree code alone will give you so many false positives, due to
its RCU usage, that the analysis will not be very useful there.

My main issue in the maple tree is that I have a pointer that guards the
other data as valid, so I read data and then check this guard pointer
(the parent pointer in the node) to ensure what I've seen is indeed
valid.
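
To make that pattern concrete, here is a minimal user-space sketch using
C11 atomics.  The demo_* names are made up for illustration and this is
not the maple tree code; marking a node dead by pointing its parent at
itself, and the exact fences used, are assumptions of the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node, loosely modelled on the description above. */
struct demo_node {
	_Atomic(struct demo_node *) parent;	/* guard: points at self when dead */
	_Atomic int slot;			/* data that is read optimistically */
};

static void demo_node_init(struct demo_node *n, struct demo_node *parent, int v)
{
	assert(parent != n);	/* a live node never points at itself */
	atomic_store_explicit(&n->slot, v, memory_order_relaxed);
	atomic_store_explicit(&n->parent, parent, memory_order_release);
}

/* Writer: mark the node dead before its contents are changed or reused. */
static void demo_node_kill(struct demo_node *n)
{
	atomic_store_explicit(&n->parent, n, memory_order_relaxed);
	/* Order the dead marking before any later stores to the node. */
	atomic_thread_fence(memory_order_release);
}

static bool demo_node_dead(struct demo_node *n)
{
	return atomic_load_explicit(&n->parent, memory_order_relaxed) == n;
}

/*
 * Reader: load the data first, validate afterwards.  Returns false when
 * the node was found dead, in which case the caller must retry its walk
 * and must not trust the value it saw.
 */
static bool demo_read_slot(struct demo_node *n, int *out)
{
	int v = atomic_load_explicit(&n->slot, memory_order_relaxed);

	/* Order the data load before the guard load that follows. */
	atomic_thread_fence(memory_order_acquire);
	if (demo_node_dead(n))
		return false;
	*out = v;
	return true;
}
```

The load of slot happens with no lock held, and only the later check of
the guard says whether the value can be used.  A checker that only
understands "hold X to touch Y" would most likely flag that load.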

> However, the reality of it is that using this system would be opting
> into a "dialect of C with capability analysis" with its own set of
> restrictions, and I don't know if everyone is willing to pay this
> cost.
> 
> What I'd be curious about is, if some of the complex rules you mention
> above can be expressed so that Clang's "capability analysis" can point
> out some bugs. I suspect not everything can be expressed, but even if
> we get 50% there, we could catch a huge amount of bugs statically at
> compile-time.

The locking of the vma is difficult because we have the per-vma lock,
the mmap read/write lock, the rmap lock, and the RCU read lock.  The
vmas are also transitioning to be RCU type-safe.

A vma can be reached through the mm's vma tree or through the rmap.
Other tasks can also look up vmas through the rmap and gup, and maybe
through other paths.

The path taken to find the vma dictates which locks make access to it
safe (at a lower cost than taking ALL of the locks, or always THE SAME
lock).  And, as on this path, many of the races are benign by design.
I'm not sure how a compiler could verify this without annotation, as
Jann is suggesting here.
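
As a rough illustration of the mismatch, here is a small sketch using
the attribute names from the Clang Thread Safety Analysis docs at [1].
Everything named demo_* is hypothetical, the "lock" is a flag and not a
real lock, and the macros in the cap-analysis branch may well look
different.  A field is declared as guarded by one lock, and the second
lookup path, which holds a different lock, has to opt out of checking.

```c
#include <assert.h>
#include <stdbool.h>

/* The attributes are Clang-specific; other compilers see nothing. */
#if defined(__clang__)
#define DEMO_TSA(x) __attribute__((x))
#else
#define DEMO_TSA(x)
#endif

/* Stand-in lock type: just a flag, enough to carry the annotations. */
struct DEMO_TSA(capability("lock")) demo_lock {
	bool held;
};

static struct demo_lock demo_mmap_lock;
static struct demo_lock demo_rmap_lock;

/* In this sketch the field names one lock as its guard. */
static unsigned long demo_vm_flags DEMO_TSA(guarded_by(demo_mmap_lock));

static void demo_lock_acquire(struct demo_lock *l)
	DEMO_TSA(acquire_capability(l)) DEMO_TSA(no_thread_safety_analysis);
static void demo_lock_release(struct demo_lock *l)
	DEMO_TSA(release_capability(l)) DEMO_TSA(no_thread_safety_analysis);
/* Path 1: the caller must hold the lock named by guarded_by(). */
static void demo_set_flags(unsigned long f)
	DEMO_TSA(requires_capability(demo_mmap_lock));
/* Path 2: reached under a different lock, so checking is switched off. */
static unsigned long demo_peek_flags_via_rmap(void)
	DEMO_TSA(no_thread_safety_analysis);

static void demo_lock_acquire(struct demo_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void demo_lock_release(struct demo_lock *l)
{
	assert(l->held);
	l->held = false;
}

static void demo_set_flags(unsigned long f)
{
	demo_vm_flags = f;
}

static unsigned long demo_peek_flags_via_rmap(void)
{
	return demo_vm_flags;
}

/* Exercise both paths, each under "its" lock. */
static unsigned long demo_run(void)
{
	unsigned long seen;

	demo_lock_acquire(&demo_mmap_lock);
	demo_set_flags(0x3);
	demo_lock_release(&demo_mmap_lock);

	demo_lock_acquire(&demo_rmap_lock);
	seen = demo_peek_flags_via_rmap();
	demo_lock_release(&demo_rmap_lock);
	return seen;
}
```

With clang and -Wthread-safety, the intent is that path 1 is checked
and path 2 is simply not analysed, so whatever makes the second path
safe stays invisible to the tool.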

> 
> I let this cat out the bag, because this thread seems like a good way
> to get super-early high-level feedback. :-)
> It'll be a while before the first RFC.

My initial thought is that you don't want to start with vmas, but if you
do start with vmas and can make it work, then most other places will be
easier.

Thanks,
Liam