Re: [PATCH v5 4/8] uprobes: travers uprobe's consumer list locklessly under SRCU protection

"Paul E. McKenney" <paulmck@xxxxxxxxxx> · Thu, 7 Nov 2024 08:13:43 -0800

On Thu, Nov 07, 2024 at 08:01:05AM -0800, Andrii Nakryiko wrote:
> On Thu, Nov 7, 2024 at 3:35 AM Breno Leitao <leitao@xxxxxxxxxx> wrote:
> >
> > Hello Andrii,
> >
> > On Wed, Nov 06, 2024 at 08:25:25AM -0800, Andrii Nakryiko wrote:
> > > On Wed, Nov 6, 2024 at 4:03 AM Breno Leitao <leitao@xxxxxxxxxx> wrote:
> > > > On Tue, Sep 03, 2024 at 10:45:59AM -0700, Andrii Nakryiko wrote:
> > > > > uprobe->register_rwsem is one of a few big bottlenecks to scalability of
> > > > > uprobes, so we need to get rid of it to improve uprobe performance and
> > > > > multi-CPU scalability.
> > > > >
> > > > > First, we turn uprobe's consumer list to a typical doubly-linked list
> > > > > and utilize existing RCU-aware helpers for traversing such lists, as
> > > > > well as adding and removing elements from it.
> > > > >
> > > > > For entry uprobes we already have SRCU protection active since before
> > > > > uprobe lookup. For uretprobe we keep refcount, guaranteeing that uprobe
> > > > > won't go away from under us, but we add SRCU protection around consumer
> > > > > list traversal.
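The lockless traversal described above can be sketched in userspace with C11 atomics. The sketch assumes that a release store stands in for rcu_assign_pointer() (as used inside list_add_rcu()) and that an acquire load stands in for rcu_dereference(). The list is singly linked for brevity, every name is hypothetical, and grace periods are not modeled, so nothing is removed or freed here. In the patch, it is the SRCU read-side protection that lets readers run while writers add and remove consumers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical consumer node (singly linked for brevity); this is not
 * the kernel's struct uprobe_consumer. */
struct consumer {
	int id;
	bool (*filter)(int id);
	struct consumer *_Atomic next;
};

/* List head; readers walk the list from here without taking a lock. */
static struct consumer *_Atomic consumers;

/* Writer side, assumed to be serialized by a writer-only lock that is not
 * modeled here. The node is fully initialized first and then published
 * with a release store, the role rcu_assign_pointer() plays inside
 * list_add_rcu(). */
static void consumer_add(struct consumer *c)
{
	assert(c->filter != NULL);
	atomic_store_explicit(&c->next,
			      atomic_load_explicit(&consumers, memory_order_relaxed),
			      memory_order_relaxed);
	atomic_store_explicit(&consumers, c, memory_order_release);
}

/* Reader side: the acquire loads pair with the release store above, so a
 * reader that observes a node also observes its initialized fields (a
 * stand-in for rcu_dereference()). */
static int count_interested(void)
{
	int n = 0;

	for (struct consumer *c = atomic_load_explicit(&consumers, memory_order_acquire);
	     c != NULL;
	     c = atomic_load_explicit(&c->next, memory_order_acquire))
		if (c->filter(c->id))
			n++;
	return n;
}

static bool want_even(int id) { return id % 2 == 0; }

/* Usage: publish two consumers, then count those whose filter accepts. */
static int demo_publish(void)
{
	static struct consumer a = { .id = 1, .filter = want_even };
	static struct consumer b = { .id = 2, .filter = want_even };

	consumer_add(&a);
	consumer_add(&b);
	return count_interested();
}
```

Calling demo_publish() once returns 1, since only the consumer with id 2 passes want_even().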
> > > >
> > > > I am seeing the following message in a kernel with RCU_PROVE_LOCKING:
> > > >
> > > >         kernel/events/uprobes.c:937 RCU-list traversed without holding the required lock!!
> > > >
> > > > It seems SRCU is not held when coming from mmap_region ->
> > > > uprobe_mmap. Here is the message I got in my debug kernel (sorry for
> > > > not decoding it, but the stack trace is clear enough):
> > > >
> > > >          WARNING: suspicious RCU usage
> > > >            6.12.0-rc5-kbuilder-01152-gc688a96c432e #26 Tainted: G        W   E    N
> > > >            -----------------------------
> > > >            kernel/events/uprobes.c:938 RCU-list traversed without holding the required lock!!
> > > >
> > > > other info that might help us debug this:
> > > >
> > > > rcu_scheduler_active = 2, debug_locks = 1
> > > >            3 locks held by env/441330:
> > > >             #0: ffff00021c1bc508 (&mm->mmap_lock){++++}-{3:3}, at: vm_mmap_pgoff+0x84/0x1d0
> > > >             #1: ffff800089f3ab48 (&uprobes_mmap_mutex[i]){+.+.}-{3:3}, at: uprobe_mmap+0x20c/0x548
> > > >             #2: ffff0004e564c528 (&uprobe->consumer_rwsem){++++}-{3:3}, at: filter_chain+0x30/0xe8
> > > >
> > > > stack backtrace:
> > > >            CPU: 4 UID: 34133 PID: 441330 Comm: env Kdump: loaded Tainted: G        W   E    N 6.12.0-rc5-kbuilder-01152-gc688a96c432e #26
> > > >            Tainted: [W]=WARN, [E]=UNSIGNED_MODULE, [N]=TEST
> > > >            Hardware name: Quanta S7GM 20S7GCU0010/S7G MB (CG1), BIOS 3D22 07/03/2024
> > > >            Call trace:
> > > >             dump_backtrace+0x10c/0x198
> > > >             show_stack+0x24/0x38
> > > >             __dump_stack+0x28/0x38
> > > >             dump_stack_lvl+0x74/0xa8
> > > >             dump_stack+0x18/0x28
> > > >             lockdep_rcu_suspicious+0x178/0x2c8
> > > >             filter_chain+0xdc/0xe8
> > > >             uprobe_mmap+0x2e0/0x548
> > > >             mmap_region+0x510/0x988
> > > >             do_mmap+0x444/0x528
> > > >             vm_mmap_pgoff+0xf8/0x1d0
> > > >             ksys_mmap_pgoff+0x184/0x2d8
> > > >
> > > >
> > > > That said, it seems we want to hold SRCU before reaching
> > > > filter_chain(). I hacked on it a bit, and taking the lock in uprobe_mmap()
> > > > solves the problem, but I might be missing something, since I am not familiar
> > > > with this code.
> > > >
> > > > How does the following patch look?
> > > >
> > > > commit 1bd7bcf03031ceca86fdddd8be2e5500497db29f
> > > > Author: Breno Leitao <leitao@xxxxxxxxxx>
> > > > Date:   Mon Nov 4 06:53:31 2024 -0800
> > > >
> > > >     uprobes: Get SRCU lock before traversing the list
> > > >
> > > >     list_for_each_entry_srcu() is being called without holding the lock,
> > > >     which causes LOCKDEP (when enabled with RCU_PROVING) to complain as
> > > >     follows:
> > > >
> > > >             kernel/events/uprobes.c:937 RCU-list traversed without holding the required lock!!
> > > >
> > > >     Take the uprobes_srcu SRCU read lock before calling filter_chain(),
> > > >     which needs the SRCU lock held because it calls
> > > >     list_for_each_entry_srcu().
> > > >
> > > >     Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx>
> > > >     Fixes: cc01bd044e6a ("uprobes: travers uprobe's consumer list locklessly under SRCU protection")
> > > >
> > > > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> > > > index 4b52cb2ae6d62..cc9d4ddeea9a6 100644
> > > > --- a/kernel/events/uprobes.c
> > > > +++ b/kernel/events/uprobes.c
> > > > @@ -1391,6 +1391,7 @@ int uprobe_mmap(struct vm_area_struct *vma)
> > > >         struct list_head tmp_list;
> > > >         struct uprobe *uprobe, *u;
> > > >         struct inode *inode;
> > > > +       int srcu_idx;
> > > >
> > > >         if (no_uprobe_events())
> > > >                 return 0;
> > > > @@ -1409,6 +1410,7 @@ int uprobe_mmap(struct vm_area_struct *vma)
> > > >
> > > >         mutex_lock(uprobes_mmap_hash(inode));
> > > >         build_probe_list(inode, vma, vma->vm_start, vma->vm_end, &tmp_list);
> > > > +       srcu_idx = srcu_read_lock(&uprobes_srcu);
> > >
> > > Thanks for catching that (production testing FTW, right?!).
> >
> > Correct. I am running some hosts with RCU_PROVING, and I am finding some
> > cases where RCU-protected areas are touched without holding the RCU read
> > lock.
> >
> > > But I think you are a) adding the wrong RCU protection flavor (it has to be
> > > rcu_read_lock_trace()/rcu_read_unlock_trace(), see uprobe_apply() for
> > > an example) and b) adding it in the wrong place. We
> > > should add it inside filter_chain(). filter_chain() is called from
> > > three places, only one of which is already RCU protected (that's the
> > > handler_chain() case). But there is also register_for_each_vma(),
> > > which needs RCU protection as well.
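A userspace sketch of the placement suggested above, under stated assumptions: model_read_lock(), model_read_unlock() and model_read_lock_held() are hypothetical stand-ins for rcu_read_lock_trace(), rcu_read_unlock_trace() and rcu_read_lock_trace_held(), implemented as nothing more than a per-thread nesting counter, and struct model_consumer is a simplified, hypothetical consumer list. Because the read-side section is entered inside the filter function itself, a caller without protection and a caller that already holds the lock both pass the held-check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for rcu_read_lock_trace(), rcu_read_unlock_trace()
 * and rcu_read_lock_trace_held(): just a per-thread nesting counter.
 * Grace periods are not modeled. */
static _Thread_local int model_nesting;

static void model_read_lock(void)
{
	model_nesting++;
}

static void model_read_unlock(void)
{
	assert(model_nesting > 0);	/* catches an unbalanced unlock */
	model_nesting--;
}

static bool model_read_lock_held(void)
{
	return model_nesting > 0;
}

/* Hypothetical consumer; filter() decides whether it is interested. */
struct model_consumer {
	bool (*filter)(void);
	struct model_consumer *next;
};

/* Loosely modeled on filter_chain(): the read-side section is entered
 * here, so every caller is covered. The assert stands in for the lockdep
 * condition that fired in the report above. */
static bool model_filter_chain(const struct model_consumer *head)
{
	bool ret = false;

	model_read_lock();
	for (const struct model_consumer *c = head; c != NULL; c = c->next) {
		assert(model_read_lock_held());
		ret = c->filter();
		if (ret)
			break;
	}
	model_read_unlock();
	return ret;
}

static bool wants_all(void) { return true; }
static const struct model_consumer only_consumer = { .filter = wants_all, .next = NULL };

/* Caller with no protection of its own, like the uprobe_mmap() and
 * register_for_each_vma() paths. */
static bool model_unprotected_caller(void)
{
	return model_filter_chain(&only_consumer);
}

/* Caller that already holds the lock, like handler_chain(); the section
 * inside model_filter_chain() simply nests. */
static bool model_protected_caller(void)
{
	bool ret;

	model_read_lock();
	ret = model_filter_chain(&only_consumer);
	model_read_unlock();
	return ret;
}
```

Taking the lock in the callee rather than in each caller keeps the requirement in one place; the price is a redundant, nested entry on the handler_chain() path.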
> >
> > Thanks for the guidance!
> >
> > My initial plan was to protect filter_chain(), but handler_chain()
> > already holds the lock. Is it OK to enter a critical section in a
> > nested form?
> >
> > The code will be something like:
> >
> > handle_swbp() {
> >         rcu_read_lock_trace();
> >         handler_chain() {
> >                 filter_chain() {
> >                         rcu_read_lock_trace();
> >                         list_for_each_entry_rcu()
> >                         rcu_read_unlock_trace();
> >                 }
> >         }
> >         rcu_read_unlock_trace();
> > }
> >
> > Is this nested locking fine?
> 
> Yes, it's totally fine to nest RCU lock regions.

As long as you don't nest them more than 255 deep in CONFIG_PREEMPT=n
kernels that also have CONFIG_PREEMPT_COUNT=y, or more than 2G deep in
CONFIG_PREEMPT=y kernels.  For a limited time only, in CONFIG_PREEMPT=n
kernels that also have CONFIG_PREEMPT_COUNT=n, you can nest as deeply
as you want.  ;-)

Sorry, couldn't resist...

							Thanx, Paul
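The 255 figure above falls out of the preempt_count layout. In a CONFIG_PREEMPT=n kernel, rcu_read_lock() boils down to preempt_disable(), and with CONFIG_PREEMPT_COUNT=y that increments the 8-bit preemption field at the bottom of preempt_count. The arithmetic sketch below assumes the field widths from include/linux/preempt.h (PREEMPT_BITS = 8, with the softirq field starting at bit 8); all MODEL_* and model_* names are hypothetical. It shows that a 256th nesting level carries into the softirq field and leaves the preemption depth reading zero. The 2G figure for CONFIG_PREEMPT=y corresponds to a plain int nesting counter, and with CONFIG_PREEMPT_COUNT=n there is no counter to overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, following include/linux/preempt.h: bits 0-7 of
 * preempt_count hold the preemption-disable depth and bits 8-15 hold the
 * softirq count. The MODEL_* names are hypothetical. */
#define MODEL_PREEMPT_BITS	8
#define MODEL_PREEMPT_MASK	((UINT32_C(1) << MODEL_PREEMPT_BITS) - 1)
#define MODEL_SOFTIRQ_SHIFT	MODEL_PREEMPT_BITS
#define MODEL_SOFTIRQ_MASK	(UINT32_C(0xff) << MODEL_SOFTIRQ_SHIFT)

/* Hypothetical model of a CONFIG_PREEMPT=n, CONFIG_PREEMPT_COUNT=y kernel:
 * each nested rcu_read_lock() is a preempt_disable(), i.e. +1. */
static uint32_t model_nested_read_locks(unsigned int depth)
{
	uint32_t preempt_count = 0;

	for (unsigned int i = 0; i < depth; i++)
		preempt_count += 1;
	return preempt_count;
}

/* Preemption-disable depth as seen through the 8-bit field. */
static unsigned int model_preempt_depth(uint32_t pc)
{
	return pc & MODEL_PREEMPT_MASK;
}

/* Value of the next field up, which an overflow spills into. */
static unsigned int model_softirq_field(uint32_t pc)
{
	return (pc & MODEL_SOFTIRQ_MASK) >> MODEL_SOFTIRQ_SHIFT;
}
```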