On Mon, Dec 09, 2024 at 05:09:13PM +0000, Lorenzo Stoakes wrote: > On Mon, Dec 09, 2024 at 11:12:52AM -0500, Liam R. Howlett wrote: > > +Cc maintainers listed of kernel/events/uprobe.c > > > > TL;DR: > > dup_mmap() fails, but uprobe thinks it's fine and keeps trying to use an > > incomplete mm_struct. > > > > We're looking for a way to signal to uprobe to abort, cleanly. > > > > Looking at kernel/fork.c, dup_mmap(): > > > > fail_uprobe_end: > > uprobe_end_dup_mmap(); > > return retval; > > > > So uprobe is aware it could fail, but releases the semaphore and then > > doesn't check if the mm struct is okay to use. > > > > What should happen in the failed mm_struct case? > > > > Thanks, > > Liam > > > > (As discussed on IRC) how about moving up the dup_mmap_sem lock one level, we > can put the mm before the rmap lookup in build_map_info() is able to find it, > which should avoid the whole issue? > > Untested patch attached. Urgh, bit of a maze this. So BPF does a uprobe_register(), which walks rmap and finds an incomplete mm. uprobe_{start,end}_dup_mmap() serialize uprobe_register(), but are too narrow to cover the whole fail case. So *bang* happens. The below moves this serialization up to cover the whole of dup_mmap(), such that register can no longer find partial mm's in the rmap. Which will cure problem, but it does leave me uncomfortable vs other rmap users. Also, you need to fix fail_uprobe_end label, that's left dangling as is. > ----8<---- > From 629b04fe8cfdf6b4fad0ff91a316d4643ccd737d Mon Sep 17 00:00:00 2001 > From: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> > Date: Mon, 9 Dec 2024 16:58:14 +0000 > Subject: [PATCH] tmp > > --- > kernel/fork.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/kernel/fork.c b/kernel/fork.c > index 1450b461d196..4d62e776c413 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -639,7 +639,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, > LIST_HEAD(uf); > VMA_ITERATOR(vmi, mm, 0); > > - uprobe_start_dup_mmap(); > if (mmap_write_lock_killable(oldmm)) { > retval = -EINTR; > goto fail_uprobe_end; > @@ -783,7 +782,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, > else > dup_userfaultfd_fail(&uf); > fail_uprobe_end: > - uprobe_end_dup_mmap(); > return retval; > > fail_nomem_anon_vma_fork: > @@ -1692,9 +1690,11 @@ static struct mm_struct *dup_mm(struct task_struct *tsk, > if (!mm_init(mm, tsk, mm->user_ns)) > goto fail_nomem; > > + uprobe_start_dup_mmap(); > err = dup_mmap(mm, oldmm); > if (err) > goto free_pt; > + uprobe_end_dup_mmap(); > > mm->hiwater_rss = get_mm_rss(mm); > mm->hiwater_vm = mm->total_vm; > @@ -1709,6 +1709,7 @@ static struct mm_struct *dup_mm(struct task_struct *tsk, > mm->binfmt = NULL; > mm_init_owner(mm, NULL); > mmput(mm); > + uprobe_end_dup_mmap(); > > fail_nomem: > return NULL; > -- > 2.47.1