On Tue, Dec 10, 2024 at 06:23:19PM +0100, Oleg Nesterov wrote: > On 12/10, Oleg Nesterov wrote: > > > > I must have missed something, but... > > > > On 12/10, Lorenzo Stoakes wrote: > > > > > > @@ -1746,9 +1741,11 @@ static struct mm_struct *dup_mm(struct task_struct *tsk, > > > if (!mm_init(mm, tsk, mm->user_ns)) > > > goto fail_nomem; > > > > > > + uprobe_start_dup_mmap(); > > > err = dup_mmap(mm, oldmm); > > > if (err) > > > - goto free_pt; > > > + goto free_pt_end_uprobe; > > > + uprobe_end_dup_mmap(); > > > > > > mm->hiwater_rss = get_mm_rss(mm); > > > mm->hiwater_vm = mm->total_vm; > > > @@ -1758,6 +1755,8 @@ static struct mm_struct *dup_mm(struct task_struct *tsk, > > > > > > return mm; > > > > > > +free_pt_end_uprobe: > > > + uprobe_end_dup_mmap(); > > > > if dup_mmap() fails and "mm" is incomplete, then with this version dup_mmap_sem > > is dropped before __mmput/exit_mmap/etc. How can this help? > > Easy to fix, but what ensures that another mmget(mm) is not possible until > dup_mm() calls mmput() and drops dup_mmap_sem? Sorry, my understanding of mm/ > is very limited... Yeah that's a really good point - it's not impossible, though unlikely. This lives alongside the 'what if the rmap provides access to something else' consideration. I believe Liam's going to look into something more solid on the maple tree side. So this change doesn't purport to entirely shut down all possible routes to this issue arising but rather is a very small and direct way of addressing the proximal case. The thing with this situation is that its borderline impossible to happen in practice, so we should probably be cautious about going out of our way to preclude it if it involves too much chnage. Somehow marking the mm as 'don't touch' would be good, but then we'd have to audit every single corner of the kernel that touches it to be really sure that works, and we don't have any MMF_xxx bits left to play with really. MMF_UNSTABLE does seem suited but could it somehow interact with oomk in some broken way...? Not sure the RoI on that is good. Probably the better thing to do here is to attack the XA_ZERO_ENTRY stuff from the maple tree side - see if we can't audit/rule out inappropriate access to these (perhaps adding filters in the vma_xxx(0 helpers?) which elimniates this particular issue from this side. But I think this patch as-is is relatively low risk to scoop up the most likely version of the nearly-impossible thing :) > > Oleg. >