On Fri, Jul 2, 2021 at 8:25 AM Andrei Vagin <avagin@xxxxxxxxx> wrote:
> On Mon, Jun 28, 2021 at 06:13:29PM +0200, Jann Horn wrote:
> > On Wed, Apr 14, 2021 at 7:59 AM Andrei Vagin <avagin@xxxxxxxxx> wrote:
> > > +static void swap_mm(struct mm_struct *prev_mm, struct mm_struct *target_mm)
> > > +{
> > > +	struct task_struct *tsk = current;
> > > +	struct mm_struct *active_mm;
> > > +
> > > +	task_lock(tsk);
> > > +	/* Hold off tlb flush IPIs while switching mm's */
> > > +	local_irq_disable();
> > > +
> > > +	sync_mm_rss(prev_mm);
> > > +
> > > +	vmacache_flush(tsk);
> > > +
> > > +	active_mm = tsk->active_mm;
> > > +	if (active_mm != target_mm) {
> > > +		mmgrab(target_mm);
> > > +		tsk->active_mm = target_mm;
> > > +	}
> > > +	tsk->mm = target_mm;
> >
> > I'm pretty sure you're not currently allowed to overwrite the ->mm
> > pointer of a userspace thread. For example, zap_threads() assumes that
> > all threads running under a process have the same ->mm. (And if you're
> > fiddling with ->mm stuff, you should probably CC linux-mm@.)
> >
> > As far as I understand, only kthreads are allowed to do this (as
> > implemented in kthread_use_mm()).
>
> kthread_use_mm() was renamed from use_mm in the v5.8 kernel. Before
> that, it wasn't used for user processes in the kernel, but it was
> exported for modules, and we used it without any visible problems. We
> understood that there could be some issues like zap_threads and it was
> one of reasons why we decided to introduce this system call.
>
> I understand that there are no places in the kernel where we change mm
> of user threads back and forth, but are there any real concerns why we
> should not do that? I agree that zap_threads should be fixed, but it
> will the easy one.

My point is that if you break a preexisting assumption like this, you'll have to go through the kernel, search for places that rely on this assumption, and fix them up, which may require thinking about what kinds of semantics would actually be appropriate there. Examples include the MCE killing logic (collect_procs_anon() and such), current_is_single_threaded() (where the current patch probably leads to logic security bugs), and __uprobe_perf_filter(). Before my refactoring of the ELF coredump logic in kernel 5.10 (commit b2767d97f5ff75 and the ones before it), you'd probably also have created memory corruption bugs in races between elf_core_dump() and syscalls like mmap()/munmap(). (Note that this is not necessarily an exhaustive list.)
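
To make the kind of assumption concrete, here is a small userspace toy model. It is not kernel code: struct toy_mm, struct toy_task and count_groups() are made up for illustration, and the scan is only loosely patterned on code that looks at the first thread with an mm in each process and takes it as representative of the whole process. It compares that shortcut against a scan that checks every thread.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for kernel types; NOT the real definitions. */
struct toy_mm {
	int id;
};

struct toy_task {
	int tgid;		/* thread group (process) id */
	struct toy_mm *mm;	/* NULL models a thread without an mm */
};

/*
 * Count the thread groups considered to be users of @mm.
 * Threads of one group must be adjacent in @tasks.
 *
 * first_only != 0: inspect only the first thread with a non-NULL mm in
 *                  each group and assume its siblings share that mm.
 * first_only == 0: inspect every thread in every group.
 */
static size_t count_groups(const struct toy_task *tasks, size_t n,
			   const struct toy_mm *mm, int first_only)
{
	size_t groups = 0, i = 0;

	assert(tasks || n == 0);

	while (i < n) {
		int tgid = tasks[i].tgid;
		int match = 0, decided = 0;

		for (; i < n && tasks[i].tgid == tgid; i++) {
			if (!tasks[i].mm || (first_only && decided))
				continue;
			if (tasks[i].mm == mm)
				match = 1;
			decided = 1;	/* first thread with an mm seen */
		}
		groups += match;
	}
	return groups;
}
```

With one single-threaded process on mm A and one two-threaded process on mm B, both scans agree that one group uses A. If the second thread of the B process is then pointed at A (which is roughly what overwriting tsk->mm does), the every-thread scan reports two groups while the first-thread scan still reports one, so the borrowed-mm thread is silently missed.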