On Sat, 2022-06-11 at 04:12 +0300, Kirill A. Shutemov wrote:
> On Fri, Jun 10, 2022 at 10:18:23PM +0000, Edgecombe, Rick P wrote:
> > On Fri, 2022-06-10 at 11:08 -0700, Edgecombe, Richard P wrote:
> > > On Fri, 2022-06-10 at 21:06 +0300, Kirill A. Shutemov wrote:
> > > > On Fri, Jun 10, 2022 at 04:16:01PM +0000, Edgecombe, Rick P wrote:
> > > > > On Fri, 2022-06-10 at 17:35 +0300, Kirill A. Shutemov wrote:
> > > > > > +static int prctl_enable_tagged_addr(unsigned long nr_bits)
> > > > > > +{
> > > > > > +	struct mm_struct *mm = current->mm;
> > > > > > +
> > > > > > +	/* Already enabled? */
> > > > > > +	if (mm->context.lam_cr3_mask)
> > > > > > +		return -EBUSY;
> > > > > > +
> > > > > > +	/* LAM has to be enabled before spawning threads */
> > > > > > +	if (get_nr_threads(current) > 1)
> > > > > > +		return -EBUSY;
> > > > >
> > > > > Does this work for vfork()? I guess the idea is that locking is
> > > > > not needed below because there is only one thread with the MM,
> > > > > but with vfork() another task could operate on the MM, call
> > > > > fork(), etc. I'm not sure...
> > > >
> > > > I'm not sure I follow. vfork() blocks parent process until child
> > > > exit or execve(). I don't see how it is a problem.
> > >
> > > Oh yea, you're right.
> >
> > Actually, I guess vfork() only suspends the calling thread. So what if
> > you had:
> > 1. Parent spawns a bunch of threads
> > 2. vforks()
> > 3. Child enables LAM (it only has one thread, so succeeds)
> > 4. Child exits()
> > 5. Parent has some threads with LAM, and some not
>
> I think it is in "Don't do that" territory. It is very similar to cases
> described in "Caveats" section of the vfork(2) man-page.
>
> > It's some weird userspace that doesn't deserve to have things work for
> > it, but I wonder if it could open up little races around untagging. As
> > an example, KVM might have a super narrow race where it checks for tags
> > in memslots using addr != untagged_addr(addr) before checking
> > access_ok(addr, ...). See __kvm_set_memory_region(). If
> > mm->context.untag_mask got set in the middle, tagged memslots could be
> > added.
>
> Ultimately, a process which calls vfork(2) is in control of what happens
> to the new process until execve(2) or exit(2). So, yes it is very creative
> way to shoot yourself into leg, but I don't think it worth preventing.
>
> And I'm not sure how the fix would look like.

Yea, userspace shooting itself in the foot is fine. You would really
have to go out of your way to do that. But my concern is that it will
expose the kernel. The KVM scenario I outlined is a narrow race, but it
lets guests write to freed pages. So the "not first thread enabling"
seems like a generally fragile thing.

I don't know how to fix it, but I think enabling LAM seems fraught and
should be contained strictly to MMs with one thread. I'm not sure, but
what about using in_vfork()?
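
To make the vfork() scenario concrete, here is a toy userspace model of
the two checks. It is a sketch only, not kernel code: struct toy_mm,
struct toy_task, toy_vfork() and both enable_lam_*() functions are
hypothetical stand-ins, and the in_vfork field merely imitates what the
kernel's in_vfork() helper would report for a vfork() child that still
shares its parent's MM.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for mm_struct: one shared address space. */
struct toy_mm {
	unsigned long lam_mask;	/* 0 means LAM is not enabled */
};

/* Hypothetical stand-in for a task (thread group leader). */
struct toy_task {
	struct toy_mm *mm;
	int nr_threads;		/* threads in this task's own thread group */
	bool in_vfork;		/* vfork() child still sharing the parent's mm */
};

/* Model vfork(): a new single-threaded group that shares the parent's mm. */
struct toy_task toy_vfork(struct toy_task *parent)
{
	struct toy_task child = {
		.mm = parent->mm,
		.nr_threads = 1,
		.in_vfork = true,
	};
	return child;
}

/* Mirrors the quoted patch: only the caller's thread group is counted. */
int enable_lam_patch_check(struct toy_task *t, unsigned long mask)
{
	assert(t != NULL && t->mm != NULL);

	if (t->mm->lam_mask)		/* already enabled */
		return -EBUSY;
	if (t->nr_threads > 1)		/* caller's group has other threads */
		return -EBUSY;

	t->mm->lam_mask = mask;		/* visible to every task sharing mm */
	return 0;
}

/* Assumed stricter variant: also refuse while the caller is a vfork() child. */
int enable_lam_strict_check(struct toy_task *t, unsigned long mask)
{
	assert(t != NULL && t->mm != NULL);

	if (t->mm->lam_mask)
		return -EBUSY;
	if (t->nr_threads > 1 || t->in_vfork)
		return -EBUSY;

	t->mm->lam_mask = mask;
	return 0;
}
```

In this model a parent with four threads is refused by either check, but
its vfork() child has nr_threads == 1, so the patch-style check succeeds
and sets the mask on the MM the parent's threads are still using. The
stricter variant returns -EBUSY for the child. Whether in_vfork() is the
right test in the real kernel is exactly the open question above.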