On Thu, Nov 10, 2022 at 11:27:14AM +0000, Joey Gouly wrote: > On Fri, Oct 28, 2022 at 11:51:00AM -0700, Kees Cook wrote: > > On Wed, Oct 26, 2022 at 04:04:56PM +0100, Joey Gouly wrote: > > > diff --git a/mm/mmap.c b/mm/mmap.c > > > index 099468aee4d8..42eaf6683216 100644 > > > --- a/mm/mmap.c > > > +++ b/mm/mmap.c > > > @@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr, > > > vm_flags |= VM_NORESERVE; > > > } > > > > > > + if (map_deny_write_exec(NULL, vm_flags)) > > > + return -EACCES; > > > + > > > > This seems like the wrong place to do the check -- that the vma argument > > is a hard-coded "NULL" is evidence that something is wrong. Shouldn't > > it live in mmap_region()? What happens with MAP_FIXED, when there is > > an underlying vma? i.e. an MAP_FIXED will, I think, bypass the intended > > check. For example, we had "c" above: > > > > c) mmap(PROT_READ); > > mprotect(PROT_READ|PROT_EXEC); // fails > > > > But this would allow another case: > > > > e) addr = mmap(..., PROT_READ, ...); > > mmap(addr, ..., PROT_READ | PROT_EXEC, MAP_FIXED, ...); // passes > > I can move the check into mmap_region() but it won't fix the MAP_FIXED > example that you showed here. > > mmap_region() calls do_mas_munmap(..) which will unmap overlapping regions. > However the `vma` for the 'old' region is not kept around, and a new vma will > be allocated later on "vma = vm_area_alloc(mm);", and the vm_flags are just set > to what is passed into mmap_region(), so map_deny_write_exec(vma, vm_flags) > will just be as good as passing NULL. > > It's possible to save the vm_flags from the region that is unmapped, but Catalin > suggested it might be better if that is part of a later extension, what do you > think? I thought initially we should keep the behaviour close to what systemd achieves via SECCOMP while only relaxing an mprotect(PROT_EXEC) if the vma is already executable (i.e. check actual permission change not just the PROT_* flags). We could pass the old vm_flags for that region (and maybe drop the vma pointer entirely, just check old and new vm_flags). But this feels like tightening slightly systemd's MDWE approach. If user-space doesn't get confused by this, I'm fine to go with it. Otherwise we can add a new flag later for this behaviour I guess that's more of a question for Topi on whether point tightening point (e) is feasible/desirable. -- Catalin