On Thu, Apr 21, 2022 at 04:35:15PM +0100, Catalin Marinas wrote:
> On Wed, Apr 20, 2022 at 04:21:45PM -0700, Kees Cook wrote:
> > On Wed, Apr 20, 2022 at 10:34:33PM +0300, Topi Miettinen wrote:
> > > For systemd, feature compatibility with the BPF version is important so that
> > > we could automatically switch to the kernel version once available without
> > > regressions. So I think PR_MDWX_MMAP (or maybe PR_MDWX_COMPAT) should match
> > > exactly what MemoryDenyWriteExecute=yes as implemented with BPF has: only
> > > forbid mmap(PROT_EXEC|PROT_WRITE) and mprotect(PROT_EXEC). Like BPF, once
> > > installed there should be no way to escape and ELF flags should be also
> > > ignored. ARM BTI should be allowed though (allow PROT_EXEC|PROT_BTI if the
> > > old flags had PROT_EXEC).
>
> I agree.
>
> > > Then we could have improved versions (other PR_MDWX_ prctls) with lots more
> > > checks. This could be enabled with MemoryDenyWriteExecute=strict or so.
> > >
> > > Perhaps also more relaxed versions (like SARA) could be interesting (system
> > > service running Python with FFI, or perhaps JVM etc), enabled with for
> > > example MemoryDenyWriteExecute=trampolines. That way even those programs
> > > would get some protection (though there would be a gap in the defences).
> >
> > Yup, I think we're all on the same page. Catalin, can you respin with a
> > prctl for enabling MDWE? I propose just:
> >
> >     prctl(PR_MDWX_SET, flags);
> >     prctl(PR_MDWX_GET);
> >
> >     PR_MDWX_FLAG_MMAP
> >         disallows PROT_EXEC on any VMA that is or was PROT_WRITE,
> >         covering at least: mmap, mprotect, pkey_mprotect, and shmat.
>
> Do we want the "was PROT_WRITE" or we just reject mprotect(PROT_EXEC) if
> the vma is not already PROT_EXEC? The latter is closer to the current
> systemd approach. The former allows an mprotect(PROT_EXEC) if the
> mapping was PROT_READ only for example.
>
> I'd drop the "was PROT_WRITE" for now if the aim is a drop-in
> replacement for BPF MDWE.

I think "was PROT_WRITE" is an important part of the defense that
couldn't be done with a simple seccomp filter (which is why the filter
ended up being a problem in the first place).

-- 
Kees Cook