On Thu, Apr 21, 2022 at 06:24:21PM +0100, Catalin Marinas wrote:
> On Thu, Apr 21, 2022 at 09:42:23AM -0700, Kees Cook wrote:
> > On Thu, Apr 21, 2022 at 04:35:15PM +0100, Catalin Marinas wrote:
> > > Do we want the "was PROT_WRITE" or we just reject mprotect(PROT_EXEC) if
> > > the vma is not already PROT_EXEC? The latter is closer to the current
> > > systemd approach. The former allows an mprotect(PROT_EXEC) if the
> > > mapping was PROT_READ only for example.
> > >
> > > I'd drop the "was PROT_WRITE" for now if the aim is a drop-in
> > > replacement for BPF MDWE.
> >
> > I think "was PROT_WRITE" is an important part of the defense that
> > couldn't be done with a simple seccomp filter (which is why the filter
> > ended up being a problem in the first place).
>
> I would say "was PROT_WRITE" is slightly more relaxed than "is not
> already PROT_EXEC". The seccomp filter can't do "is not already
> PROT_EXEC" either since it only checks the mprotect() arguments, not the
> current vma flags.
>
> So we have (with sub-cases):
>
> 1. Current BPF filter:
>
>    a) mmap(PROT_READ|PROT_WRITE|PROT_EXEC); // fails
>
>    b) mmap(PROT_READ|PROT_EXEC);
>       mprotect(PROT_READ|PROT_EXEC|PROT_BTI); // fails
>
>    c) mmap(PROT_READ);
>       mprotect(PROT_READ|PROT_EXEC); // fails
>
>    d) mmap(PROT_READ|PROT_WRITE);
>       mprotect(PROT_READ);
>       mprotect(PROT_READ|PROT_EXEC); // fails
>
> 2. "is not already PROT_EXEC":
>
>    a) mmap(PROT_READ|PROT_WRITE|PROT_EXEC); // fails
>
>    b) mmap(PROT_READ|PROT_EXEC);
>       mprotect(PROT_READ|PROT_EXEC|PROT_BTI); // passes
>
>    c) mmap(PROT_READ);
>       mprotect(PROT_READ|PROT_EXEC); // fails
>
>    d) mmap(PROT_READ|PROT_WRITE);
>       mprotect(PROT_READ);
>       mprotect(PROT_READ|PROT_EXEC); // fails
>
> 3. "is or was not PROT_WRITE":
>
>    a) mmap(PROT_READ|PROT_WRITE|PROT_EXEC); // fails
>
>    b) mmap(PROT_READ|PROT_EXEC);
>       mprotect(PROT_READ|PROT_EXEC|PROT_BTI); // passes
>
>    c) mmap(PROT_READ);
>       mprotect(PROT_READ|PROT_EXEC); // passes
>
>    d) mmap(PROT_READ|PROT_WRITE);
>       mprotect(PROT_READ);
>       mprotect(PROT_READ|PROT_EXEC); // fails

[edited above to show each case]

Restating what was already summarized: the problem is 1.b. Both 2 and 3
solve it; 3 is more relaxed (c passes).

> If we don't care about 3.c, we might as well go for (2). I don't mind,
> already went for (3) in this series. I think either of them would not be
> a regression on MDWE, unless there is some test that attempts 3.c and
> expects it to fail.

I should stop arguing for a less restrictive mode. ;) It just feels weird
that the combinations are API-mediated rather than logically defined:
under 2, I can get PROT_READ|PROT_EXEC with mmap() but not with
mprotect(). Compare that with saying "the vma cannot be executable if it
is or ever was writable", which I find much easier to reason about in
terms of the expected system state.

So, I'd still prefer 3, as that was the _goal_ of the systemd MDWE seccomp
filter, but yes, 2 does provide the same protection while allowing BTI.

-- 
Kees Cook