On Wed, Apr 20, 2022 at 02:01:02PM +0100, Catalin Marinas wrote:
> I agree we should look at what we want to cover, though trying to avoid
> re-inventing SELinux. With this patchset I went for the minimum that
> systemd MDWE does with BPF.

Right -- and I don't think we're at any risk of slippery-sloping into a
full MAC system. :) I'm fine with doing the implementation in stages, if
we've attempted to design the steps (which I think you've got a good
start on below).

> I think JITs get around it using something like memfd with two separate
> mappings to the same page. We could try to prevent such aliases but
> allow it if an ELF note is detected (or get the JIT to issue a prctl()).

Right -- I'd rather JITs carry some hard-coded property (i.e. ELF note)
to indicate the fact that they're expecting to do these kinds of things
rather than leaving it open for all processes.

> Anyway, with a prctl() we can allow finer-grained control starting with
> anonymous and file mappings and later extending to vma aliases,
> writeable files etc. On top we can add a seal mask so that a process
> cannot disable a control was set. Something like (I'm not good at
> names):
>
>     prctl(PR_MDWX_SET, flags, seal_mask);
>     prctl(PR_MDWX_GET);
>
> with flags like:
>
>     PR_MDWX_MMAP - basics, should cover mmap() and mprotect()
>     PR_MDWX_ALIAS - vma aliases, allowed with an ELF note
>     PR_MDWX_WRITEABLE_FILE

The SARA proposal lists a lot of behavioral details to consider. Quoting
it[1] here:

>> - W^X enforcement will cause problems to any programs that needs
>>   memory pages mapped both as writable and executable at the same time e.g.
>>   programs with executable stack markings in the PT_GNU_STACK segment.

IMO, executable stack markings should be considered completely deprecated.
In fact, we've been warning about it since 2020:
47a2ebb7f505 ("execve: warn if process starts with executable stack")

So with execstack, under W^X, I think we should either:

- refuse to exec the process (default)
- disable W^X for the process (but not its children)

>> - W!->X restriction will cause problems to any program that
>>   needs to generate executable code at run time or to modify executable
>>   pages e.g. programs with a JIT compiler built-in or linked against a
>>   non-PIC library.

This seems solvable with an ELF flag.

>> - Executable MMAP prevention can work only with programs that have at least
>>   partial RELRO support. It's disabled automatically for programs that
>>   lack this feature. It will cause problems to any program that uses dlopen
>>   or tries to do an executable mmap. Unfortunately this feature is the one
>>   that could create most problems and should be enabled only after careful
>>   evaluation.

This seems like a variation on the execstack case, and we should be able
to detect the state and choose a behavior based on system settings, and
a smarter version (as SARA has) would track RELRO pages waiting for the
loader to make them read-only.

SARA was proposed with a set of feature flags[2]; quoting here:

>> | W^X                          |  0x0008  |

This is the basic property, refusing PROT_WRITE | PROT_EXEC.

I note that SARA also rejects opening /proc/$pid/mem with FMODE_WRITE
when this is enabled for a process. (It likely should extend to
process_vm_writev() too.)

>> | W!->X Heap                   |  0x0001  |
>> | W!->X Stack                  |  0x0002  |
>> | W!->X Other memory           |  0x0004  |

This is for the vma history tracking, and I don't think we need to
separate this by memory type? It's nice to have the granularity, but
for a first-pass it seems like overkill? Maybe I'm missing some detail.

>> | Don't enforce, just complain |  0x0010  |
>> | Be Verbose                   |  0x0020  |

Unclear if these would work well with a non-LSM approach.

>> | Executable MMAP prevention   |  0x0040  |

This is the relro detection piece.

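To make the W^X and W!->X checks discussed so far a bit more concrete,
here is a rough userspace model of them, using a single history bit per
mapping rather than the per-memory-type split. This is only a sketch
under assumptions: every sk_* name is made up for illustration, none of
it comes from SARA or from the patchset, and a real implementation would
live in the mmap()/mprotect() paths and track the history on the vma.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical protection bits for this sketch (not the real PROT_* values). */
enum {
	SK_PROT_READ  = 1 << 0,
	SK_PROT_WRITE = 1 << 1,
	SK_PROT_EXEC  = 1 << 2,
};

/* Hypothetical stand-in for a vma: current protection plus write history. */
struct sk_vma {
	unsigned int prot;
	bool was_writable;	/* sticky: set once the mapping has been writable */
};

/* W^X: a request asking for write and exec at the same time is a violation. */
static bool sk_violates_wxorx(unsigned int prot)
{
	return (prot & SK_PROT_WRITE) && (prot & SK_PROT_EXEC);
}

/*
 * Model of an mprotect()-style change.
 * Returns 0 if the change is applied, -1 if it is refused.
 */
static int sk_change_prot(struct sk_vma *vma, unsigned int new_prot)
{
	assert(vma != NULL);

	/* W^X: never writable and executable together. */
	if (sk_violates_wxorx(new_prot))
		return -1;

	/* W!->X: a mapping that has ever been writable may not gain exec. */
	if ((new_prot & SK_PROT_EXEC) && vma->was_writable)
		return -1;

	vma->prot = new_prot;
	if (new_prot & SK_PROT_WRITE)
		vma->was_writable = true;
	return 0;
}
```

In this model a mapping may go from read+exec to read+write, but once it
has been writable it can never become executable again, which is the
JIT case that would need the ELF flag.
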
>> | Trampoline emulation         |  0x0100  |

This is a more advanced case for emulating execstack, but if we can just
ignore execstack entirely, this can go away?

>> | Children will inherit flags  |  0x0200  |

Should a process have that control?

>> | Force W^X on setprocattr     |  0x0080  |

This is a "seal" trigger, which could be done through prctl().

It looks like a bunch of the features are designed around having as much
as possible enabled at exec time, and then tightening it further as
various things are finished (e.g. execstack, relro, sealing, etc), which
is, I think, what would still be needed for a process launcher to be
able to enable this kind of protection. (i.e. hoping the process calls
a prctl() to enable the protection isn't going to work well with
systemd.)

So, I *think* we could have a minimal form with these considerations:

- execstack: declare it distinctly incompatible.

- relro: I think this is solved with BIND_NOW. It's been a while since
  I looked deeply at this, but I think under BIND_NOW, the (executable)
  PLT doesn't ever need to be writable (since it points into the GOT),
  and the (initially writable) GOT is already never executable. This
  needs to be verified...

- JITs can be allowed with an ELF flag and can choose to opt-in with a
  prctl().

-Kees

[1] https://lore.kernel.org/lkml/1562410493-8661-1-git-send-email-s.mesoraca16@xxxxxxxxx/
[2] https://lore.kernel.org/lkml/1562410493-8661-2-git-send-email-s.mesoraca16@xxxxxxxxx/

-- 
Kees Cook