On Sat, Sep 07, 2024 at 08:27:52PM GMT, Lorenzo Stoakes wrote: > On Fri, Aug 30, 2024 at 04:57:26PM GMT, Jeff Xu wrote: > > On Fri, Aug 30, 2024 at 12:23 PM Lorenzo Stoakes > > <lorenzo.stoakes@xxxxxxxxxx> wrote: > > > > > > On Fri, Aug 30, 2024 at 07:43:12PM GMT, Lorenzo Stoakes wrote: > > > > On Fri, Aug 30, 2024 at 06:02:36PM GMT, jeffxu@xxxxxxxxxxxx wrote: > > > > > From: Jeff Xu <jeffxu@xxxxxxxxxxxx> > > > > > > > > > > Add sealing test to cover mmap for > > > > > Expand/shrink across sealed vmas (MAP_FIXED) > > > > > Reuse the same address in !MAP_FIXED case. > > Hi Jeff, I really want to find a constructive way forward, but you've > basically ignored more or less everything I've said here. > > I could respond again to each of your points here, but - from my point of > view - if your response to 'what is this even testing?' is to just repeat > in effect the name of the test - we will be stuck in a loop, which will be > exited with a NACK. I don't want this. > > The majority of these tests, from a VMA/mmap point of view, appear to me to > be essentially testing 'does basic mmap functionality work correctly', > which isn't testing mseal. > > Look - I appreciate your commitment to testing (see my work on mm/vma.c - I > care passionately about testing) - but we must make sure we are actually > testing what we mean to. > > So I suggest as a constructive way forward - firstly, submit a regression > test for the change Liam wrapped into his series regarding the -EPERM > thing. > > This should go in uncontroversially, I will take the time to review it and > I don't see why that shouldn't be relatively straight forward. I will drop > the concerns about things like test macros etc. for that. > > Then after that, I suggest we have a discussion about - at a higher level - > what it is you want to test. And then between me, you, Liam and Pedro - > ahead of time, list out the tests that we want, with all of us reaching > consensus. Hi, I agree with most of the points. Sitting down here to write unofficial guidelines for mseal behavior. mseal should seal regions and mark them immutable, which means their protection and contents (ish?) (not _only_ backing mapping, but also contents in general (see madvise below)) should not be changed throughout the lifetime of the address space. For the general syscall interface, this means: 1) mprotect and munmap need to be blocked on mseal regions. 1a) munmap _cannot_ tolerate partial failure, per POSIX. 2b) mmap MAP_FIXED counts as an unmap operation and also needs to be blocked and return -EPERM. 2) Most madvise calls are allowed, except for destructive operations on read-only anonymous _pages_ (MADV_DONTNEED is destructive for anon, but we also don't care about blocking these ops if we can do it manually with e.g memset) 2a) The current implementation only blocks discard on anonymous _regions_, which is slightly different. We probably do want to block these on MAP_PRIVATE file mappings, as to block stuff like madvise MADV_DONTNEED on program rodata. 2b) We take into account pkeys when doing the permission checks. 3) mremap is not allowed as we'd change the "contents" of the old region. 3a) Should mremap expansion be allowed? aka only block moving and shrinking, but allow expansion. We already informally allow expansion if e.g mmapping after it + mseal. 4) mlock and msync are allowed. 5) mseal is blocked. 6) Other miscellaneous syscalls (mbind, etc) that do not change contents in any way, are allowed. 6a) This obviously means PTEs can change as long as the contents don't. Swapping is also ok. 7) FOLL_FORCE (kernel-internal speak, more commonly seen as ptrace and /proc/self/mem from userspace) should be disallowed (?) 7a) This currently does not happen, and seems like a large hole? But disallowing this would probably severely break ptrace if the ELF sealing plans come to fruition. When we say "disallowed", we usually (apart from munmap) allow for partial failure. This means getting an -EPERM while part of the call succeeded, if we e.g mprotect a region consisting of [NORMAL VMA][SEALED VMA]. We do not want to test for this, because we do not want to paint ourselves into a corner - this is strictly "undefined behavior". The msealed regions themselves will never be touched in such cases. (we do however want to test munmap operation atomicity, but this is also kind of a munmap-related test, and might not really be something we really want in the mseal tests) Kernel-internal wise: The VMA and PTE modifications resulting from the above operations are blocked. Sealed VMAs allow splitting and merging; there was contention about the splitting issue, but it truly does not make sense to block operations unless they affect a VMA entirely, and that would also force VMA merging to be ABI ("vma_merge isn't merging these two regions and now my madvise works/doesn't work :("). Do I have everything right? Am I missing anything? -- Pedro