On Thu, Oct 19, 2023 at 3:47 PM Pedro Falcato <pedro.falcato@xxxxxxxxx> wrote:
>
> On Thu, Oct 19, 2023 at 6:30 PM Jeff Xu <jeffxu@xxxxxxxxxx> wrote:
> >
> > Hi Pedro
> >
> > Some followup on mmap() + mprotect():
> >
> > On Wed, Oct 18, 2023 at 11:20 AM Jeff Xu <jeffxu@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Oct 17, 2023 at 3:35 PM Pedro Falcato <pedro.falcato@xxxxxxxxx> wrote:
> > > >
> > > > > >
> > > > > > I think it's worth pointing out that this suggestion (with PROT_*)
> > > > > > could easily integrate with mmap() and as such allow for one-shot
> > > > > > mmap() + mseal().
> > > > > > If we consider the common case as 'addr = mmap(...); mseal(addr);', it
> > > > > > definitely sounds like a performance win as we halve the number of
> > > > > > syscalls for a sealed mapping. And if we trivially look at e.g. OpenBSD
> > > > > > ld.so code, mmap() + mimmutable() and mprotect() + mimmutable() seem
> > > > > > like common patterns.
> > > > > >
> > > > > Yes. mmap() can support sealing as well, and memory is allocated as
> > > > > immutable from beginning.
> > > > > This is orthogonal to mseal() though.
> > > >
> > > > I don't see how this can be orthogonal to mseal().
> > > > In the case we opt for adding PROT_ bits, we should more or less only
> > > > need to adapt calc_vm_prot_bits(), and the rest should work without
> > > > issues.
> > > > vma merging won't merge vmas with different prots. The current
> > > > interfaces (mmap and mprotect) would work just fine.
> > > > In this case, mseal() or mimmutable() would only be needed if you need
> > > > to set immutability over a range of VMAs with different permissions.
> > > >
> > > Agreed. By orthogonal, I meant we can have two APIs:
> > > mmap() and mseal()/mprotect()
> > > i.e. we can't just rely on mmap() only without mseal()/mprotect()/mimmutable().
> > > Sealing can be applied after initial memory creation.
> > >
> > > > Note: modifications should look kinda like this: https://godbolt.org/z/Tbjjd14Pe
> > > > The only annoying wrench in my plans here is that we have effectively
> > > > run out of vm_flags bits in 32-bit architectures, so this approach as
> > > > I described is not compatible with 32-bit.
> > > >
> > > > > In case of ld.so, iiuc, memory can be first allocated as W, then later
> > > > > changed to RO, for example, during symbol resolution.
> > > > > The important point is that the application can decide what type of
> > > > > sealing it wants, and when to apply it. There needs to be an api(),
> > > > > that can be mseal() or mprotect2() or mimmutable(), the naming is not
> > > > > important to me.
> > > > >
> > > > > mprotect() in linux have the following signature:
> > > > > int mprotect(void addr[.len], size_t len, int prot);
> > > > > the prot bitmasks are all taken here.
> > > > > I have not checked the prot field in mmap(), there might be bits left,
> > > > > even not, we could have mmap2(), so that is not an issue.
> > > >
> > > > I don't see what you mean. We have plenty of prot bits left (32-bits,
> > > > and we seem to have around 8 different bits used).
> > > > And even if we didn't, prot is the same in mprotect and mmap and mmap2 :)
> > > >
> > > > The only issue seems to be that 32-bit ran out of vm_flags, but that
> > > > can probably be worked around if need be.
> > > >
> > > Ah, you are right about this. vm_flags is full, and prot in mprotect() is not.
> > > Apology that I was wrong previously and caused confusion.
> > >
> > > There is a slight difference in the syntax of mprotect and mseal.
> > > Each time when mprotect() is called, the kernel takes all of RWX bits
> > > and updates vm_flags,
> > > In other words, the application sets/unset each RWX, and kernel takes it.
> > >
> > > In the mseal() case, the kernel will remember which seal types were
> > > applied previously, and the application doesn’t need to repeat all
> > > existing seal types in the next mseal(). Once a seal type is applied,
> > > it can’t be unsealed.
> > >
> > > So if we want to use mprotect() for sealing, developers need to think
> > > of sealing bits differently than the rest of prot bits. It is a
> > > different programming model, might or might not be an obvious concept
> > > to developers.
> > >
> > This probably doesn't matter much to developers.
> > We can enforce the sealing bit to be the same as the rest of PROT bits.
> > If mprotect() tries to unset sealing, it will fail.
>
> Yep. Erroneous or malicious mprotects would all be caught. However, if
> we add a PROT_DOWNGRADEABLE (that could let you, let's say, mprotect()
> to less permissions or even downright munmap()) you'd want some care
> to preserve that bit when setting permissions.
>
> >
> > > There is a difference in input check and error handling as well.
> > > for mseal(), if a given address range has a gap (unallocated memory),
> > > or if one of VMA is sealed with MM_SEAL_SEAL flag, none of VMAs is
> > > updated.
> > > For mprotect(), some VMAs can be updated, till an error happens to a VMA.
> > >
> > This difference doesn't matter much.
> >
> > For mprotect()/mmap(), is Linux implementation limited by POSIX ?
>
> No. POSIX works merely as a baseline that UNIX systems aim towards.
> You can (and very frequently do) extend POSIX interfaces (in fact,
> it's how most of POSIX was written, through sheer
> "design-by-committee" on a bunch of UNIX systems' extensions).
>
> > This can be made backward compatible.
> > If there is no objection to adding linux specific values in mmap() and
> > mprotect(),
> > This works for me.
>
> Linux already has system-specific values for PROT_ (PROT_BTI,
> PROT_MTE, PROT_GROWSUP, PROT_GROWSDOWN, etc).
> Whether this is the right interface is another question. I do like it
> a lot, but there's of course value in being compatible with existing
> solutions (like mimmutable()).
>
Thanks Pedro for providing examples on mm extension to POSIX.
This opens more design options on solving the sealing problem.
I will take a few days to research design options.

-Jeff

> --
> Pedro
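
For anyone following the thread, the two behavioural differences discussed
above (mprotect() replaces the whole permission set and may stop partway
through a range, while mseal() accumulates seal types and updates nothing
if any VMA in the range refuses) can be sketched as a small user-space model.
This is only an illustration under stated assumptions: struct model_vma, the
model_* functions, the MODEL_* constants and the -EPERM return value are all
made up here and are not the interface from the patch series or any kernel
code. MODEL_SEAL_SEAL merely mirrors the role MM_SEAL_SEAL plays in the
discussion.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical permission bits, standing in for PROT_READ/WRITE/EXEC. */
enum { MODEL_PROT_READ = 1, MODEL_PROT_WRITE = 2, MODEL_PROT_EXEC = 4 };

/* Hypothetical seal types (assumed names, for illustration only). */
enum {
    MODEL_SEAL_MPROTECT = 1 << 0, /* forbid further permission changes */
    MODEL_SEAL_SEAL     = 1 << 1, /* forbid adding further seals */
};

/* A toy stand-in for a VMA: just permissions plus accumulated seals. */
struct model_vma {
    unsigned int prot;
    unsigned int seals;
};

/*
 * mprotect()-like semantics: the caller passes the complete permission
 * set, which replaces the old one. VMAs are updated one by one, so an
 * error on a later VMA leaves the earlier ones already modified.
 */
int model_mprotect(struct model_vma *vmas, size_t n, unsigned int prot)
{
    for (size_t i = 0; i < n; i++) {
        if (vmas[i].seals & MODEL_SEAL_MPROTECT)
            return -EPERM; /* assumed error code */
        vmas[i].prot = prot;
    }
    return 0;
}

/*
 * mseal()-like semantics: new seal types are OR-ed into the existing
 * ones and are never cleared. The whole range is checked first, so on
 * error none of the VMAs is updated.
 */
int model_mseal(struct model_vma *vmas, size_t n, unsigned int seals)
{
    for (size_t i = 0; i < n; i++)
        if (vmas[i].seals & MODEL_SEAL_SEAL)
            return -EPERM; /* assumed error code */

    for (size_t i = 0; i < n; i++)
        vmas[i].seals |= seals;
    return 0;
}
```

With two VMAs where only the second carries MODEL_SEAL_MPROTECT, a
model_mprotect() call over both changes the first and then fails on the
second, whereas a model_mseal() call over a range containing a
MODEL_SEAL_SEAL VMA leaves every VMA in the range untouched. Folding sealing
into prot bits would mean deciding which of these two error models applies.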