On Thu, Dec 14, 2023 at 12:14 PM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > On Thu, 14 Dec 2023 at 10:07, Stephen Röttger <sroettger@xxxxxxxxxx> wrote: > > > > AIUI, the madvise(DONTNEED) should effectively only change the content of > > anonymous pages, i.e. it's similar to a memset(0) in that case. That's why we > > added this special case: if you want to madvise(DONTNEED) an anonymous page, > > you should have write permissions to the page. > > Hmm. I actually would be happier if we just made that change in > general. Maybe even without sealing, but I agree that it *definitely* > makes sense in general as a sealing thing. > > IOW, just saying > > "madvise(DONTNEED) needs write permissions to an anonymous mapping when sealed" > > makes 100% sense to me. Having a separate _flag_ to give sensible > semantics is just odd. > > IOW, what I really want is exactly that "sensible semantics, not random flags". > > Particularly for new system calls with fairly specialized use, I think > it's very important that the semantics are sensible on a conceptual > level, and that we do not add system calls that are based on "random > implementation issue of the day". > > Yes, yes, then as we have to maintain things long-term, and we hit > some compatibility issue, at *THAT* point we'll end up facing nasty > "we had an implementation that had these semantics in practice, so now > we're stuck with it", but when introducing a new system call, let's > try really hard to start off from those kinds of random things. > > Wouldn't it be lovely if we can just come up with a sane set of "this > is what it means to seal a vma", and enumerate those, and make those > sane conceptual rules be the initial definition. By all means have a > "flags" argument for future cases when we figure out there was > something wrong or the notion needed to be extended, but if we already > *start* with random extensions, I feel there's something wrong with > the whole concept. > > So I would really wish for the first version of > > mseal(start, len, flags); > > to have "flags=0" be the one and only case we actually handle > initially, and only add a single PROT_SEAL flag to mmap() that says > "create this mapping already pre-sealed". > > Strive very hard to make sealing be a single VM_SEALED flag in the > vma->vm_flags that we already have, just admit that none of this > matters on 32-bit architectures, so that VM_SEALED can just use one of > the high flags that we have several free of (and that pkeys already > depends on), and make this a standard feature with no #ifdef's. > > Can chrome live with that? And what would the required semantics be? > I'll start the list: > > - you can't unmap or remap in any way (including over-mapping) > > - you can't change protections (but with architecture support like > pkey, you can obviously change the protections indirectly with PKRU > etc) > > - you can't do VM operations that change data without the area being > writable (so the DONTNEED case - maybe there are others) > > - anything else? > > Wouldn't it be lovely to have just a single notion of sealing that is > well-documented and makes sense, and doesn't require people to worry > about odd special cases? > > And yes, we'd have the 'flags' argument for future special cases, and > hope really hard that it's never needed. > Yes, those inputs make a lot of sense ! I will start the next version. In the meantime, if there are more comments, please continue to post, I appreciate those to make the API better and simple to use. -Jeff > Linus >