> On Sep 18, 2020, at 4:53 PM, Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
>
> On Fri, Sep 18, 2020 at 08:09:04AM -0700, Andy Lutomirski wrote:
>>> On Tue, Sep 15, 2020 at 4:28 AM Jarkko Sakkinen
>>> <jarkko.sakkinen@xxxxxxxxxxxxxxx> wrote:
>>>
>>> From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
>>>
>>> Add vm_ops()->mprotect() for additional constraints for a VMA.
>>>
>>> Intel Software Guard eXtensions (SGX) will use this callback to add two
>>> constraints:
>>>
>>> 1. Verify that the address range does not have holes: each page address
>>>    must be filled with an enclave page.
>>> 2. Verify that VMA permissions won't surpass the permissions of any enclave
>>>    page within the address range. Enclaves have cryptographically sealed
>>>    permissions for each page address, which set the upper limit for possible
>>>    VMA permissions. Not respecting this can cause #GP's to be emitted.
>
> Side note, #GP is wrong. EPCM violations are #PFs. Skylake CPUs #GP, but
> that's technically an erratum. But this isn't the real motivation, e.g.
> userspace can already trigger #GP/#PF by reading/writing a bad address, SGX
> simply adds another flavor.
>
>> It's been a while since I looked at this. Can you remind us: is this
>> just preventing userspace from shooting itself in the foot or is this
>> something more important?
>
> Something more important, it's used to prevent userspace from circumventing
> a noexec filesystem by loading code into an enclave, and to give the kernel the
> option of adding enclave specific LSM policies in the future.
>
> The source file (if one exists) for the enclave is long gone when the enclave
> is actually mmap()'d and mprotect()'d. To enforce noexec, the requested
> permissions for a given page are snapshotted when the page is added to the
> enclave, i.e. when the enclave is built. Enclave pages that will be executable
> must originate from a MAYEXEC VMA, e.g. the source page can't come from a
> noexec file system.
>
> The ->mprotect() hook allows SGX to reject mprotect() if userspace is declaring
> permissions beyond what are allowed, e.g. trying to map an enclave page with
> EXEC permissions when the page was added to the enclave without EXEC.
>
> Future LSM policies have a similar need due to vm_file always pointing at
> /dev/sgx/enclave, e.g. policies couldn't be attached to a specific enclave.
> ->mprotect() again allows enforcing permissions at map time that were checked
> at enclave build time, e.g. via an LSM hook.
>
> Deferring ->mprotect() until LSM support is added (if it ever is) would be
> problematic due to SGX2. With SGX2, userspace can extend permissions of an
> enclave page (for the CPU's EPC Map entry, not the kernel's page tables)
> without bouncing through the kernel. Without ->mprotect() enforcement,
> userspace could do EADD(RW) -> mprotect(RWX) -> EMODPE(X) to gain W+X. We
> want to disallow such a flow now, i.e. force userspace to do EADD(RW,X), so
> that the hypothetical LSM hook would have all information at EADD(), i.e.
> would be aware of the EXEC permission, without creating divergent behavior
> based on whether or not an LSM is active.

That’s what I thought. Can we get this in the changelog?
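
For anyone following along, the rules described above (snapshot the
permissions when a page is added, then refuse any mprotect() that leaves a
hole or asks for more than the snapshot) can be modelled in a few lines of
user-space C. This is only a rough sketch and not the code from the patch:
struct encl_model, encl_add_page(), encl_may_protect() and the PERM_* bits
are hypothetical names made up for illustration, and the real hook operates
on VMAs and kernel data structures rather than a fixed array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical permission bits for this model; not the kernel's VM_* flags. */
enum { PERM_R = 1, PERM_W = 2, PERM_X = 4 };

#define ENCL_PAGES 8

/* Hypothetical model of an enclave: one slot per page address. */
struct encl_model {
	bool present[ENCL_PAGES];      /* page was added at build time */
	unsigned max_perm[ENCL_PAGES]; /* permissions snapshotted at add time */
};

/*
 * Model of adding a page while the enclave is built. The requested
 * permissions become the upper limit for later mappings. A page that will
 * be executable must come from a source that may be executed (modelled
 * here by src_may_exec, standing in for the MAYEXEC check).
 */
static int encl_add_page(struct encl_model *e, size_t idx, unsigned perm,
			 bool src_may_exec)
{
	assert(e != NULL);
	if (idx >= ENCL_PAGES || e->present[idx])
		return -1;
	if ((perm & PERM_X) && !src_may_exec)
		return -1;
	e->present[idx] = true;
	e->max_perm[idx] = perm;
	return 0;
}

/*
 * Model of the ->mprotect() check over pages [start, end):
 *  1. every page address in the range must hold an enclave page (no holes);
 *  2. the requested permissions must not exceed any page's snapshot.
 * Returns 0 if the change would be allowed, -1 otherwise.
 */
static int encl_may_protect(const struct encl_model *e, size_t start,
			    size_t end, unsigned req)
{
	if (start >= end || end > ENCL_PAGES)
		return -1;
	for (size_t i = start; i < end; i++) {
		if (!e->present[i])
			return -1; /* hole in the range */
		if (req & ~e->max_perm[i])
			return -1; /* asks for more than was declared */
	}
	return 0;
}
```

In this model, a page added with PERM_R | PERM_W cannot later be given
PERM_X through encl_may_protect(), which mirrors the
EADD(RW) -> mprotect(RWX) flow being rejected; userspace would have to
declare the EXEC permission up front when the page is added.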