Re: [PATCH v38 10/24] mm: Add vm_ops->mprotect()

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Fri, 18 Sep 2020 17:15:32 -0700

> On Sep 18, 2020, at 4:53 PM, Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
> 
> On Fri, Sep 18, 2020 at 08:09:04AM -0700, Andy Lutomirski wrote:
>>> On Tue, Sep 15, 2020 at 4:28 AM Jarkko Sakkinen
>>> <jarkko.sakkinen@xxxxxxxxxxxxxxx> wrote:
>>> 
>>> From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
>>> 
>>> Add vm_ops()->mprotect() for additional constraints for a VMA.
>>> 
>>> Intel Software Guard eXtensions (SGX) will use this callback to add two
>>> constraints:
>>> 
>>> 1. Verify that the address range does not have holes: each page address
>>>   must be filled with an enclave page.
>>> 2. Verify that VMA permissions won't surpass the permissions of any enclave
>>>   page within the address range. Enclave cryptographically sealed
>>>   permissions for each page address that set the upper limit for possible
>>>   VMA permissions. Not respecting this can cause #GP's to be emitted.
> 
> Side note, #GP is wrong.  EPCM violations are #PFs.  Skylake CPUs #GP, but
> that's technically an errata.  But this isn't the real motivation, e.g.
> userspace can already trigger #GP/#PF by reading/writing a bad address, SGX
> simply adds another flavor.
> 
>> It's been awhile since I looked at this.  Can you remind us: is this
>> just preventing userspace from shooting itself in the foot or is this
>> something more important?
> 
> Something more important, it's used to prevent userspace from circumventing
> a noexec filesystem by loading code into an enclave, and to give the kernel the
> option of adding enclave specific LSM policies in the future.
> 
> The source file (if one exists) for the enclave is long gone when the enclave
> is actually mmap()'d and mprotect()'d.  To enforce noexec, the requested
> permissions for a given page are snapshotted when the page is added to the
> enclave, i.e. when the enclave is built.  Enclave pages that will be executable
> must originate from an a MAYEXEC VMA, e.g. the source page can't come from a
> noexec file system.
> 
> The ->mprotect() hook allows SGX to reject mprotect() if userspace is declaring
> permissions beyond what are allowed, e.g. trying to map an enclave page with
> EXEC permissions when the page was added to the enclave without EXEC.
> 
> Future LSM policies have a similar need due to vm_file always pointing at
> /dev/sgx/enclave, e.g. policies couldn't be attached to a specific enclave.
> ->mprotect() again allows enforcing permissions at map time that were checked
> at enclave build time, e.g. via an LSM hook.
> 
> Deferring ->mprotect() until LSM support is added (if it ever is) would be
> problematic due to SGX2.  With SGX2, userspace can extend permissions of an
> enclave page (for the CPU's EPC Map entry, not the kernel's page tables)
> without bouncing through the kernel.  Without ->mprotect () enforcement.
> userspace could do EADD(RW) -> mprotect(RWX) -> EMODPE(X) to gain W+X.  We
> want to disallow such a flow now, i.e. force userspace to do EADD(RW,X), so
> that the hypothetical LSM hook would have all information at EADD(), i.e.
> would be aware of the EXEC permission, without creating divergent behavior
> based on whether or not an LSM is active.

That’s what I thought. Can we get this in the changelog?