Re: [PATCH v38 10/24] mm: Add vm_ops->mprotect()

Sean Christopherson <sean.j.christopherson@xxxxxxxxx> · Mon, 21 Sep 2020 09:57:58 -0700

On Mon, Sep 21, 2020 at 03:49:46PM +0300, Jarkko Sakkinen wrote:
> On Fri, Sep 18, 2020 at 04:53:37PM -0700, Sean Christopherson wrote:
> > On Fri, Sep 18, 2020 at 08:09:04AM -0700, Andy Lutomirski wrote:
> > > On Tue, Sep 15, 2020 at 4:28 AM Jarkko Sakkinen
> > > <jarkko.sakkinen@xxxxxxxxxxxxxxx> wrote:
> > > >
> > > > From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > > >
> > > > Add vm_ops()->mprotect() for additional constraints for a VMA.
> > > >
> > > > Intel Software Guard eXtensions (SGX) will use this callback to add two
> > > > constraints:
> > > >
> > > > 1. Verify that the address range does not have holes: each page address
> > > >    must be filled with an enclave page.
> > > > 2. Verify that VMA permissions won't surpass the permissions of any enclave
> > > >    page within the address range. Enclave cryptographically sealed
> > > >    permissions for each page address that set the upper limit for possible
> > > >    VMA permissions. Not respecting this can cause #GP's to be emitted.
> > 
> > Side note, #GP is wrong.  EPCM violations are #PFs.  Skylake CPUs #GP, but
> > that's technically an errata.  But this isn't the real motivation, e.g.
> > userspace can already trigger #GP/#PF by reading/writing a bad address, SGX
> > simply adds another flavor.
> > 
> > > It's been awhile since I looked at this.  Can you remind us: is this
> > > just preventing userspace from shooting itself in the foot or is this
> > > something more important?
> > 
> > Something more important, it's used to prevent userspace from circumventing
> > a noexec filesystem by loading code into an enclave, and to give the kernel the
> > option of adding enclave specific LSM policies in the future.
> > 
> > The source file (if one exists) for the enclave is long gone when the enclave
> > is actually mmap()'d and mprotect()'d.  To enforce noexec, the requested
> > permissions for a given page are snapshotted when the page is added to the
> > enclave, i.e. when the enclave is built.  Enclave pages that will be executable
> > must originate from an a MAYEXEC VMA, e.g. the source page can't come from a
> > noexec file system.
> 
> noexec check is done in __sgx_encl_add_page(), not in this callback.
> sgx_vma_mprotect() calls sgx_encl_may_map(), which iterates the
> addresses, checks that permissions are not surpassed and there are
> no holes.

Yes, that's what I said.

> > The ->mprotect() hook allows SGX to reject mprotect() if userspace is declaring
> > permissions beyond what are allowed, e.g. trying to map an enclave page with
> > EXEC permissions when the page was added to the enclave without EXEC.
> 
> For my ear this is tautological because if user space would use
> differing permissions, it would exactly soot itself in the foot.

Not on SGX2 hardware.  The kernel can't disable ENCLU[MODPE].

> What really should be documented is to answer why we consider an enclave
> mappable, if and only if the conditions are that I described about
> sgx_encl_may_map() functioning.
> 
> This is obviously part of the answer [*], i.e. with pharsing LSM hook
> requires an enclave to have a state, which is compatible with the mmap
> permissions and is fully populated.
> 
> The 2nd part of the answer is the answer to the question: why we want to
> feed LSM hooks enclaves exactly in this state.
> 
> What would you answer to that?

I would copy-paste the part of the response that was snipped...

> [*] https://lore.kernel.org/linux-sgx/20200918232458.GA6175@xxxxxxxxxxxxxxx/T/#u
> 
> /Jarkko