Re: [PATCH v40 10/24] mm: Add 'mprotect' hook to struct vm_operations_struct

Andy Lutomirski <luto@xxxxxxxxxx> · Thu, 12 Nov 2020 14:41:00 -0800

On Thu, Nov 12, 2020 at 1:31 PM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 11/12/20 12:58 PM, Dr. Greg wrote:
> > @@ -270,11 +270,10 @@ static int sgx_vma_mprotect(struct vm_area_struct *vma,
> >                           struct vm_area_struct **pprev, unsigned long start,
> >                           unsigned long end, unsigned long newflags)
> >  {
> > -     int ret;
> > +     struct sgx_encl *encl = vma->vm_private_data;
> >
> > -     ret = sgx_encl_may_map(vma->vm_private_data, start, end, newflags);
> > -     if (ret)
> > -             return ret;
> > +     if ( test_bit(SGX_ENCL_INITIALIZED, &encl->flags) )
> > +             return -EACCES;
> >
> >       return mprotect_fixup(vma, pprev, start, end, newflags);
> >  }
>
> This rules out mprotect() on running enclaves.  Does that break any
> expectations from enclave authors, or take away capabilities that folks
> need?

It certainly prevents any scheme in which an enclave coordinates with
the outside world to do W-and-then-X JIT inside.  I'm also not
convinced it has any real effect at all unless there's some magic I
missed to prevent someone from using mmap(2) to effectively change
permissions.

Everyone, IMO this SGX1 - vs - SGX2 - vs - EDMM discussion is entirely
missing the point and is a waste of everyone's time.  Let's pretend
we're building a system that has nothing to do with SGX and requires
no special hardware support at all.  It works like this:

A user program opens /dev/xyz and gets back an fd that represents 16
MB of memory.  The user program copies some data from disk (or network
or whatever) into fd (using write(2) or ioctl(2) or mmap(2) and
memcpy) and then mmaps some of the fd as R and some as RW and some as
RX, and then the user program jumps into the RX mapping.

If we replace /dev/xyz with /dev/zero, then this simply does not work
under a reasonably strict W^X policy -- a lot of people think it's
quite reasonable for an OS to prevent a user program from obtaining an
X mapping containing anything other than a mapping from a file on
disk.  To solve this, we can do one of at least three things:

a) You can't use /dev/xyz unless you have permission to create WX
memory or to at least create W memory and then change it to X.

b) You can do whatever you want with /dev/xyz, and LSM policies are
blatantly violated as a result.

c) The /dev/xyz API is clever and tracks, page-by-page, whether the
user intends to ever write and/or execute that page, and behaves
accordingly.

This patchset takes the approach (c).  The actual clever policy isn't
here yet, and we don't really know whether it will ever appear, but
the API is set up to enable such a policy to be written.  This appears
to be a win for everyone, since the code is pretty clean and the API
is straightforward.

Now, back to SGX.  There are only two things that are remotely
SGX-specific here.  First, SGX requires this unusual memory model in
which there is an executable mapping of (part of) a device node. [0]
Second, early SGX hardware had this oddity that the kernel could set a
per-backing-page (as opposed to per-PTE) bit to permanently disable X
on a given /dev/sgx page.  Building a security model around that would
have been a hack, and it DOES NOT WORK on new hardware.  So can we
please stop discussing it?  None of the actual interesting parts of
this have much to do with SGX per se and have nothing whatsoever to do
with EDMM or any other Intel buzzword.

Heck, if anyone actually cared to do so, something with essentially
the same semantics could probably be built using SEV hardware instead
of SGX, and it would have exactly the same issue if we wanted it to
work for tasks that didn't have access to /dev/kvm.

[0] SGX doesn't *really* require this.  We could set things up so that
you do mmap(..., MAP_ANONYMOUS, fd, ...) and then somehow introduce
that mapping to SGX.  I think the result would be too disgusting to
seriously consider.