On 5/23/19 11:38 AM, Andy Lutomirski wrote:
On Thu, May 23, 2019 at 7:17 AM Sean Christopherson
<sean.j.christopherson@xxxxxxxxx> wrote:
On Thu, May 23, 2019 at 01:26:28PM +0300, Jarkko Sakkinen wrote:
On Wed, May 22, 2019 at 07:35:17PM -0700, Sean Christopherson wrote:
But actually, there's no need to disallow mmap() after ECREATE since the
LSM checks also apply to mmap(), e.g. FILE__EXECUTE would be needed to
mmap() any enclave pages PROT_EXEC. I guess my past self thought mmap()
bypassed LSM checks? The real problem is that mmap()'ing an existing
enclave would require FILE__WRITE and FILE__EXECUTE, which puts us back
at square one.
I'm lost with the constraints we want to set.
As is today, SELinux policies would require enclave loaders to have
FILE__WRITE and FILE__EXECUTE permissions on /dev/sgx/enclave. Presumably
other LSMs have similar requirements. Requiring all processes to have
FILE__{WRITE,EXECUTE} permissions means the permissions don't add much
value, e.g. they can't be used to distinguish between an enclave that is
being loaded from an unmodified file and an enclave that is being
generated on the fly, e.g. Graphene.
Looking back at Andy's mail, he was talking about requiring FILE__EXECUTE
to run an enclave, so perhaps it's only FILE__WRITE that we're trying to
special case.
I thought about this some more, and I have a new proposal that helps
address the ELRANGE alignment issue and the permission issue at the
cost of some extra verbosity. Maybe you all can poke holes in it :)
The basic idea is to make everything more explicit from a user's
perspective. Here's how it works:
Opening /dev/sgx/enclave gives an enclave_fd that, by design, doesn't
give EXECUTE or WRITE. mmap() on the enclave_fd only works if you
pass PROT_NONE and gives the correct alignment. The resulting VMA
cannot be mprotected or mremapped. It can't be mmapped at all until
after ECREATE because the alignment isn't known before that.
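
For reference, the alignment constraint behind that last sentence can be written down in a few lines. This is only a sketch of the architectural rule that ELRANGE must be a power-of-two size and naturally aligned to that size; elrange_ok() is a made-up helper name, not anything in the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if [base, base + size) could serve
 * as an ELRANGE, i.e. size is a non-zero power of two and base is a
 * multiple of size. The size is only fixed at ECREATE, which is why the
 * alignment can't be known before then.
 */
static bool elrange_ok(uint64_t base, uint64_t size)
{
    bool pow2 = size != 0 && (size & (size - 1)) == 0;

    return pow2 && (base & (size - 1)) == 0;
}
```
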
Associated with the enclave are a bunch (up to 7) "enclave segment
inodes". These are anon_inodes that are created automagically. An
enclave segment is a group of pages, not necessarily contiguous, with an
upper bound on the memory permissions. Each enclave page belongs to a
segment. When you do EADD, you tell the driver what segment you're
adding to. [0] This means that EADD gets an extra argument that is a
permission mask for the page -- in addition to the initial SECINFO,
you also pass to EADD something to the effect of "I promise never to
map this with permissions greater than RX".
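
To make the "up to 7" concrete: if a segment is identified purely by its permission mask, there is one possible segment per non-empty subset of {R, W, X}. A rough sketch follows, where the SEG_* bits, struct eadd_req and segment_index() are all invented for illustration and not a proposed ABI:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical permission bits for the extra EADD argument. */
#define SEG_R   0x1u
#define SEG_W   0x2u
#define SEG_X   0x4u
#define SEG_RWX (SEG_R | SEG_W | SEG_X)

/* Non-empty subsets of {R, W, X}: 2^3 - 1 = 7 possible segments. */
#define SEG_COUNT 7

/* Hypothetical EADD request: the usual fields plus the new mask. */
struct eadd_req {
    uint64_t addr;      /* target address inside ELRANGE */
    uint64_t src;       /* source page in user memory */
    uint32_t max_perms; /* "never map this with more than ..." */
};

/*
 * Map a permission mask to a segment slot in 0..SEG_COUNT-1.
 * An empty mask or unknown bits are rejected with -EINVAL.
 */
static int segment_index(uint32_t max_perms)
{
    if (max_perms == 0 || (max_perms & ~SEG_RWX))
        return -EINVAL;
    return (int)max_perms - 1;
}
```

With this encoding, an EADD that promises "never more than RX" always lands in the same slot, so all such pages share one segment inode.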
Then we just need some way to mmap a region from an enclave segment.
This could be done by having a way to get an fd for an enclave segment
or it could be done by having a new ioctl SGX_IOC_MAP_SEGMENT. User
code would use this operation to replace, MAP_FIXED-style, ranges from
the big PROT_NONE mapping with the relevant pages from the enclave
segment. The resulting vma would only have VM_MAYWRITE if the segment
is W, only have VM_MAYEXEC if the segment is X, and only have
VM_MAYREAD if the segment is R. Depending on implementation details,
the VMAs might need to restrict mremap() to avoid mapping pages that
aren't part of the segment in question.
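
The VM_MAY* rule above is mechanical enough to sketch. The snippet below is a userspace model, not driver code: the SEG_* bits and both function names are hypothetical, and the VM_* values are redefined locally to mirror the kernel's so that it builds outside the tree. The mprotect() model relies on each VM_MAY* bit sitting four bits above its VM_* counterpart.

```c
#include <errno.h>

/* Hypothetical segment permission bits. */
#define SEG_R 0x1u
#define SEG_W 0x2u
#define SEG_X 0x4u

/* Local copies mirroring the kernel's vm_flags values. */
#define VM_READ     0x01ul
#define VM_WRITE    0x02ul
#define VM_EXEC     0x04ul
#define VM_MAYREAD  0x10ul
#define VM_MAYWRITE 0x20ul
#define VM_MAYEXEC  0x40ul

/* Derive the VM_MAY* ceiling of a VMA from its segment's permissions. */
static unsigned long segment_may_flags(unsigned int seg_perms)
{
    unsigned long may = 0;

    if (seg_perms & SEG_R)
        may |= VM_MAYREAD;
    if (seg_perms & SEG_W)
        may |= VM_MAYWRITE;
    if (seg_perms & SEG_X)
        may |= VM_MAYEXEC;
    return may;
}

/*
 * Model of the mprotect() check: every requested access bit needs its
 * VM_MAY* counterpart, otherwise the request fails with -EACCES.
 */
static int check_mprotect(unsigned long may_flags, unsigned long want)
{
    if (want & ~(may_flags >> 4) & (VM_READ | VM_WRITE | VM_EXEC))
        return -EACCES;
    return 0;
}
```

So a mapping of an RX segment can be mprotected to PROT_READ | PROT_EXEC but never to anything including PROT_WRITE, regardless of what the LSM would otherwise allow.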
It's plausible that this whole thing works without the magic segment
inodes under the hood, but figuring that out would need a careful look
at how all the core mm bits and LSM bits work together.
To get all the LSM stuff to work, SELinux will need some way to
automatically assign an appropriate label to the segment inodes. I
assume that such a mechanism already exists and gets used for things
like sockets, but I haven't actually confirmed this.
I don't follow that. Socket inodes are not anon inodes. Anon inodes
have no per-instance data by definition, and typically you're only
dealing with a single anon inode for all files; hence they were long
ago marked S_PRIVATE and exempted from all LSM checking except for
EXECMEM on mmap/mprotect PROT_EXEC. We have no way to perform useful
security checking on them currently. Socket inodes we can label from
their creating process, but even that's not going to support multiple
labels for different sockets created by the same process unless the
process explicitly used setsockcreatecon(3), aka /proc/self/attr/sockcreate.
[0] There need to be some vaguely intelligent semantics if you EADD
the *same* address more than once. A simple solution would be to
disallow it if the segments don't match.
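
A toy model of that "simple solution", tracking only which segment each page was added to. Everything here is hypothetical (names, table size, the choice of -EEXIST). Accepting a repeat EADD with a matching segment is an assumption of the sketch; what the hardware does with a second EADD of the same page is outside its scope.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_NPAGES 16 /* arbitrary size for the toy model */
#define SEG_NONE     0  /* page not yet added to any segment */

/* Per-enclave bookkeeping: segment id (1..7) for each page, or SEG_NONE. */
struct enclave_model {
    unsigned char page_seg[MODEL_NPAGES];
};

/*
 * Record that 'page' is being EADDed into segment 'seg'.
 * A repeat EADD of the same page is refused with -EEXIST if it names a
 * different segment than the first one did.
 */
static int model_eadd(struct enclave_model *e, size_t page, unsigned char seg)
{
    if (page >= MODEL_NPAGES || seg == SEG_NONE)
        return -EINVAL;
    if (e->page_seg[page] != SEG_NONE && e->page_seg[page] != seg)
        return -EEXIST;
    e->page_seg[page] = seg;
    return 0;
}
```
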