On Thu, May 07, 2020 at 12:49:15PM -0400, Nathaniel McCallum wrote:
> On Thu, May 7, 2020 at 1:03 AM Haitao Huang
> <haitao.huang@xxxxxxxxxxxxxxx> wrote:
> >
> > On Wed, 06 May 2020 17:14:22 -0500, Sean Christopherson
> > <sean.j.christopherson@xxxxxxxxx> wrote:
> >
> > > On Wed, May 06, 2020 at 05:42:42PM -0400, Nathaniel McCallum wrote:
> > >> Tested on Enarx. This requires a patch[0] for v29 support.
> > >>
> > >> Tested-by: Nathaniel McCallum <npmccallum@xxxxxxxxxx>
> > >>
> > >> However, we did uncover a small usability issue. See below.
> > >>
> > >> [0]:
> > >> https://github.com/enarx/enarx/pull/507/commits/80da2352aba46aa7bc6b4d1fccf20fe1bda58662
> > >
> > > ...
> > >
> > >> > * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g. anonymous) can
> > >> >   be used to reserve the address range. Now /dev/sgx supports only opaque
> > >> >   mappings to the (initialized) enclave data.
> > >>
> > >> The statement "Any mapping..." isn't actually true.

Yeah, this is definitely misleading.  I haven't looked at our most recent
docs, but I'm going to go out on a limb and assume we haven't documented
the preferred mechanism for carving out virtual memory for the enclave.
That absolutely should be done.

> > >> Enarx creates a large enclave (currently 64GiB). This worked when we
> > >> created a file-backed mapping on /dev/sgx/enclave. However, switching
> > >> to an anonymous mapping fails with ENOMEM. We suspect this is because
> > >> the kernel attempts to allocate all the pages and zero them but there
> > >> is insufficient RAM available. We currently work around this by
> > >> creating a shared mapping on /dev/zero.
> > >
> > > Hmm, the kernel shouldn't actually allocate physical pages unless they're
> > > written. I'll see if I can reproduce.
> > >
> >
> > For larger size mmap, I think it requires enabling vm overcommit mode 1:
> > echo 1 | sudo tee /proc/sys/vm/overcommit_memory

It shouldn't unless the initial mmap is "broken".
Not truly broken, but broken in the sense that what Enarx is asking for is
not actually what it desires.

> Which means the default experience isn't good.

What PROT_* and MAP_* flags are passed to mmap()?  Overcommit only applies
to

  VM_WRITE (a.k.a. PROT_WRITE) && !VM_SHARED && !VM_NORESERVED

and, ignoring rlimits, VM expansion only applies to

  VM_WRITE && !VM_SHARED && !VM_STACK

So hopefully Enarx is doing something like

  base = mmap(NULL, 64gb, PROT_READ | PROT_WRITE,
              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

because that means this is effectively a userspace bug.  This goes back to
my comment about the mmap() being "broken".  Userspace is asking for a
writable, private mapping, in which case it absolutely should be accounted.

If using

  base = mmap(NULL, 64gb, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

works, then updating the SGX docs to better explain how to establish ELRANGE
is sufficient (we need to do that in any case).  If the above still fails,
then something else is in play.
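
For anyone who wants to try the two variants side by side, here is a
minimal sketch in C.  It is not Enarx or SGX selftest code:
reserve_range() and release_range() are hypothetical helper names
introduced only for this illustration, and the sketch assumes a 64-bit
Linux system.  It only reserves address space; it does not open /dev/sgx
or build an enclave.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: ask for `len` bytes of private anonymous address
 * space with the given protection.  With PROT_NONE the mapping is not
 * VM_WRITE, so per the rule above it should not be subject to overcommit
 * accounting.  With PROT_READ | PROT_WRITE it is a writable private
 * mapping and is accounted.
 *
 * Returns the base address, or NULL on failure (errno is set by mmap()).
 */
static void *reserve_range(size_t len, int prot)
{
	void *base;

	assert(len > 0);

	base = mmap(NULL, len, prot, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return base == MAP_FAILED ? NULL : base;
}

/* Hypothetical helper: drop a reservation made by reserve_range(). */
static int release_range(void *base, size_t len)
{
	return munmap(base, len);
}
```

Calling reserve_range((size_t)64 << 30, PROT_NONE) and then
reserve_range((size_t)64 << 30, PROT_READ | PROT_WRITE) on the same machine
should show whether only the writable request hits ENOMEM, which is the
behavior expected if overcommit accounting is the cause.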