On Mon, Dec 17, 2018 at 08:59:54PM -0800, Andy Lutomirski wrote:
> On Mon, Dec 17, 2018 at 2:20 PM Sean Christopherson
> <sean.j.christopherson@xxxxxxxxx> wrote:
> > >
> > My brain is still sorting out the details, but I generally like the idea
> > of allocating an anon inode when creating an enclave, and exposing the
> > other ioctls() via the returned fd. This is essentially the approach
> > used by KVM to manage multiple "layers" of ioctls across KVM itself, VMs
> > and vCPUS. There are even similarities to accessing physical memory via
> > multiple disparate domains, e.g. host kernel, host userspace and guest.
> >
>
> In my mind, opening /dev/sgx would give you the requisite inode. I'm
> not 100% sure that the chardev infrastructure allows this, but I think
> it does.

My fd/inode knowledge is lacking, to say the least. Whatever works, so
long as we have a way to uniquely identify enclaves.

> > The only potential hiccup I can see is the build flow. Currently,
> > EADD+EEXTEND is done via a work queue to avoid major performance issues
> > (10x regression) when userspace is building multiple enclaves in parallel
> > using goroutines to wrap Cgo (the issue might apply to any M:N scheduler,
> > but I've only confirmed the Golang case). The issue is that allocating
> > an EPC page acts like a blocking syscall when the EPC is under pressure,
> > i.e. an EPC page isn't immediately available. This causes Go's scheduler
> > to thrash and tank performance[1].
>
> What's the issue, and how does a workqueue help? I'm wondering if a
> nicer solution would be an ioctl to add lots of pages in a single
> call.

Adding pages via workqueue makes the ioctl itself fast enough to avoid
triggering Go's rescheduling. A batched EADD flow would likely help, I
just haven't had the time to rework the userspace side to be able to test
the performance.

> >
> > Alternatively, we could change the EADD+EEXTEND flow to not insert the
> > added page's PFN into the owner's process space, i.e. force userspace to
> > fault when it runs the enclave. But that only delays the issue because
> > eventually we'll want to account EPC pages, i.e. add a cgroup, at which
> > point we'll likely need current->mm anyways.
>
> You should be able to account the backing pages to a cgroup without
> actually sticking them into the EPC, no? Or am I misunderstanding? I
> guess we'll eventually want a cgroup to limit use of the limited EPC
> resources.

It's the latter, a cgroup to limit EPC. The mm is used to retrieve the
cgroup without having to track e.g. the task_struct.
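
For illustration only, the userspace side of a batched EADD flow like the
one discussed above might look roughly like the sketch below. Everything in
it is an assumption made up for the example: struct add_pages_req, the
MAX_BATCH_PAGES limit and the submit callback (a stand-in for an ioctl on
the enclave fd) are hypothetical and not part of any posted SGX interface.
The point is just that N pages cost ceil(N / MAX_BATCH_PAGES) calls instead
of N.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u
#define MAX_BATCH_PAGES 256u	/* hypothetical per-call limit */

/* Hypothetical ioctl argument, NOT an existing SGX uAPI structure. */
struct add_pages_req {
	uint64_t enclave_offset;	/* offset of the first page in the enclave */
	uint64_t src;			/* userspace address of the source data */
	uint32_t nr_pages;		/* contiguous pages covered by this call */
};

/* Stand-in for ioctl(enclave_fd, <hypothetical ADD_PAGES>, req). */
typedef int (*submit_fn)(const struct add_pages_req *req, void *ctx);

/*
 * Add nr_pages contiguous pages using as few calls as possible.
 * Returns the number of calls made, or -1 if a submit failed.
 */
long add_pages_batched(uint64_t offset, uint64_t src, uint64_t nr_pages,
		       submit_fn submit, void *ctx)
{
	long calls = 0;

	assert(submit != NULL);

	while (nr_pages > 0) {
		struct add_pages_req req;
		uint32_t n = nr_pages > MAX_BATCH_PAGES ? MAX_BATCH_PAGES
							: (uint32_t)nr_pages;

		req.enclave_offset = offset;
		req.src = src;
		req.nr_pages = n;
		if (submit(&req, ctx) != 0)
			return -1;

		/* Advance both cursors past the pages just submitted. */
		offset += (uint64_t)n * PAGE_SIZE_BYTES;
		src += (uint64_t)n * PAGE_SIZE_BYTES;
		nr_pages -= n;
		calls++;
	}
	return calls;
}

/* Example submit callback: only tallies pages, performs no syscall. */
int count_pages(const struct add_pages_req *req, void *ctx)
{
	*(uint64_t *)ctx += req->nr_pages;
	return 0;
}
```

With the assumed limit of 256 pages per call, adding 1000 pages through
count_pages() takes 4 calls rather than 1000. Whether that is enough to
avoid the Go scheduler problem is exactly what still needs to be measured.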