On Thu, Jun 13, 2019 at 09:16:46AM -0700, Andy Lutomirski wrote: > On Thu, Jun 13, 2019 at 6:46 AM Sean Christopherson > <sean.j.christopherson@xxxxxxxxx> wrote: > > > > On Thu, Jun 13, 2019 at 12:38:02AM +0000, Jethro Beekman wrote: > > > On 2019-06-10 11:53, Sean Christopherson wrote: > > > >On Fri, Jun 07, 2019 at 12:32:23PM -0700, Andy Lutomirski wrote: > > > >> > > > >>>On Jun 6, 2019, at 10:32 AM, Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote: > > > >>> > > > >>>>On Thu, Jun 06, 2019 at 10:20:38AM -0700, Andy Lutomirski wrote: > > > >>>>On Wed, Jun 5, 2019 at 12:49 PM Sean Christopherson > > > >>>><sean.j.christopherson@xxxxxxxxx> wrote: > > > >>>>> > > > >>>>>For some enclaves, e.g. an enclave with a small code footprint and a > > > >>>>>large working set, the vast majority of pages added to the enclave are > > > >>>>>zero pages. Introduce a flag to denote such zero pages. The major > > > >>>>>benefit of the flag will be realized in a future patch to use Linux's > > > >>>>>actual zero page as the source, as opposed to explicitly zeroing the > > > >>>>>enclave's backing memory. > > > >>>>> > > > >>>> > > > >>>>I feel like I probably asked this at some point, but why is there a > > > >>>>workqueue here at all? > > > >>> > > > >>>Performance. A while back I wrote a patch set to remove the worker queue > > > >>>and discovered that it tanks enclave build time when the enclave is being > > > >>>hosted by a Golang application. Here's a snippet from a mail discussing > > > >>>the code. > > > >>> > > > >>> The bad news is that I don't think we can remove the add page worker > > > >>> as applications with userspace schedulers, e.g. Go's M:N scheduler, > > > >>> can see a 10x or more throughput improvement when using the worker > > > >>> queue. I did a bit of digging for the Golang case to make sure I > > > >>> wasn't doing something horribly stupid/naive and found that it's a > > > >>> generic issue in Golang with blocking (or just long-running) system > > > >>> calls. Because Golang multiplexes Goroutines on top of OS threads, > > > >>> blocking syscalls introduce latency and context switching overhead, > > > >>> e.g. Go's scheduler will spin up a new OS thread to service other > > > >>> Goroutines after it realizes the syscall has blocked, and will later > > > >>> destroy one of the OS threads so that it doesn't build up too many > > > >>> unused. > > > >>> > > > >>>IIRC, the scenario is spinning up several goroutines, each building an > > > >>>enclave. I played around with adding a flag to do a synchronous EADD > > > >>>but didn't see a meaningful change in performance for the simple case. > > > >>>Supporting both the worker queue and direct paths was complex enough > > > >>>that I decided it wasn't worth the trouble for initial upstreaming. > > > >> > > > >>Sigh. > > > >> > > > >>It seems silly to add a workaround for a language that has trouble calling > > > >>somewhat-but-not-too-slow syscalls or ioctls. > > > >> > > > >>How about fixing this in Go directly? Either convince the golang people to > > > >>add a way to allocate a real thread for a particular region of code or have > > > >>the Go SGX folks write a bit of C code to do a whole bunch of ioctls and > > > >>have Go call *that*. Then the mess stays in Go where it belongs. > > > > > > > >Actually, I'm pretty sure changing the ioctl() from ADD_PAGE to ADD_REGION > > > >would eliminate the worst of the golang slowdown without requiring > > > >userspace to get super fancy. I'm in favor of eliminating the work queue, > > > >especially if the UAPI is changed to allow adding multiple pages in a > > > >single syscall. > > > > > > > > > > I don't know if this is going to matter a whole lot, but have you considered > > > the performance impact of needing to the EPC paging while doing the EADD > > > ioctl and how this interacts with having a workqueue? > > > > Yep, other than the goroutine case, eliminating the workqueue doesn't > > substantially affect performance in either direction, regardless of the > > pressure on the EPC. > > It should get rid of some extra copies and allocations, no? My experiments were fairly dumb, e.g. removed the workqueue without doing extra enhancements like avoiding copy_from_user. IIRC, the type of allocation changed, but the number of copies/allocations stayed the same.