> On Jun 6, 2019, at 10:32 AM, Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote: > >> On Thu, Jun 06, 2019 at 10:20:38AM -0700, Andy Lutomirski wrote: >> On Wed, Jun 5, 2019 at 12:49 PM Sean Christopherson >> <sean.j.christopherson@xxxxxxxxx> wrote: >>> >>> For some enclaves, e.g. an enclave with a small code footprint and a >>> large working set, the vast majority of pages added to the enclave are >>> zero pages. Introduce a flag to denote such zero pages. The major >>> benefit of the flag will be realized in a future patch to use Linux's >>> actual zero page as the source, as opposed to explicitly zeroing the >>> enclave's backing memory. >>> >> >> I feel like I probably asked this at some point, but why is there a >> workqueue here at all? > > Performance. A while back I wrote a patch set to remove the worker queue > and discovered that it tanks enclave build time when the enclave is being > hosted by a Golang application. Here's a snippet from a mail discussing > the code. > > The bad news is that I don't think we can remove the add page worker > as applications with userspace schedulers, e.g. Go's M:N scheduler, > can see a 10x or more throughput improvement when using the worker > queue. I did a bit of digging for the Golang case to make sure I > wasn't doing something horribly stupid/naive and found that it's a > generic issue in Golang with blocking (or just long-running) system > calls. Because Golang multiplexes Goroutines on top of OS threads, > blocking syscalls introduce latency and context switching overhead, > e.g. Go's scheduler will spin up a new OS thread to service other > Goroutines after it realizes the syscall has blocked, and will later > destroy one of the OS threads so that it doesn't build up too many > unused. > > IIRC, the scenario is spinning up several goroutines, each building an > enclave. I played around with adding a flag to do a synchronous EADD > but didn't see a meaningful change in performance for the simple case. > Supporting both the worker queue and direct paths was complex enough > that I decided it wasn't worth the trouble for initial upstreaming. Sigh. It seems silly to add a workaround for a language that has trouble calling somewhat-but-not-too-slow syscalls or ioctls. How about fixing this in Go directly? Either convince the golang people to add a way to allocate a real thread for a particular region of code or have the Go SGX folks write a bit of C code to do a whole bunch of ioctls and have Go call *that*. Then the mess stays in Go where it belongs.