On Thu, Oct 10, 2019 at 11:56:07AM -0700, Sean Christopherson wrote:
> On Thu, Oct 10, 2019 at 11:35:48AM -0700, Sean Christopherson wrote:
> > On Wed, Oct 09, 2019 at 03:04:50AM +0300, Jarkko Sakkinen wrote:
> > > On Mon, Oct 07, 2019 at 09:13:34PM -0700, Sean Christopherson wrote:
> > > > WARN if EREMOVE fails when destroying an enclave. sgx_encl_release()
> > > > uses the non-WARN __sgx_free_page() when freeing pages as some pages may
> > > > be in the process of being reclaimed, i.e. are owned by the reclaimer.
> > > > But EREMOVE should never fail as sgx_encl_destroy() is only called when
> > > > the enclave cannot have active threads, e.g. prior to EINIT and when the
> > > > enclave is being released.
> > > >
> > > > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > >
> > > For me this concludes that I will manually convert all the call sites
> > > to use __sgx_free_page() and add appropriate warnings. I agree with
> > > Borislav's conclusions here.
> >
> > Argh, now we have a bunch of call sites that can silently leak EPC pages,
> > and I'm seeing timeouts during testing that strongly suggest pages are
> > being leaked...
>
> Confirmed that we're leaking pages, but it's not related to the -EBUSY
> case in sgx_free_page(). Debug in progress...
>
> As to the sgx_free_page() thing, I think we can invert the old WARN logic
> and make everyone happy. I'll send a patch.

Figured out what's up. I'm testing in a VM with multiple EPC sections.
Because of a change in v23[*], sgx_nr_free_pages is getting corrupted due
to non-atomic concurrent writes. When it drops below 0 and wraps to a high
value the swap thread stops reclaiming and things grind to a halt.

[*] https://patchwork.kernel.org/patch/11146733/#22887361
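
For illustration only, here is a minimal userspace sketch of the failure mode
described above. It is not the SGX driver code and not necessarily the fix that
went into the series. LOW_WATERMARK, should_reclaim(), dec_free_pages() and
run_atomic_workers() are hypothetical names invented for this example. The
first part shows why an unsigned free-page count that is decremented past 0
makes a "reclaim while free pages are scarce" check go quiet. The second part
shows one common remedy, a single shared counter updated with C11 atomics, so
that concurrent updates cannot be lost. In the kernel the analogous primitives
would be atomic_long_t with atomic_long_inc()/atomic_long_dec().

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical threshold, not the value used by the SGX driver. */
#define LOW_WATERMARK 32UL

#define NR_THREADS 4
#define NR_ITERS   100000

/* Reclaimer-style check: reclaim only while free pages are scarce. */
static bool should_reclaim(unsigned long nr_free)
{
	return nr_free < LOW_WATERMARK;
}

/* One extra decrement, as a lost or unbalanced update would produce. */
static unsigned long dec_free_pages(unsigned long nr_free)
{
	return nr_free - 1;	/* 0 - 1 wraps to ULONG_MAX for unsigned types */
}

/* Race-free variant: the shared counter is only touched atomically. */
static atomic_ulong nr_free_atomic;

static void *alloc_free_worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < NR_ITERS; i++) {
		atomic_fetch_sub(&nr_free_atomic, 1);	/* "allocate" a page */
		atomic_fetch_add(&nr_free_atomic, 1);	/* "free" it again */
	}
	return NULL;
}

/* Runs NR_THREADS balanced workers and returns the final counter value. */
static unsigned long run_atomic_workers(unsigned long initial)
{
	pthread_t threads[NR_THREADS];

	atomic_store(&nr_free_atomic, initial);
	for (int i = 0; i < NR_THREADS; i++) {
		int rc = pthread_create(&threads[i], NULL, alloc_free_worker, NULL);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < NR_THREADS; i++)
		pthread_join(threads[i], NULL);
	return atomic_load(&nr_free_atomic);
}
```

With a count of 0, should_reclaim() is true, but after one more decrement the
value is ULONG_MAX and the check reports that there is nothing to do, which
matches the "stops reclaiming" symptom. Because every worker pairs each
decrement with an increment, run_atomic_workers() returns the initial value as
long as it is at least NR_THREADS. With plain "nr_free--" / "nr_free++" on a
shared variable, that would not be guaranteed.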