On Tue, Dec 15, 2020 at 11:34:37AM -0600, Haitao Huang wrote:
> On Mon, 14 Dec 2020 23:59:55 -0600, Jarkko Sakkinen <jarkko@xxxxxxxxxx> wrote:
>
> > On Tue, Dec 15, 2020 at 07:56:01AM +0200, Jarkko Sakkinen wrote:
> > > On Mon, Dec 14, 2020 at 11:01:32AM -0800, Sean Christopherson wrote:
> > > > On Fri, Dec 11, 2020, Jarkko Sakkinen wrote:
> > > > > Each sgx_mmun_notifier_release() starts a grace period, which means that
> > > >
> > > > Should be sgx_mmu_notifier_release(), here and in the comment.
> > >
> > > Thanks.
> > >
> > > > > one extra synchronize_rcu() in sgx_encl_release(). Add it there.
> > > > >
> > > > > sgx_release() has the loop that drains the list but with bad luck the
> > > > > entry is already gone from the list before that loop processes it.
> > > >
> > > > Why not include the actual analysis that "proves" the bug? The splat that
> > > > Haitao reported would also be useful info.
> > >
> > > True. I can include a snippet of dmesg to the commit message.
> > >
> > > > > Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
> > > > > Cc: Borislav Petkov <bp@xxxxxxxxx>
> > > > > Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> > > > > Reported-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> > > >
> > > > Haitao reported the bug, and for all intents and purposes provided the fix. I
> > > > just did the analysis to verify that there was a legitimate bug and that the
> > > > synchronization in sgx_encl_release() was indeed necessary.
> > >
> > > Good and valid point. The way I see it, the tags should be:
> > >
> > > Reported-by: Haitao Huang <haitao.huang@xxxxxxxxxxxxxxx>
> > > Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> > >
> > > Haitao pointed out the bug but from your analysis I could resolve that
> > > this is the fix to implement, and was able to write the long
> > > description for the commit.
> > >
> > > Does this make sense to you?
> >
> > I'm sending v2 next week (this week on vacation).
> >
> > /Jarkko
>
> I don't mind either how tags are assigned. But our testing reveals
> significant latency introduced in scenarios of heavy loading/unloading
> enclaves. synchronize_srcu_expedited fixed the issue. Please analyze and
> confirm if that's more appropriate than synchronize_srcu here.

I don't see any obvious reason why *_expedited could not be used here,
as most of the time the syncs are taken care of by the sgx_release() loop,
and the final sync is with sgx_mmu_notifier_release(). More aggressive
spinning should not do any harm here.

About the tags: I just try to get them right, and it is sometimes not
straightforward. So I guess, all things considered, I'll put
Suggested-by from you. Once I get a refined patch out, please try it with
your workloads and provide a Tested-by if it works for you.

/Jarkko