On Thu, 2023-07-27 at 09:16 -0500, Haitao Huang wrote: > On Wed, 26 Jul 2023 21:50:02 -0500, Huang, Kai <kai.huang@xxxxxxxxx> wrote: > > > On Wed, 2023-07-26 at 18:02 -0700, Haitao Huang wrote: > > > Under heavy load, the SGX EPC reclaimer (ksgxd) may reclaim the SECS EPC > > > > If I read correctly, Dave suggested to not use "high" (heavy in this > > sentence) > > or "low" pressure: > > > > https://lore.kernel.org/lkml/op.179a4xs0wjvjmi@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/T/#m9120eac6a4a94daa7c9fcc47709f241cd181e5dc > > > > And I agree. For instance, consider this happens to one extremely > > "small" > > enclave, while there's a new "big" enclave starts to run. I don't think > > we > > should say this is "under heavy load". Just stick to the fact that the > > reclaimer may reclaim the SECS page. > > > Mybe I have some confusion here but I did not think Dave had issues with > 'heavy load'. When this happens, the last page causing #PF (page A below) > should be the the "youngest" in PTE and it got paged out together with the > SECS before the #PF is even handled. Based on that the ksgxd moves 'young' > pages to the back of the queue for reclaiming, for that to happen, almost > all EPC pages must be paged out for all enclaves at that time, so it means > heavy load to me. And that's also consistent with my tests. I already provided an example: swapping out an "extreme small" enclave. Anyway, no big deal to me. > > > > page for an enclave and set encl->secs.epc_page to NULL. But the SECS > > > EPC page is used for EAUG in the SGX page fault handler without checking > > > for NULL and reloading. > > > > > > Fix this by checking if SECS is loaded before EAUG and loading it if it > > > was reclaimed. > > > > > > The SECS page holds global enclave metadata. It can only be reclaimed > > > when there are no other enclave pages remaining. At that point, > > > virtually nothing can be done with the enclave until the SECS page is > > > paged back in. > ... > > > But it is still possible for a #PF for a non-SECS page to race > > > with paging out the SECS page: when the last resident non-SECS page A > > > triggers a #PF in a non-resident page B, and then page A and the SECS > > > both are paged out before the #PF on B is handled. > > > > > > Hitting this bug requires that race triggered with a #PF for EAUG. > > > > The above race can happen for the normal ELDU path too, thus I suppose > > it will > > be better to mention why the normal ELDU path doesn't have this issue: it > > already does what this fix does. > > > Should we focus on the bug and fix itself instead of explaining a non-bug > case? > And the simple changes in this patch clearly show that too if people look > for that. So you spent a lot of text explaining the race condition, but such race condition applies to both ELDU and EAUG. I personally went to see the code whether ELDU has such issue too, and it turned out only EAUG has issue. If you mention this in the changelog perhaps I wouldn't need to go to read the code. Anyway, just my 2cents. And I don't want to let those block this patch, so feel free to add my tag: Reviewed-by: Kai Huang <kai.huang@xxxxxxxxx>