On Wed, 2023-07-26 at 18:02 -0700, Haitao Huang wrote: > Under heavy load, the SGX EPC reclaimer (ksgxd) may reclaim the SECS EPC If I read correctly, Dave suggested to not use "high" (heavy in this sentence) or "low" pressure: https://lore.kernel.org/lkml/op.179a4xs0wjvjmi@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/T/#m9120eac6a4a94daa7c9fcc47709f241cd181e5dc And I agree. For instance, consider this happens to one extremely "small" enclave, while there's a new "big" enclave starts to run. I don't think we should say this is "under heavy load". Just stick to the fact that the reclaimer may reclaim the SECS page. > page for an enclave and set encl->secs.epc_page to NULL. But the SECS > EPC page is used for EAUG in the SGX page fault handler without checking > for NULL and reloading. > > Fix this by checking if SECS is loaded before EAUG and loading it if it > was reclaimed. > > The SECS page holds global enclave metadata. It can only be reclaimed > when there are no other enclave pages remaining. At that point, > virtually nothing can be done with the enclave until the SECS page is > paged back in. > > An enclave can not run nor generate page faults without a resident SECS > page. > I am not sure whether "nor generate page faults without a resident SECS page" is accurate? When SECS is swapped out, I suppose the first EENTER should trigger a #PF on the TSC page and in the #PF handler the SECS will be swapped in first. I guess you can just remove this sentence? > But it is still possible for a #PF for a non-SECS page to race > with paging out the SECS page: when the last resident non-SECS page A > triggers a #PF in a non-resident page B, and then page A and the SECS > both are paged out before the #PF on B is handled. > > Hitting this bug requires that race triggered with a #PF for EAUG. The above race can happen for the normal ELDU path too, thus I suppose it will be better to mention why the normal ELDU path doesn't have this issue: it already does what this fix does. > Following is a trace when it happens. > > BUG: kernel NULL pointer dereference, address: 0000000000000000 > RIP: 0010:sgx_encl_eaug_page+0xc7/0x210 > Call Trace: > ? __kmem_cache_alloc_node+0x16a/0x440 > ? xa_load+0x6e/0xa0 > sgx_vma_fault+0x119/0x230 > __do_fault+0x36/0x140 > do_fault+0x12f/0x400 > __handle_mm_fault+0x728/0x1110 > handle_mm_fault+0x105/0x310 > do_user_addr_fault+0x1ee/0x750 > ? __this_cpu_preempt_check+0x13/0x20 > exc_page_fault+0x76/0x180 > asm_exc_page_fault+0x27/0x30 > > Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave") > Cc: stable@xxxxxxxxxxxxxxx # v6.0+ > Signed-off-by: Haitao Huang <haitao.huang@xxxxxxxxxxxxxxx> > Reviewed-by: Jarkko Sakkinen <jarkko@xxxxxxxxxx> > Acked-by: Reinette Chatre <reinette.chatre@xxxxxxxxx> With above addressed, feel free to add: Reviewed-by: Kai Huang <kai.huang@xxxxxxxxx>