Hi Dave, On 4/29/2022 12:45 PM, Dave Hansen wrote: > On 4/29/22 11:50, Reinette Chatre wrote: >> I am not familiar with this area - is the PFN expected to be consistent? Do >> you perhaps have an idea why the PFN of the PCMD page may have changed? This >> is running with this series applied so the ELDU flow would do a lookup of the >> PCMD page and not attempt to allocate a new one. > > First of all, cool! This is really good progress! > > It's possible for the PCMD shmem page to be swapped out and swapped back > in. The pfn would be likely to change in that case. But, that seems > highly unlikely in that short of a period of time. > > I'd dump out two more things: > > First, dump out page_pcmd_off, just like you're doing with page_index in > case page_pcmd_off itself is getting botched. I looked but couldn't > find anything obvious. > > Second, dump out the pfn in sgx_encl_truncate_backing_page(). It's > possible that something is getting overly-aggressive and zapping the > PCMD page too early. That would be easy to explain with that PCMD > locking issue you discovered. But, your patch should have fixed that issue. > > For debugging, could you double-check that the PCMD page *is* empty > around sgx_encl_truncate_backing_page()? If there's a race here you can > also enlarge the race window by adding an msleep() or a spin loop > somewhere after the memchr_inv(). > > You could also hold an extra reference on the PCMD page, maybe something > like the attached patch. That will let you inspect the actual page > after it is *actually* truncated. There should never be data in the > page there. Thank you so much for pointing me in the right direction. With this information I believe that I found a race condition. I did create a fix for it and it has been running for more than an hour now without a WARN _and_ without the -ENOENT error introduced in patch 4/4. Please find below the description of the race and the fix. Scenario is that the reclaimer is evicting a range of pages that are being faulted right away. Consider three enclave pages (ENCLAVE_A, ENCLAVE_B, and ENCLAVE_C) sharing a PCMD page (PCMD_ABC). ENCLAVE_A and ENCLAVE_B are in the enclave memory and ENCLAVE_C is in backing store. PCMD_ABC contains just one entry, that of ENCLAVE_C. Scenario proceeds where ENCLAVE_A is being evicted from the enclave while ENCLAVE_C is faulted in. Scenario is with this patch series applied. sgx_reclaim_pages() { ... /* * Reclaim ENCLAVE_A */ mutex_lock(&encl->lock); /* * Get a reference to ENCLAVE_A's * shmem page where enclave page * encrypted data will be stored * as well as a reference to the * enclave page's PCMD data page, * PCMD_ABC. */ sgx_encl_alloc_backing(...); mutex_unlock(&encl->lock); /* * Fault ENCLAVE_C */ sgx_vma_fault() { mutex_lock(&encl->lock); /* * Get reference to * ENCLAVE_C's shmem page * as well as PCMD_ABC. */ sgx_encl_get_backing(...) /* * load page back into * enclave via ELDU */ /* * Release reference to * ENCLAVE_C' shmem page and * PCMD_ABC. */ sgx_encl_put_backing(...); /* * PCMD_ABC is found empty so * it and ENCLAVE_C's shmem page * are truncated. */ /* Truncate ENCLAVE_C backing page */ sgx_encl_truncate_backing_page(); /* Truncate PCMD_ABC */ sgx_encl_truncate_backing_page(); mutex_unlock(&encl->lock); ... } mutex_lock(&encl->lock); /* * Write encrypted contents of * ENCLAVE_A to ENCLAVE_A shmem * page and its PCMD data to * PCMD_ABC. */ sgx_encl_put_backing(...) /* * Reference to PCMD_ABC is * dropped and it is truncated. * ENCLAVE_A's PCMD data is lost. */ mutex_unlock(&encl->lock); } What happens next depends on whether it is ENCLAVE_A being faulted in or ENCLAVE_B (or ENCLAVE_C) being evicted. If ENCLAVE_A is faulted then the error introduced in patch 4/4 would be encountered since lookup of PCMD_ABC fails. Before patch 4/4 a new PCMD page would be allocated and then ENCLS[ELDU] would WARN on the PCMD data. If ENCLAVE_B is evicted first then a new PCMD_ABC would be allocated but later when ENCLAVE_A is faulted the ENCLS[ELDU] instruction would #GP during its checks of the PCMD value and the WARN would be encountered. Below is the fix I am currently testing. It would not truncate the memory if there is a reference to the PCMD page. diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index 22ed886dc825..81c7bfc12b9b 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -98,11 +98,10 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page, kunmap_atomic(pcmd_page); kunmap_atomic((void *)(unsigned long)pginfo.contents); - sgx_encl_put_backing(&b, true); - + put_page(b.contents); sgx_encl_truncate_backing_page(encl, page_index); - if (pcmd_page_empty) { + if (pcmd_page_empty && && put_page_testzero(b.pcmd)) { ida_free(&encl->pcmd_in_backing, PFN_DOWN(page_pcmd_off)); sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off)); } I plan on letting the test keep running to make sure it really fixes the issue. If all goes well patch 4/4 can be dropped and the fix included instead. Thanks again for your guidance. It was tremendously helpful. Reinette