Hi Dave, On 5/4/2022 4:50 PM, Dave Hansen wrote: > On 5/4/22 16:36, Reinette Chatre wrote: >> The page fault handler cannot check for SGX_ENCL_PAGE_BEING_RECLAIMED >> until the reclaimer releases the mutex. > > Ahh, got it now. I thought the mutex wasn't held on the reclaimer side. > Thanks for the refresher. > >>> One other thing. The role of encl->lock here is very important. >>> Without it, two concurrent page faults could do their individual >>> memset(), each see !pcmd_page_empty, then decline to truncate the page. >> >> Your argument is not clear to me. Yes, this would be an issue (and >> that will not be the only issue) if the enclave mutex is not held >> by the page fault handler ... but the enclave mutex _is_ held by the >> page fault handler. > > It would just be nice to clearly state the locking logic in the eventual > changelog. Will do. Below is the changelog I am currently working on: x86/sgx: Fix race between reclaimer and page fault handler Haitao reported encountering a WARN triggered by the ENCLS[ELDU] instruction faulting with a #GP. The WARN is encountered when the reclaimer evicts a range of pages from the enclave when the same pages are faulted back right away. Consider two enclave pages (ENCLAVE_A and ENCLAVE_B) sharing a PCMD page (PCMD_AB). ENCLAVE_A is in the enclave memory and ENCLAVE_B is in the backing store. PCMD_AB contains just one entry, that of ENCLAVE_B. Scenario proceeds where ENCLAVE_A is being evicted from the enclave while ENCLAVE_B is faulted in. sgx_reclaim_pages() { ... /* * Reclaim ENCLAVE_A */ mutex_lock(&encl->lock); /* * Get a reference to ENCLAVE_A's * shmem page where enclave page * encrypted data will be stored * as well as a reference to the * enclave page's PCMD data page, * PCMD_AB. * Release mutex before writing * any data to the shmem pages. */ sgx_encl_get_backing(...); encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED; mutex_unlock(&encl->lock); /* * Fault ENCLAVE_B */ sgx_vma_fault() { mutex_lock(&encl->lock); /* * Get reference to * ENCLAVE_B's shmem page * as well as PCMD_AB. */ sgx_encl_get_backing(...) /* * Load page back into * enclave via ELDU. */ /* * Release reference to * ENCLAVE_B' shmem page and * PCMD_AB. */ sgx_encl_put_backing(...); /* * PCMD_AB is found empty so * it and ENCLAVE_B's shmem page * are truncated. */ /* Truncate ENCLAVE_B backing page */ sgx_encl_truncate_backing_page(); /* Truncate PCMD_AB */ sgx_encl_truncate_backing_page(); mutex_unlock(&encl->lock); ... } mutex_lock(&encl->lock); encl_page->desc &= ~SGX_ENCL_PAGE_BEING_RECLAIMED; /* * Write encrypted contents of * ENCLAVE_A to ENCLAVE_A shmem * page and its PCMD data to * PCMD_AB. */ sgx_encl_put_backing(...) /* * Reference to PCMD_AB is * dropped and it is truncated. * ENCLAVE_A's PCMD data is lost. */ mutex_unlock(&encl->lock); } What happens next depends on whether it is ENCLAVE_A being faulted in or ENCLAVE_B being evicted - but both end up with ENCLS[ELDU] faulting with a #GP. If ENCLAVE_A is faulted then at the time sgx_encl_get_backing() is called a new PCMD page is allocated and providing the empty PCMD data for ENCLAVE_A would cause ENCLS[ELDU] to #GP If ENCLAVE_B is evicted first then a new PCMD_AB would be allocated by the reclaimer but later when ENCLAVE_A is faulted the ENCLS[ELDU] instruction would #GP during its checks of the PCMD value and the WARN would be encountered. Noting that the reclaimer sets SGX_ENCL_PAGE_BEING_RECLAIMED at the time it obtains a reference to the backing store pages of an enclave page it is in the process of reclaiming, fix the race by only truncating the PCMD page after ensuring that no page sharing the PCMD page is in the process of being reclaimed. Reported-by: Haitao Huang <haitao.huang@xxxxxxxxx> Signed-off-by: Reinette Chatre <reinette.chatre@xxxxxxxxx> > >>> Also, given the challenges here, I do think we should check the >>> pcmd_page after truncate to ensure it is still all zero's. >> >> Could you please elaborate the challenges? I do not think it would >> help the issue at hand at all to add a check since this is done with >> the mutex held while the reclaimer has a reference to the page but no >> mutex ... as long as the page fault handler has that mutex it can write >> and check the PCMD page all it want, the new changes would occur to the >> PCMD page when it (the page fault handler) releases the mutex and the >> reclaimer attempts to use the page it has a reference to. > > I'm just asking for a debugging check that looks at the page *after* is > has been truncated to ensure it is zero, in case another bug creeps in here. ok, I can add your patch from https://lore.kernel.org/linux-sgx/faaaf70d-22b7-eaa5-4623-bbdf1012827a@xxxxxxxxx/ to this series. Reinette