On Mon Apr 29, 2024 at 1:43 PM EEST, Dmitrii Kuvaiskii wrote:
> Two enclave threads may try to add and remove the same enclave page
> simultaneously (e.g., if the SGX runtime supports both lazy allocation
> and `MADV_DONTNEED` semantics). Consider this race:
>
> 1. T1 performs page removal in sgx_encl_remove_pages() and stops right
>    after removing the page table entry and right before re-acquiring the
>    enclave lock to EREMOVE and xa_erase(&encl->page_array) the page.
> 2. T2 tries to access the page, and #PF[not_present] is raised. The
>    condition to EAUG in sgx_vma_fault() is not satisfied because the
>    page is still present in encl->page_array, thus the SGX driver
>    assumes that the fault happened because the page was swapped out. The
>    driver continues on a code path that installs a page table entry
>    *without* performing EAUG.
> 3. The enclave page metadata is in inconsistent state: the PTE is
>    installed but there was no EAUG. Thus, T2 in userspace infinitely
>    receives SIGSEGV on this page (and EACCEPT always fails).
>
> Fix this by making sure that T1 (the page-removing thread) always wins
> this data race. In particular, the page-being-removed is marked as such,
> and T2 retries until the page is fully removed.
>
> Fixes: 9849bb27152c ("x86/sgx: Support complete page removal")
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@xxxxxxxxx>
> ---
>  arch/x86/kernel/cpu/sgx/encl.c  | 3 ++-
>  arch/x86/kernel/cpu/sgx/encl.h  | 3 +++
>  arch/x86/kernel/cpu/sgx/ioctl.c | 1 +
>  3 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 41f14b1a3025..7ccd8b2fce5f 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -257,7 +257,8 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
>
>  	/* Entry successfully located. */
>  	if (entry->epc_page) {
> -		if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
> +		if (entry->desc & (SGX_ENCL_PAGE_BEING_RECLAIMED |
> +				   SGX_ENCL_PAGE_BEING_REMOVED))
>  			return ERR_PTR(-EBUSY);
>
>  		return entry;
> diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> index f94ff14c9486..fff5f2293ae7 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.h
> +++ b/arch/x86/kernel/cpu/sgx/encl.h
> @@ -25,6 +25,9 @@
>  /* 'desc' bit marking that the page is being reclaimed. */
>  #define SGX_ENCL_PAGE_BEING_RECLAIMED	BIT(3)
>
> +/* 'desc' bit marking that the page is being removed. */
> +#define SGX_ENCL_PAGE_BEING_REMOVED	BIT(2)
> +
>  struct sgx_encl_page {
>  	unsigned long desc;
>  	unsigned long vm_max_prot_bits:8;
> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> index b65ab214bdf5..c542d4dd3e64 100644
> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> @@ -1142,6 +1142,7 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
>  		 * Do not keep encl->lock because of dependency on
>  		 * mmap_lock acquired in sgx_zap_enclave_ptes().
>  		 */
> +		entry->desc |= SGX_ENCL_PAGE_BEING_REMOVED;
>  		mutex_unlock(&encl->lock);
>
>  		sgx_zap_enclave_ptes(encl, addr);

It is somewhat trivial to NAK this as the commit message makes no
effort to describe the new flag. By default at least I have strong
opposition against any new flags related to reclaiming even if it needs
a bit of extra synchronization work in the user space.

One way to describe concurrency scenarios would be to take an example
from

https://www.kernel.org/doc/Documentation/memory-barriers.txt

I.e. see the examples with CPU 1 and CPU 2.

BR, Jarkko
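
For illustration only, here is a minimal userspace sketch of how the
patched interleaving might be written down in the CPU 1 / CPU 2 style
mentioned above. It is not kernel code and not part of the patch: the
model_* names, the single-slot stand-in for encl->page_array and the
three outcome values are all hypothetical, and locking and ELDU are
deliberately left out. Only the two 'desc' bit values and the combined
flag check are taken from the quoted diff.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Interleaving with the patch applied (one reading of the commit message):
 *
 *	CPU 1 (T1, page removal)		CPU 2 (T2, page fault)
 *	===============================	===============================
 *	mutex_lock(&encl->lock);
 *	desc |= BEING_REMOVED;
 *	mutex_unlock(&encl->lock);
 *	sgx_zap_enclave_ptes();
 *						#PF[not_present]
 *						page still in page_array
 *						load page -> -EBUSY, retry
 *	mutex_lock(&encl->lock);
 *	EREMOVE; xa_erase();
 *						#PF[not_present]
 *						page not in page_array -> EAUG
 */

#define BIT(n) (1UL << (n))
/* Bit values as defined in the quoted patch to encl.h. */
#define SGX_ENCL_PAGE_BEING_REMOVED	BIT(2)
#define SGX_ENCL_PAGE_BEING_RECLAIMED	BIT(3)

/* Hypothetical stand-in for struct sgx_encl_page. */
struct model_page {
	unsigned long desc;
	int has_epc_page;		/* models entry->epc_page != NULL */
};

/* Hypothetical stand-in for encl->page_array holding a single page. */
struct model_encl {
	struct model_page *slot;	/* NULL once the page is erased */
};

enum model_outcome { MODEL_EAUG, MODEL_RETRY, MODEL_MAP_EXISTING };

/* Models only the patched flag check; the ELDU path is not modeled. */
static int model_load_page(const struct model_page *entry)
{
	if (entry->has_epc_page &&
	    (entry->desc & (SGX_ENCL_PAGE_BEING_RECLAIMED |
			    SGX_ENCL_PAGE_BEING_REMOVED)))
		return -EBUSY;
	return 0;
}

/* Models the decision T2 takes on a not-present fault. */
static enum model_outcome model_handle_fault(const struct model_encl *encl)
{
	if (!encl->slot)
		return MODEL_EAUG;		/* page unknown: augment */
	if (model_load_page(encl->slot) == -EBUSY)
		return MODEL_RETRY;		/* removal or reclaim in flight */
	return MODEL_MAP_EXISTING;		/* install PTE for known page */
}

/* Replays the timeline above sequentially and checks each outcome. */
static void model_race_walkthrough(void)
{
	struct model_page page = { .desc = 0, .has_epc_page = 1 };
	struct model_encl encl = { .slot = &page };

	page.desc |= SGX_ENCL_PAGE_BEING_REMOVED;		/* CPU 1 */
	assert(model_handle_fault(&encl) == MODEL_RETRY);	/* CPU 2 */
	encl.slot = NULL;					/* CPU 1 */
	assert(model_handle_fault(&encl) == MODEL_EAUG);	/* CPU 2 */
}
```

Without the BEING_REMOVED bit, the first fault in this model would
return MODEL_MAP_EXISTING, which corresponds to step 2 of the race in
the commit message (PTE installed without EAUG).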