On Fri, Mar 26, 2021 at 09:48:48PM +0200, Jarkko Sakkinen wrote: > On Thu, Mar 25, 2021 at 10:30:57PM +1300, Kai Huang wrote: > > EREMOVE takes a page and removes any association between that page and > > an enclave. It must be run on a page before it can be added into > > another enclave. Currently, EREMOVE is run as part of pages being freed > > into the SGX page allocator. It is not expected to fail, as it would > > indicate a use-after-free of EPC. Rather than add the page back to the > > pool of available EPC, the kernel intentionally leaks the page to avoid > > additional errors in the future. > > > > However, KVM does not track how guest pages are used, which means that > > SGX virtualization use of EREMOVE might fail. Specifically, it is > > legitimate that EREMOVE returns SGX_CHILD_PRESENT for EPC assigned to > > KVM guest, because KVM/kernel doesn't track SECS pages. > > > > To allow SGX/KVM to introduce a more permissive EREMOVE helper and to > > let the SGX virtualization code use the allocator directly, break out > > the EREMOVE call from the SGX page allocator. Rename the original > > sgx_free_epc_page() to sgx_encl_free_epc_page(), indicating that it is > > used to free EPC page assigned host enclave. Replace sgx_free_epc_page() > > with sgx_encl_free_epc_page() in all call sites so there's no functional > > change. > > > > At the same time improve error message when EREMOVE fails, and add > > documentation to explain to user what is the bug and suggest user what > > to do when this bug happens, although extremely unlikely. > > > > Signed-off-by: Kai Huang <kai.huang@xxxxxxxxx> > > --- > > Documentation/x86/sgx.rst | 27 +++++++++++++++++++++++++++ > > arch/x86/kernel/cpu/sgx/encl.c | 32 +++++++++++++++++++++++++++----- > > arch/x86/kernel/cpu/sgx/encl.h | 1 + > > arch/x86/kernel/cpu/sgx/ioctl.c | 6 +++--- > > arch/x86/kernel/cpu/sgx/main.c | 14 +++++--------- > > arch/x86/kernel/cpu/sgx/sgx.h | 5 +++++ > > 6 files changed, 68 insertions(+), 17 deletions(-) > > > > diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst > > index eaee1368b4fd..5ec7d17e65e0 100644 > > --- a/Documentation/x86/sgx.rst > > +++ b/Documentation/x86/sgx.rst > > @@ -209,3 +209,30 @@ An application may be loaded into a container enclave which is specially > > configured with a library OS and run-time which permits the application to run. > > The enclave run-time and library OS work together to execute the application > > when a thread enters the enclave. > > + > > +Impact of Potential Kernel SGX Bugs > > +=================================== > > + > > +EPC leaks > > +--------- > > + > > +EPC leaks can happen if kernel SGX bug happens, when a WARNING with below > > +message is shown in dmesg: > > + > > +"...EREMOVE returned ... and an EPC page was leaked. SGX may become unusuable. > > +This is likely a kernel bug. Refer to Documentation/x86/sgx.rst for more > > +information." > > + > > +This is effectively a kernel use-after-free of EPC, and due to the way SGX > > +works, the bug is detected at freeing. Rather than add the page back to the pool > > +of available EPC, the kernel intentionally leaks the page to avoid additional > > +errors in the future. > > + > > +When this happens, kernel will likely soon leak majority of EPC pages, and SGX > > +will likely become unusable. However while this may be fatal to SGX, other > > +kernel functionalities are unlikely to be impacted, and should continue to work. > > + > > +As a result, when this happpens, user should stop running any new SGX workloads, > > +(or just any new workloads), and migrate all valuable workloads. Although a > > +machine reboot can recover all EPC, the bug should be reported to Linux > > +developers. > > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c > > index 7449ef33f081..26c0987153de 100644 > > --- a/arch/x86/kernel/cpu/sgx/encl.c > > +++ b/arch/x86/kernel/cpu/sgx/encl.c > > @@ -78,7 +78,7 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page, > > > > ret = __sgx_encl_eldu(encl_page, epc_page, secs_page); > > if (ret) { > > - sgx_free_epc_page(epc_page); > > + sgx_encl_free_epc_page(epc_page); > > return ERR_PTR(ret); > > } > > > > @@ -404,7 +404,7 @@ void sgx_encl_release(struct kref *ref) > > if (sgx_unmark_page_reclaimable(entry->epc_page)) > > continue; > > > > - sgx_free_epc_page(entry->epc_page); > > + sgx_encl_free_epc_page(entry->epc_page); > > encl->secs_child_cnt--; > > entry->epc_page = NULL; > > } > > @@ -415,7 +415,7 @@ void sgx_encl_release(struct kref *ref) > > xa_destroy(&encl->page_array); > > > > if (!encl->secs_child_cnt && encl->secs.epc_page) { > > - sgx_free_epc_page(encl->secs.epc_page); > > + sgx_encl_free_epc_page(encl->secs.epc_page); > > encl->secs.epc_page = NULL; > > } > > > > @@ -423,7 +423,7 @@ void sgx_encl_release(struct kref *ref) > > va_page = list_first_entry(&encl->va_pages, struct sgx_va_page, > > list); > > list_del(&va_page->list); > > - sgx_free_epc_page(va_page->epc_page); > > + sgx_encl_free_epc_page(va_page->epc_page); > > kfree(va_page); > > } > > > > @@ -686,7 +686,7 @@ struct sgx_epc_page *sgx_alloc_va_page(void) > > ret = __epa(sgx_get_epc_virt_addr(epc_page)); > > if (ret) { > > WARN_ONCE(1, "EPA returned %d (0x%x)", ret, ret); > > - sgx_free_epc_page(epc_page); > > + sgx_encl_free_epc_page(epc_page); > > return ERR_PTR(-EFAULT); > > } > > > > @@ -735,3 +735,25 @@ bool sgx_va_page_full(struct sgx_va_page *va_page) > > > > return slot == SGX_VA_SLOT_COUNT; > > } > > + > > +/** > > + * sgx_encl_free_epc_page - free EPC page assigned to an enclave > > + * @page: EPC page to be freed > > + * > > + * Free EPC page assigned to an enclave. It does EREMOVE for the page, and > > + * only upon success, it puts the page back to free page list. Otherwise, it > > + * gives a WARNING to indicate page is leaked, and require reboot to retrieve > > + * leaked pages. > > + */ > > +void sgx_encl_free_epc_page(struct sgx_epc_page *page) > > +{ > > + int ret; > > + > > + WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED); > > + > > + ret = __eremove(sgx_get_epc_virt_addr(page)); > > + if (WARN_ONCE(ret, EREMOVE_ERROR_MESSAGE, ret, ret)) > > + return; > > + > > + sgx_free_epc_page(page); > > +} > > diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h > > index d8d30ccbef4c..6e74f85b6264 100644 > > --- a/arch/x86/kernel/cpu/sgx/encl.h > > +++ b/arch/x86/kernel/cpu/sgx/encl.h > > @@ -115,5 +115,6 @@ struct sgx_epc_page *sgx_alloc_va_page(void); > > unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page); > > void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset); > > bool sgx_va_page_full(struct sgx_va_page *va_page); > > +void sgx_encl_free_epc_page(struct sgx_epc_page *page); > > > > #endif /* _X86_ENCL_H */ > > diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c > > index 90a5caf76939..772b9c648cf1 100644 > > --- a/arch/x86/kernel/cpu/sgx/ioctl.c > > +++ b/arch/x86/kernel/cpu/sgx/ioctl.c > > @@ -47,7 +47,7 @@ static void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page) > > encl->page_cnt--; > > > > if (va_page) { > > - sgx_free_epc_page(va_page->epc_page); > > + sgx_encl_free_epc_page(va_page->epc_page); > > list_del(&va_page->list); > > kfree(va_page); > > } > > @@ -117,7 +117,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs) > > return 0; > > > > err_out: > > - sgx_free_epc_page(encl->secs.epc_page); > > + sgx_encl_free_epc_page(encl->secs.epc_page); > > encl->secs.epc_page = NULL; > > > > err_out_backing: > > @@ -365,7 +365,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src, > > mmap_read_unlock(current->mm); > > > > err_out_free: > > - sgx_free_epc_page(epc_page); > > + sgx_encl_free_epc_page(epc_page); > > kfree(encl_page); > > > > return ret; > > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c > > index 13a7599ce7d4..b227629b1e9c 100644 > > --- a/arch/x86/kernel/cpu/sgx/main.c > > +++ b/arch/x86/kernel/cpu/sgx/main.c > > @@ -294,7 +294,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, > > > > sgx_encl_ewb(encl->secs.epc_page, &secs_backing); > > > > - sgx_free_epc_page(encl->secs.epc_page); > > + sgx_encl_free_epc_page(encl->secs.epc_page); > > encl->secs.epc_page = NULL; > > > > sgx_encl_put_backing(&secs_backing, true); > > @@ -609,19 +609,15 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) > > * sgx_free_epc_page() - Free an EPC page > > * @page: an EPC page > > * > > - * Call EREMOVE for an EPC page and insert it back to the list of free pages. > > + * Put the EPC page back to the list of free pages. It's the caller's > > + * responsibility to make sure that the page is in uninitialized state. In other > > + * words, do EREMOVE, EWB or whatever operation is necessary before calling > > + * this function. > > */ > > void sgx_free_epc_page(struct sgx_epc_page *page) > > { > > struct sgx_epc_section *section = &sgx_epc_sections[page->section]; > > struct sgx_numa_node *node = section->node; > > - int ret; > > - > > - WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED); > > - > > - ret = __eremove(sgx_get_epc_virt_addr(page)); > > - if (WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret)) > > - return; > > > > spin_lock(&node->lock); > > > > diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h > > index 653af8ca1a25..6b21a165500e 100644 > > --- a/arch/x86/kernel/cpu/sgx/sgx.h > > +++ b/arch/x86/kernel/cpu/sgx/sgx.h > > @@ -13,6 +13,11 @@ > > #undef pr_fmt > > #define pr_fmt(fmt) "sgx: " fmt > > > > +/* Error message for EREMOVE failure, when kernel is about to leak EPC page */ > > +#define EREMOVE_ERROR_MESSAGE \ > > + "EREMOVE returned %d (0x%x) and an EPC page was leaked. SGX may become unusuable. " \ > > + "This is likely a kernel bug. Refer to Documentation/x86/sgx.rst for more information." > > > Why this needs to be here and not open coded where it is used? > > > + > > #define SGX_MAX_EPC_SECTIONS 8 > > #define SGX_EEXTEND_BLOCK_SIZE 256 > > #define SGX_NR_TO_SCAN 16 > > -- > > 2.30.2 > > > > Anyway, Reviewed-by: Jarkko Sakkinen <jarkko@xxxxxxxxxx> /Jarkko