Re: [PATCH v4 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 26, 2021 at 09:48:48PM +0200, Jarkko Sakkinen wrote:
> On Thu, Mar 25, 2021 at 10:30:57PM +1300, Kai Huang wrote:
> > EREMOVE takes a page and removes any association between that page and
> > an enclave.  It must be run on a page before it can be added into
> > another enclave.  Currently, EREMOVE is run as part of pages being freed
> > into the SGX page allocator.  It is not expected to fail, as it would
> > indicate a use-after-free of EPC.  Rather than add the page back to the
> > pool of available EPC, the kernel intentionally leaks the page to avoid
> > additional errors in the future.
> > 
> > However, KVM does not track how guest pages are used, which means that
> > SGX virtualization use of EREMOVE might fail.  Specifically, it is
> > legitimate that EREMOVE returns SGX_CHILD_PRESENT for EPC assigned to
> > KVM guest, because KVM/kernel doesn't track SECS pages.
> > 
> > To allow SGX/KVM to introduce a more permissive EREMOVE helper and to
> > let the SGX virtualization code use the allocator directly, break out
> > the EREMOVE call from the SGX page allocator.  Rename the original
> > sgx_free_epc_page() to sgx_encl_free_epc_page(), indicating that it is
> > used to free EPC page assigned host enclave. Replace sgx_free_epc_page()
> > with sgx_encl_free_epc_page() in all call sites so there's no functional
> > change.
> > 
> > At the same time improve error message when EREMOVE fails, and add
> > documentation to explain to user what is the bug and suggest user what
> > to do when this bug happens, although extremely unlikely.
> > 
> > Signed-off-by: Kai Huang <kai.huang@xxxxxxxxx>
> > ---
> >  Documentation/x86/sgx.rst       | 27 +++++++++++++++++++++++++++
> >  arch/x86/kernel/cpu/sgx/encl.c  | 32 +++++++++++++++++++++++++++-----
> >  arch/x86/kernel/cpu/sgx/encl.h  |  1 +
> >  arch/x86/kernel/cpu/sgx/ioctl.c |  6 +++---
> >  arch/x86/kernel/cpu/sgx/main.c  | 14 +++++---------
> >  arch/x86/kernel/cpu/sgx/sgx.h   |  5 +++++
> >  6 files changed, 68 insertions(+), 17 deletions(-)
> > 
> > diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
> > index eaee1368b4fd..5ec7d17e65e0 100644
> > --- a/Documentation/x86/sgx.rst
> > +++ b/Documentation/x86/sgx.rst
> > @@ -209,3 +209,30 @@ An application may be loaded into a container enclave which is specially
> >  configured with a library OS and run-time which permits the application to run.
> >  The enclave run-time and library OS work together to execute the application
> >  when a thread enters the enclave.
> > +
> > +Impact of Potential Kernel SGX Bugs
> > +===================================
> > +
> > +EPC leaks
> > +---------
> > +
> > +EPC leaks can happen if kernel SGX bug happens, when a WARNING with below
> > +message is shown in dmesg:
> > +
> > +"...EREMOVE returned ... and an EPC page was leaked.  SGX may become unusuable.
> > +This is likely a kernel bug.  Refer to Documentation/x86/sgx.rst for more
> > +information."
> > +
> > +This is effectively a kernel use-after-free of EPC, and due to the way SGX
> > +works, the bug is detected at freeing. Rather than add the page back to the pool
> > +of available EPC, the kernel intentionally leaks the page to avoid additional
> > +errors in the future.
> > +
> > +When this happens, kernel will likely soon leak majority of EPC pages, and SGX
> > +will likely become unusable. However while this may be fatal to SGX, other
> > +kernel functionalities are unlikely to be impacted, and should continue to work.
> > +
> > +As a result, when this happpens, user should stop running any new SGX workloads,
> > +(or just any new workloads), and migrate all valuable workloads. Although a
> > +machine reboot can recover all EPC, the bug should be reported to Linux
> > +developers.
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> > index 7449ef33f081..26c0987153de 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > @@ -78,7 +78,7 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
> >  
> >  	ret = __sgx_encl_eldu(encl_page, epc_page, secs_page);
> >  	if (ret) {
> > -		sgx_free_epc_page(epc_page);
> > +		sgx_encl_free_epc_page(epc_page);
> >  		return ERR_PTR(ret);
> >  	}
> >  
> > @@ -404,7 +404,7 @@ void sgx_encl_release(struct kref *ref)
> >  			if (sgx_unmark_page_reclaimable(entry->epc_page))
> >  				continue;
> >  
> > -			sgx_free_epc_page(entry->epc_page);
> > +			sgx_encl_free_epc_page(entry->epc_page);
> >  			encl->secs_child_cnt--;
> >  			entry->epc_page = NULL;
> >  		}
> > @@ -415,7 +415,7 @@ void sgx_encl_release(struct kref *ref)
> >  	xa_destroy(&encl->page_array);
> >  
> >  	if (!encl->secs_child_cnt && encl->secs.epc_page) {
> > -		sgx_free_epc_page(encl->secs.epc_page);
> > +		sgx_encl_free_epc_page(encl->secs.epc_page);
> >  		encl->secs.epc_page = NULL;
> >  	}
> >  
> > @@ -423,7 +423,7 @@ void sgx_encl_release(struct kref *ref)
> >  		va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
> >  					   list);
> >  		list_del(&va_page->list);
> > -		sgx_free_epc_page(va_page->epc_page);
> > +		sgx_encl_free_epc_page(va_page->epc_page);
> >  		kfree(va_page);
> >  	}
> >  
> > @@ -686,7 +686,7 @@ struct sgx_epc_page *sgx_alloc_va_page(void)
> >  	ret = __epa(sgx_get_epc_virt_addr(epc_page));
> >  	if (ret) {
> >  		WARN_ONCE(1, "EPA returned %d (0x%x)", ret, ret);
> > -		sgx_free_epc_page(epc_page);
> > +		sgx_encl_free_epc_page(epc_page);
> >  		return ERR_PTR(-EFAULT);
> >  	}
> >  
> > @@ -735,3 +735,25 @@ bool sgx_va_page_full(struct sgx_va_page *va_page)
> >  
> >  	return slot == SGX_VA_SLOT_COUNT;
> >  }
> > +
> > +/**
> > + * sgx_encl_free_epc_page - free EPC page assigned to an enclave
> > + * @page:	EPC page to be freed
> > + *
> > + * Free EPC page assigned to an enclave.  It does EREMOVE for the page, and
> > + * only upon success, it puts the page back to free page list.  Otherwise, it
> > + * gives a WARNING to indicate page is leaked, and require reboot to retrieve
> > + * leaked pages.
> > + */
> > +void sgx_encl_free_epc_page(struct sgx_epc_page *page)
> > +{
> > +	int ret;
> > +
> > +	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
> > +
> > +	ret = __eremove(sgx_get_epc_virt_addr(page));
> > +	if (WARN_ONCE(ret, EREMOVE_ERROR_MESSAGE, ret, ret))
> > +		return;
> > +
> > +	sgx_free_epc_page(page);
> > +}
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> > index d8d30ccbef4c..6e74f85b6264 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.h
> > +++ b/arch/x86/kernel/cpu/sgx/encl.h
> > @@ -115,5 +115,6 @@ struct sgx_epc_page *sgx_alloc_va_page(void);
> >  unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
> >  void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
> >  bool sgx_va_page_full(struct sgx_va_page *va_page);
> > +void sgx_encl_free_epc_page(struct sgx_epc_page *page);
> >  
> >  #endif /* _X86_ENCL_H */
> > diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> > index 90a5caf76939..772b9c648cf1 100644
> > --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> > +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> > @@ -47,7 +47,7 @@ static void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
> >  	encl->page_cnt--;
> >  
> >  	if (va_page) {
> > -		sgx_free_epc_page(va_page->epc_page);
> > +		sgx_encl_free_epc_page(va_page->epc_page);
> >  		list_del(&va_page->list);
> >  		kfree(va_page);
> >  	}
> > @@ -117,7 +117,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
> >  	return 0;
> >  
> >  err_out:
> > -	sgx_free_epc_page(encl->secs.epc_page);
> > +	sgx_encl_free_epc_page(encl->secs.epc_page);
> >  	encl->secs.epc_page = NULL;
> >  
> >  err_out_backing:
> > @@ -365,7 +365,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
> >  	mmap_read_unlock(current->mm);
> >  
> >  err_out_free:
> > -	sgx_free_epc_page(epc_page);
> > +	sgx_encl_free_epc_page(epc_page);
> >  	kfree(encl_page);
> >  
> >  	return ret;
> > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > index 13a7599ce7d4..b227629b1e9c 100644
> > --- a/arch/x86/kernel/cpu/sgx/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > @@ -294,7 +294,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
> >  
> >  		sgx_encl_ewb(encl->secs.epc_page, &secs_backing);
> >  
> > -		sgx_free_epc_page(encl->secs.epc_page);
> > +		sgx_encl_free_epc_page(encl->secs.epc_page);
> >  		encl->secs.epc_page = NULL;
> >  
> >  		sgx_encl_put_backing(&secs_backing, true);
> > @@ -609,19 +609,15 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
> >   * sgx_free_epc_page() - Free an EPC page
> >   * @page:	an EPC page
> >   *
> > - * Call EREMOVE for an EPC page and insert it back to the list of free pages.
> > + * Put the EPC page back to the list of free pages. It's the caller's
> > + * responsibility to make sure that the page is in uninitialized state. In other
> > + * words, do EREMOVE, EWB or whatever operation is necessary before calling
> > + * this function.
> >   */
> >  void sgx_free_epc_page(struct sgx_epc_page *page)
> >  {
> >  	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
> >  	struct sgx_numa_node *node = section->node;
> > -	int ret;
> > -
> > -	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
> > -
> > -	ret = __eremove(sgx_get_epc_virt_addr(page));
> > -	if (WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret))
> > -		return;
> >  
> >  	spin_lock(&node->lock);
> >  
> > diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> > index 653af8ca1a25..6b21a165500e 100644
> > --- a/arch/x86/kernel/cpu/sgx/sgx.h
> > +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> > @@ -13,6 +13,11 @@
> >  #undef pr_fmt
> >  #define pr_fmt(fmt) "sgx: " fmt
> >  
> > +/* Error message for EREMOVE failure, when kernel is about to leak EPC page */
> > +#define EREMOVE_ERROR_MESSAGE \
> > +	"EREMOVE returned %d (0x%x) and an EPC page was leaked.  SGX may become unusuable.  " \
> > +	"This is likely a kernel bug.  Refer to Documentation/x86/sgx.rst for more information."
> 
> 
> Why this needs to be here and not open coded where it is used?
> 
> > +
> >  #define SGX_MAX_EPC_SECTIONS		8
> >  #define SGX_EEXTEND_BLOCK_SIZE		256
> >  #define SGX_NR_TO_SCAN			16
> > -- 
> > 2.30.2
> > 
> > 

Anyway,

Reviewed-by: Jarkko Sakkinen <jarkko@xxxxxxxxxx>

/Jarkko



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux