Patch "x86/sgx: Reduce delay and interference of enclave release" has been added to the 6.1-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Sun, 25 Dec 2022 08:08:49 -0500

This is a note to let you know that I've just added the patch titled

    x86/sgx: Reduce delay and interference of enclave release

to the 6.1-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     x86-sgx-reduce-delay-and-interference-of-enclave-rel.patch
and it can be found in the queue-6.1 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit abe22acafe1270afacd9574e98a4776dbb1874db
Author: Reinette Chatre <reinette.chatre@xxxxxxxxx>
Date:   Mon Oct 31 10:29:58 2022 -0700

    x86/sgx: Reduce delay and interference of enclave release
    
    [ Upstream commit 7b72c823ddf8aaaec4e9fb28e6fbe4d511e7dad1 ]
    
    commit 8795359e35bc ("x86/sgx: Silence softlockup detection when
    releasing large enclaves") introduced a cond_resched() during enclave
    release where the EREMOVE instruction is applied to every 4k enclave
    page. Giving other tasks an opportunity to run while tearing down a
    large enclave placates the soft lockup detector but Iqbal found
    that the fix causes a 25% performance degradation of a workload
    run using Gramine.
    
    Gramine maintains a 1:1 mapping between processes and SGX enclaves.
    That means if a workload in an enclave creates a subprocess then
    Gramine creates a duplicate enclave for that subprocess to run in.
    The consequence is that the release of the enclave used to run
    the subprocess can impact the performance of the workload that is
    run in the original enclave, especially in large enclaves when
    SGX2 is not in use.
    
    The workload run by Iqbal behaves as follows:
    Create enclave (enclave "A")
    /* Initialize workload in enclave "A" */
    Create enclave (enclave "B")
    /* Run subprocess in enclave "B" and send result to enclave "A" */
    Release enclave (enclave "B")
    /* Run workload in enclave "A" */
    Release enclave (enclave "A")
    
    The performance impact of releasing enclave "B" in the above scenario
    is amplified when there is a lot of SGX memory and the enclave size
    matches the SGX memory. When there is 128GB SGX memory and an enclave
    size of 128GB, from the time enclave "B" starts the 128GB SGX memory
    is oversubscribed with a combined demand for 256GB from the two
    enclaves.
    
    Before commit 8795359e35bc ("x86/sgx: Silence softlockup detection when
    releasing large enclaves") enclave release was done in a tight loop
    without giving other tasks a chance to run. Even though the system
    experienced soft lockups the workload (run in enclave "A") obtained
    good performance numbers because when the workload started running
    there was no interference.
    
    Commit 8795359e35bc ("x86/sgx: Silence softlockup detection when
    releasing large enclaves") gave other tasks opportunity to run while an
    enclave is released. The impact of this in this scenario is that while
    enclave "B" is released and needing to access each page that belongs
    to it in order to run the SGX EREMOVE instruction on it, enclave "A"
    is attempting to run the workload needing to access the enclave
    pages that belong to it. This causes a lot of swapping due to the
    demand for the oversubscribed SGX memory. Longer latencies are
    experienced by the workload in enclave "A" while enclave "B" is
    released.
    
    Improve the performance of enclave release while still avoiding the
    soft lockup detector with two enhancements:
    - Only call cond_resched() after XA_CHECK_SCHED iterations.
    - Use the xarray advanced API to keep the xarray locked for
      XA_CHECK_SCHED iterations instead of locking and unlocking
      at every iteration.
    
    This batching solution is copied from sgx_encl_may_map() that
    also iterates through all enclave pages using this technique.
    
    With this enhancement the workload experiences a 5%
    performance degradation when compared to a kernel without
    commit 8795359e35bc ("x86/sgx: Silence softlockup detection when
    releasing large enclaves"), an improvement to the reported 25%
    degradation, while still placating the soft lockup detector.
    
    Scenarios with poor performance are still possible even with these
    enhancements. For example, short workloads creating sub processes
    while running in large enclaves. Further performance improvements
    are pursued in user space through avoiding to create duplicate enclaves
    for certain sub processes, and using SGX2 that will do lazy allocation
    of pages as needed so enclaves created for sub processes start quickly
    and release quickly.
    
    Fixes: 8795359e35bc ("x86/sgx: Silence softlockup detection when releasing large enclaves")
    Reported-by: Md Iqbal Hossain <md.iqbal.hossain@xxxxxxxxx>
    Signed-off-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
    Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
    Tested-by: Md Iqbal Hossain <md.iqbal.hossain@xxxxxxxxx>
    Link: https://lore.kernel.org/all/00efa80dd9e35dc85753e1c5edb0344ac07bb1f0.1667236485.git.reinette.chatre%40intel.com
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 1ec20807de1e..2c258255a629 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -680,11 +680,15 @@ const struct vm_operations_struct sgx_vm_ops = {
 void sgx_encl_release(struct kref *ref)
 {
 	struct sgx_encl *encl = container_of(ref, struct sgx_encl, refcount);
+	unsigned long max_page_index = PFN_DOWN(encl->base + encl->size - 1);
 	struct sgx_va_page *va_page;
 	struct sgx_encl_page *entry;
-	unsigned long index;
+	unsigned long count = 0;
+
+	XA_STATE(xas, &encl->page_array, PFN_DOWN(encl->base));
 
-	xa_for_each(&encl->page_array, index, entry) {
+	xas_lock(&xas);
+	xas_for_each(&xas, entry, max_page_index) {
 		if (entry->epc_page) {
 			/*
 			 * The page and its radix tree entry cannot be freed
@@ -699,9 +703,20 @@ void sgx_encl_release(struct kref *ref)
 		}
 
 		kfree(entry);
-		/* Invoke scheduler to prevent soft lockups. */
-		cond_resched();
+		/*
+		 * Invoke scheduler on every XA_CHECK_SCHED iteration
+		 * to prevent soft lockups.
+		 */
+		if (!(++count % XA_CHECK_SCHED)) {
+			xas_pause(&xas);
+			xas_unlock(&xas);
+
+			cond_resched();
+
+			xas_lock(&xas);
+		}
 	}
+	xas_unlock(&xas);
 
 	xa_destroy(&encl->page_array);