Re: [RFC PATCH 0/4] SGX shmem backing store issue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Dave,

On 5/2/2022 2:33 PM, Dave Hansen wrote:
> On 5/2/22 10:11, Reinette Chatre wrote:
>> My goal was to prevent the page fault handler from doing a "is the PCMD
>> page empty" if the reclaimer has a reference to it. Even if the PCMD page
>> is empty when the page fault handler checks it the page is expected to
>> get data right when reclaimer can get the mutex back since the reclaimer
>> already has a reference to the page.
> 
> I think shmem_truncate_range() might be the wrong operation.  It
> destroys data and, in the end, we don't want to destroy data.
> Filesystems and the page cache already have nice ways to keep from
> destroying data, we just need to use them.
> 
> First, I think set_page_dirty(pcmd_page) before the EWB is a good start.
>  That's what filesystems do before important data that needs to be saved
> goes into pages.

ok, at this time the reclaimer sets the dirty bit _after_ writing to the
shmem backing store. 

>From what I understand we need to split sgx_encl_put_backing() and remove
the part where dirty bit is set and only use sgx_encl_put_backing()
to do the put_page() calls.

The motivation for this change is not exactly clear to me since in
the implementation the access to these pages are protected by the
enclave mutex.

> 
> Second, I think we need behavior like POSIX_FADV_DONTNEED, not
> FALLOC_FL_PUNCH_HOLE.  The DONTNEED operations end up in
> mapping_evict_folio(), which has both page refcount *and* dirty page
> checks.  That means that either elevating a refcount or set_page_dirty()
> will thwart DONTNEED-like behavior.
> 
> There are two basic things we need to do:
> 
> 1. Prevent page from being truncated around EWB time
> 2. Prevent unreferenced page with all zeros from staying in shmem
>    forever or taking up swap space
> 
> On the EWB (write to PCMD side) I think something like this works:
> 
> 	sgx_encl_get_backing()
> 		get_page(pcmd_page)
> 
> 	...
> 	lock_page(pcmd_page);
> 	// check for truncation since sgx_encl_get_backing()
> 	if (pcmd_page->mapping != shmem)
> 		goto retry;
> 	 // double check this is OK under lock_page():
> 	set_page_dirty(pcmd_page);
> 	__sgx_encl_ewb();
> 	unlock_page(pcmd_page);
> 
> That's basically what filesystems do.  Get the page from the page cache,
> lock it, then make sure it is consistent.  If not, retry.
> 
> On the "free" / read in (ELDU) side:
> 
> 	// get pcmd_page ref
> 	lock_page(pcmd_page);
> 	// truncation is not a concern because that's only done
> 	// on the read-in side, here, where we hold encl->lock
> 
> 	memset();
> 	if (!memchr_inv())
> 		// clear the way for DONTNEED:
> 		ClearPageDirty(pcmd_page);
> 	unlock_page(pcmd_page);
> 	// drop pcmd_page ref
> 	...
> 	POSIX_FADV_DONTNEED
> 
> There's one downside to this: it's _possible_ that an transient
> get_page() would block POSIX_FADV_DONTNEED.  Then the zeroed page would
> stick around forever, or at least until the next ELDU operation did
> another memchr_inv().
> 
> I went looking around for some of those and could not find any that I
> *know* apply to shmem.
> 
> This doesn't feel like a great solution; it's more complicated than I
> would like.  Any other ideas?

I am not familiar with the memory management and found the above a bit
intimidating. Noting that you are open to other ideas I 
attempted to contain the fix using existing SGX state. Note that the
reclaimer sets SGX_ENCL_PAGE_BEING_RECLAIMED at the time it obtains a
reference to the backing store pages of an enclave page it is in the
process of reclaiming. The fix only truncates the PCMD page
after ensuring that no page sharing the PCMD page is in the process of being
reclaimed.

I confirmed with this fix that PCMD pages are indeed being truncated -
but only if they are empty _and_ no page sharing the PCMD page is in
the process of being truncated.

With this fix the error printed by patch 4/4 are no longer encountered
and that patch is thus no longer required.

Temporarily I added the line of debugging that you can find in the patch
below and this line is printed while running the stress test. Here is a
snippet from the log file to give an idea how frequently this scenario
is encountered while running the stress test:

[  368.277362] sgx: pcmd_page_in_use:49 encl ffff9e76d8660000 addr 0x7f2545148000 desc 0x7f2545148008 (index 282952) being reclaimed
[  368.277400] sgx: pcmd_page_in_use:49 encl ffff9e76d8660000 addr 0x7f2545149000 desc 0x7f2545149008 (index 282953) being reclaimed
[  655.620465] sgx: pcmd_page_in_use:49 encl ffff9e774ce50000 addr 0x7fbe340f9000 desc 0x7fbe340f9008 (index 213241) being reclaimed
[  655.620520] sgx: pcmd_page_in_use:49 encl ffff9e774ce50000 addr 0x7fbe340fa000 desc 0x7fbe340fa008 (index 213242) being reclaimed
[  717.514992] sgx: pcmd_page_in_use:49 encl ffff9e780cd60000 addr 0x7f312c8cc000 desc 0x7f312c8cc008 (index 182476) being reclaimed
[  911.387007] sgx: pcmd_page_in_use:49 encl ffff9e77a6770000 addr 0x7f252fa2d000 desc 0x7f252fa2d008 (index 195117) being reclaimed
[  946.991400] sgx: pcmd_page_in_use:49 encl ffff9e76dd1a0000 addr 0x7fc210784000 desc 0x7fc210784008 (index 67460) being reclaimed
[  946.991479] sgx: pcmd_page_in_use:49 encl ffff9e76dd1a0000 addr 0x7fc210788000 desc 0x7fc210788008 (index 67464) being reclaimed
[ 1157.210449] sgx: pcmd_page_in_use:49 encl ffff9e76dc550000 addr 0x7f273158c000 desc 0x7f273158c008 (index 202124) being reclaimed
[ 1158.478397] sgx: pcmd_page_in_use:49 encl ffff9e76dc550000 addr 0x7f2743cac000 desc 0x7f2743cac008 (index 277676) being reclaimed
[ 1158.478451] sgx: pcmd_page_in_use:49 encl ffff9e76dc550000 addr 0x7f2743cad000 desc 0x7f2743cad008 (index 277677) being reclaimed
[ 1158.478467] sgx: pcmd_page_in_use:49 encl ffff9e76dc550000 addr 0x7f2743cae000 desc 0x7f2743cae008 (index 277678) being reclaimed
[ 1615.425623] sgx: pcmd_page_in_use:49 encl ffff9e774cee0000 addr 0x7efe1692c000 desc 0x7efe1692c008 (index 92460) being reclaimed
[ 1677.797446] sgx: pcmd_page_in_use:49 encl ffff9e77efc30000 addr 0x7feb3637f000 desc 0x7feb3637f008 (index 222079) being reclaimed
[ 1802.242423] sgx: pcmd_page_in_use:49 encl ffff9e76de190000 addr 0x7f5344fbf000 desc 0x7f5344fbf008 (index 282559) being reclaimed
[ 1887.397432] sgx: pcmd_page_in_use:49 encl ffff9e76d7960000 addr 0x7f0870ba8000 desc 0x7f0870ba8008 (index 461736) being reclaimed
[ 1994.519753] sgx: pcmd_page_in_use:49 encl ffff9e76d98d0000 addr 0x7f0531b43000 desc 0x7f0531b43008 (index 203587) being reclaimed
[ 2178.070802] sgx: pcmd_page_in_use:49 encl ffff9e775b400000 addr 0x7f1134149000 desc 0x7f1134149008 (index 213321) being reclaimed
[ 2349.632505] sgx: pcmd_page_in_use:49 encl ffff9e76ed450000 addr 0x7f944b94c000 desc 0x7f944b94c008 (index 309580) being reclaimed
[ 2349.632597] sgx: pcmd_page_in_use:49 encl ffff9e76ed450000 addr 0x7f944b94d000 desc 0x7f944b94d008 (index 309581) being reclaimed
[ 2349.632613] sgx: pcmd_page_in_use:49 encl ffff9e76ed450000 addr 0x7f944b94e000 desc 0x7f944b94e008 (index 309582) being reclaimed

---8<---

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 3051a26179bc..53bcaa153378 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -12,6 +12,76 @@
 #include "encls.h"
 #include "sgx.h"
 
+/**
+ * pcmd_page_in_use() - Query if any enclave page associated with a PCMD
+ *                      page is in process of being reclaimed.
+ * @encl:        Enclave to which PCMD page belongs
+ * @start_addr:  Address of enclave page using first entry within the PCMD page
+ *
+ * When an enclave page is reclaimed some Paging Crypto MetaData (PCMD) is
+ * stored. The PCMD data of a reclaimed enclave page contains enough
+ * information for the processor to verify the page at the time
+ * it is loaded back into the Enclave Page Cache (EPC).
+ *
+ * The backing storage to which enclave pages are reclaimed is laid out as
+ * follows:
+ * All enclave pages:SECS page:PCMD pages
+ *
+ * Each PCMD page contains the PCMD metadata of
+ * PAGE_SIZE/sizeof(struct sgx_pcmd) enclave pages.
+ *
+ * A PCMD page can only be truncated if it is (a) empty, and (b) no enclave
+ * page sharing the PCMD page is in the process of being reclaimed.
+ *
+ * The reclaimer sets the SGX_ENCL_PAGE_BEING_RECLAIMED flag when it
+ * intends to reclaim that enclave page - it means that the PCMD data
+ * associated with that enclave page is about to get some data and thus
+ * even if the PCMD page is empty, it should not be truncated.
+ *
+ * Return: 1 the PCMD page is in use, 0 if is not in use, -EFAULT on error
+ */
+static int pcmd_page_in_use(struct sgx_encl *encl,
+			    unsigned long start_addr)
+{
+	struct sgx_encl_page *entry;
+	unsigned long addr;
+	int reclaimed = 0;
+	int i;
+
+	for (i = 0; i < PAGE_SIZE/sizeof(struct sgx_pcmd); i++) {
+		addr = start_addr + i * PAGE_SIZE;
+
+		/*
+		 * Stop when reaching the SECS page - it does not
+		 * have a page_array entry and its reclaim is
+		 * started and completed with enclave mutex held so
+		 * it does not use the SGX_ENCL_PAGE_BEING_RECLAIMED
+		 * flag.
+		 */
+		if (addr == encl->base + encl->size)
+			break;
+
+		entry = xa_load(&encl->page_array, PFN_DOWN(addr));
+		if (!entry)
+			return -EFAULT;
+
+		/*
+		 * VA page slot ID uses same bit as the flag so it is important
+		 * to ensure that the page is not already in backing store.
+		 */
+		if (entry->epc_page &&
+		    (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) {
+			pr_debug("%s:%d encl %px addr 0x%lx desc 0x%lx (index %lu) being reclaimed\n",
+				 __func__, __LINE__, encl, addr, entry->desc,
+				 PFN_DOWN(entry->desc - entry->encl->base));
+			reclaimed = 1;
+			break;
+		}
+	}
+
+	return reclaimed;
+}
+
 /*
  * Calculate byte offset of a PCMD struct associated with an enclave page. PCMD's
  * follow right after the EPC data in the backing storage. In addition to the
@@ -47,6 +117,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	unsigned long va_offset = encl_page->desc & SGX_ENCL_PAGE_VA_OFFSET_MASK;
 	struct sgx_encl *encl = encl_page->encl;
 	pgoff_t page_index, page_pcmd_off;
+	unsigned long pcmd_first_page;
 	struct sgx_pageinfo pginfo;
 	struct sgx_backing b;
 	bool pcmd_page_empty;
@@ -58,6 +129,12 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	else
 		page_index = PFN_DOWN(encl->size);
 
+	/*
+	 * Address of enclave page using the first entry within the
+	 * PCMD page.
+	 */
+	pcmd_first_page = PFN_PHYS(page_index & ~0x1FUL) + encl->base;
+
 	page_pcmd_off = sgx_encl_get_backing_page_pcmd_offset(encl, page_index);
 
 	ret = sgx_encl_lookup_backing(encl, page_index, &b);
@@ -103,7 +180,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 
 	sgx_encl_truncate_backing_page(encl, page_index);
 
-	if (pcmd_page_empty) {
+	if (pcmd_page_empty && !pcmd_page_in_use(encl, pcmd_first_page)) {
 		ida_free(&encl->pcmd_in_backing, PFN_DOWN(page_pcmd_off));
 		sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
 	}
---8<---

What do you think?

Reinette



[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux