[PATCH v2 3/4] x86/sgx: Fine grained SGX MCA behavior for virtualization

Zhiquan Li <zhiquan1.li@xxxxxxxxx> · Thu, 19 May 2022 11:11:51 +0800

When a VM guest accesses an SGX EPC page that has a memory failure, the
current behavior is to kill the whole guest. The expected behavior is to
kill only the SGX application inside it.

To fix this, send SIGBUS with code BUS_MCEERR_AR, together with the
faulting virtual address and its granularity (PAGE_SHIFT), so that the
hypervisor can inject #MC into the guest. This is helpful in the SGX
case.

The rest is handled on the guest side. Hypervisors such as Qemu already
have a mature facility for converting an HVA to a GPA and injecting #MC
into the guest OS.

Unlike host enclaves, a virtual EPC instance cannot be shared by
multiple VMs, because how enclaves are created is entirely up to the
guest. Sharing a virtual EPC instance would very likely break enclaves
in all of the VMs unexpectedly.

The SGX virtual EPC driver doesn't explicitly prevent a virtual EPC
instance from being shared by multiple VMs via fork(). However, KVM
doesn't support running a VM across multiple mm structures, and the de
facto userspace hypervisor (Qemu) doesn't use fork() to create a new VM,
so in practice this should not happen.

Signed-off-by: Zhiquan Li <zhiquan1.li@xxxxxxxxx>
Acked-by: Kai Huang <kai.huang@xxxxxxxxx>
---
Changes since V1:
- Add Acked-by tag from Kai Huang.
- Add Kai's excellent explanation of the case where one virtual EPC is
  shared by two guests.
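
Illustration only, not part of the patch: a minimal userspace sketch of
how a hypervisor process might consume the SIGBUS described above. It
assumes the signal arrives with si_code BUS_MCEERR_AR, si_addr set to
the host virtual address and si_addr_lsb set to PAGE_SHIFT. The names
mce_event, mem_slot, decode_mceerr_ar() and hva_to_gpa() are
hypothetical and are not Qemu's actual code; a real VMM keeps many
memory slots and must inject the #MC through KVM afterwards.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of what a VMM needs before injecting #MC. */
struct mce_event {
	uintptr_t hva;	/* host virtual address taken from si_addr */
	size_t len;	/* poisoned range size, 1 << si_addr_lsb */
};

/* Hypothetical single guest memory slot: HVA range backing a GPA range. */
struct mem_slot {
	uintptr_t hva_base;
	uint64_t gpa_base;
	size_t size;
};

/*
 * Return true only for an action-required machine check SIGBUS.
 * A plain SIGBUS without BUS_MCEERR_AR carries no usable address,
 * so the caller cannot forward it to the guest.
 */
static bool decode_mceerr_ar(const siginfo_t *si, struct mce_event *ev)
{
	if (si->si_signo != SIGBUS || si->si_code != BUS_MCEERR_AR)
		return false;

	ev->hva = (uintptr_t)si->si_addr;
	ev->len = (size_t)1 << si->si_addr_lsb;
	return true;
}

/* Translate an HVA inside the slot to the matching GPA. */
static bool hva_to_gpa(const struct mem_slot *slot, uintptr_t hva,
		       uint64_t *gpa)
{
	if (hva < slot->hva_base || hva - slot->hva_base >= slot->size)
		return false;

	*gpa = slot->gpa_base + (uint64_t)(hva - slot->hva_base);
	return true;
}
```

A handler installed with SA_SIGINFO would call decode_mceerr_ar() on
the siginfo_t it receives and, on success, pass the result of
hva_to_gpa() to whatever #MC injection path the hypervisor provides.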
---
 arch/x86/kernel/cpu/sgx/main.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 8e4bc6453d26..81801ab0009e 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -710,6 +710,8 @@ int arch_memory_failure(unsigned long pfn, int flags)
 	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
 	struct sgx_epc_section *section;
 	struct sgx_numa_node *node;
+	struct sgx_vepc_page *owner;
+	int ret = 0;
 
 	/*
 	 * mm/memory-failure.c calls this routine for all errors
@@ -726,8 +728,22 @@ int arch_memory_failure(unsigned long pfn, int flags)
 	 * error. The signal may help the task understand why the
 	 * enclave is broken.
 	 */
-	if (flags & MF_ACTION_REQUIRED)
-		force_sig(SIGBUS);
+	if (flags & MF_ACTION_REQUIRED) {
+		/*
+		 * If the poisoned page was accessed by a VM guest, give the
+		 * hypervisor enough information to decide what to do next
+		 * instead of simply killing the guest.
+		 */
+		if (page->flags & SGX_EPC_PAGE_IS_VEPC) {
+			owner = (struct sgx_vepc_page *)page->owner;
+			ret = force_sig_mceerr(BUS_MCEERR_AR, (void __user *)owner->vaddr,
+					PAGE_SHIFT);
+			if (ret < 0)
+				pr_err("Memory failure: Error sending signal to %s:%d: %d\n",
+					current->comm, current->pid, ret);
+		} else
+			force_sig(SIGBUS);
+	}
 
 	section = &sgx_epc_sections[page->section];
 	node = section->node;
-- 
2.25.1