On 2022/7/22 00:54, Dave Hansen wrote: > On 6/22/22 02:37, Zhiquan Li wrote: >> When VM guest access a SGX EPC page with memory failure, current >> behavior will kill the guest, expected only kill the SGX application >> inside it. > Can we please clean this up? This is generally readable, but _hard_ to > read. Perhaps: > > Today, if a guest accesses an SGX EPC page with memory failure, > the kernel will behavior will kill the entire guest. This blast > radius is too large. It would be idea to kill only the SGX > application inside the guest. > >> To fix it we send SIGBUS with code BUS_MCEERR_AR and some extra > ^ No "we's". > >> information for hypervisor to inject #MC information to guest, which is >> helpful in SGX case. > To fix this, send a SIGBUS to host userspace (like QEMU) which can > follow up by injecting a #MC to the guest. > >> The rest of things are guest side. Currently the hypervisor like Qemu >> already has mature facility to convert HVA to GPA and inject #MC to >> the guest OS. >> >> Unlike host enclaves, virtual EPC instance cannot be shared by multiple >> VMs. It is because how enclaves are created is totally up to the guest. >> Sharing virtual EPC instance will be very likely to unexpectedly break >> enclaves in all VMs. > I'm not sure why this is here or why it is important to this patch. > >> SGX virtual EPC driver doesn't explicitly prevent virtual EPC instance >> being shared by multiple VMs via fork(). However KVM doesn't support >> running a VM across multiple mm structures, and the de facto userspace >> hypervisor (Qemu) doesn't use fork() to create a new VM, so in practice >> this should not happen. > >> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c >> index ab4ec54bbdd9..4507c2302348 100644 >> --- a/arch/x86/kernel/cpu/sgx/main.c >> +++ b/arch/x86/kernel/cpu/sgx/main.c >> @@ -715,6 +715,8 @@ int arch_memory_failure(unsigned long pfn, int flags) >> struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT); >> struct sgx_epc_section *section; >> struct sgx_numa_node *node; >> + unsigned long vaddr; >> + int ret; >> >> /* >> * mm/memory-failure.c calls this routine for all errors >> @@ -731,8 +733,26 @@ int arch_memory_failure(unsigned long pfn, int flags) >> * error. The signal may help the task understand why the >> * enclave is broken. >> */ >> - if (flags & MF_ACTION_REQUIRED) >> - force_sig(SIGBUS); >> + if (flags & MF_ACTION_REQUIRED) { >> + /* >> + * Provide extra info to the task so that it can make further >> + * decision but not simply kill it. This is quite useful for >> + * virtualization case. >> + */ >> + if (page->flags & SGX_EPC_PAGE_KVM_GUEST) { >> + /* >> + * The "owner" field is repurposed as the virtual address >> + * of virtual EPC page. >> + */ >> + vaddr = (unsigned long)page->owner & PAGE_MASK; > I really don't like repurposing page->owner like this. It requires > casting on *both* sides of a type that we have full control over. > > struct sgx_epc_page { > unsigned int section; > u16 flags; > u16 poison; > union { > struct sgx_encl_page *encl_owner; > // Use when SGX_EPC_PAGE_KVM_GUEST > // set in ->flags: > void __user *vepc_vaddr; > }; > struct list_head list; > }; > > There is zero reason to play casting games instead of doing that ^ > Many thanks for your review, Dave. I will send V6 patch set as per your suggestion. Best Regards, Zhiquan