>-----Original Message----- >From: Jarkko Sakkinen <jarkko@xxxxxxxxxx> >Sent: Thursday, May 26, 2022 8:35 AM >To: Li, Zhiquan1 <zhiquan1.li@xxxxxxxxx>; linux-sgx@xxxxxxxxxxxxxxx; Luck, >Tony <tony.luck@xxxxxxxxx> >Cc: dave.hansen@xxxxxxxxxxxxxxx; Christopherson,, Sean ><seanjc@xxxxxxxxxx>; Huang, Kai <kai.huang@xxxxxxxxx>; Du, Fan ><fan.du@xxxxxxxxx> >Subject: Re: [PATCH v3 0/3] x86/sgx: fine grained SGX MCA behavior > >On Wed, 2022-05-25 at 18:06 +0800, Zhiquan Li wrote: >> V2: >https://lore.kernel.org/linux-sgx/694234d7-6a0d-e85f-f2f9-e52b4a61e1ec@int >el.com/T/#t >> >> Changes since V2: >> - Repurpose the owner field as the virtual address of virtual EPC page >> - Remove struct sgx_vepc_page and relevant code. >> - Remove patch 01 as the changes are not necessary in new design. >> - Rework patch 02 suggested by Jarkko. >> - Adapt patch 03 and 04 since struct sgx_vepc_page was discarded. >> - Replace EPC page flag SGX_EPC_PAGE_IS_VEPC with >> SGX_EPC_PAGE_KVM_GUEST as they are duplicated. >> Link: >https://lore.kernel.org/linux-sgx/eb95b32ecf3d44a695610cf7f2816785@intel.c >om/T/#u >> >> V1: >https://lore.kernel.org/linux-sgx/443cb425-009c-2784-56f4-5e707122de76@int >el.com/T/#t >> >> Changes since V1: >> - Updated cover letter and commit messages, added valuable >> information from Jarkko, Tony and Kai's comments. >> - Added documentations for struct struct sgx_vepc and >> struct sgx_vepc_page. >> >> Hi everyone, >> >> This series contains a few patches to fine grained SGX MCA behavior. >> >> When VM guest access a SGX EPC page with memory failure, current >> behavior will kill the guest, expected only kill the SGX application >> inside it. >> >> To fix it we send SIGBUS with code BUS_MCEERR_AR and some extra >> information for hypervisor to inject #MC information to guest, which >> is helpful in SGX virtualization case. >> >> The rest of things are guest side. Currently the hypervisor like >> Qemu already has mature facility to convert HVA to GPA and inject #MC >> to the guest OS. >> >> Then we extend the solution for the normal SGX case, so that the task >> has opportunity to make further decision while EPC page has memory >> failure. >> >> However, when a page triggers a machine check, it only reports the PFN. >> But in order to inject #MC into hypervisor, the virtual address >> is required. Then repurpose the "owner" field as the virtual address of >> the virtual EPC page so that arch_memory_failure() can easily retrieve >> it. >> >> The EPC page flag - SGX_EPC_PAGE_KVM_GUEST to interpret the meaning >> of the field. >> >> Suppose an enclave is shared by multiple processes, when an enclave >> page triggers a machine check, the enclave will be disabled so that >> it couldn't be entered again. Killing other processes with the same >> enclave mapped would perhaps be overkill, but they are going to find >> that the enclave is "dead" next time they try to use it. Thanks for >> Jarkko's head up and Tony's clarification on this point. >> >> Our intension is to provide additional info so that the application has >> more choices. Current behavior looks gently, and we don't want to >> change it. >> >> If you expect the other processes to be informed in such case, then >> you're looking for an MCA "early kill" feature which worth another >> patch set to implement it. >> >> Unlike host enclaves, virtual EPC instance cannot be shared by multiple >> VMs. It is because how enclaves are created is totally up to the guest. >> Sharing virtual EPC instance will be very likely to unexpectedly break >> enclaves in all VMs. >> >> SGX virtual EPC driver doesn't explicitly prevent virtual EPC instance >> being shared by multiple VMs via fork(). However KVM doesn't support >> running a VM across multiple mm structures, and the de facto userspace >> hypervisor (Qemu) doesn't use fork() to create a new VM, so in practice >> this should not happen. >> >> This series is based on tip/x86/sgx with the following additionally >> applied: >> >> "x86/sgx: Keep record for SGX VA and Guest page type" >> >https://lore.kernel.org/linux-sgx/eb95b32ecf3d44a695610cf7f2816785@intel.c >om/T/#mbd2ca61983f1f9514a8baf07fdb17d33495eeada >> >> Tests: >> 1. MCE injection test for SGX in VM. >> As we expected, the application was killed and VM was alive. >> 2. MCE injection test for SGX on host. >> As we expected, the application received SIGBUS with extra info. >> 3. Kernel selftest/sgx: PASS >> 4. Internal SGX stress test: PASS >> 5. kmemleak test: No memory leakage detected. >> >> Much appreciate your feedback. >> >> Best Regards, >> Zhiquan >> >> Zhiquan Li (3): >> x86/sgx: Repurpose the owner field as the virtual address of virtual >> EPC page >> x86/sgx: Fine grained SGX MCA behavior for virtualization >> x86/sgx: Fine grained SGX MCA behavior for normal case >> >> arch/x86/kernel/cpu/sgx/main.c | 27 +++++++++++++++++++++++++-- >> arch/x86/kernel/cpu/sgx/sgx.h | 2 ++ >> arch/x86/kernel/cpu/sgx/virt.c | 4 +++- >> 3 files changed, 30 insertions(+), 3 deletions(-) >> > >This applies on top of Cathy's series, right? Why not send one >series with all 12 patches included? > >It makes reviewing easier, and we are well beyond 5.19 timeline >for these features. Patches from Zhiquan try to improve SGX MCA handling, actually this is a BUG being discussed with customer with SGX deployment already - SGX VM instance got killed in case of SGX application inside VM consumed poison EPC pages. Expected behavior: SGX application get killed only in such scenario. Seamless patchset from Catchy is another standalone feature, the design seems still under discussion. Combining those two distinct purpose-built patchset together looks wired. Zhiquan's patchset introduced the SGX_EPC_PAGE_IS_VEPC [1] macro to mark guest EPC page, luckily Cathy's patchset also has a prior built macro SGX_EPC_PAGE_KVM_GUEST with same semantics as a simple standalone patch[2]. For completeness, how about incorporate Cathy's patch[2](keep original authorship) into Zhiquan up-coming next version(3 patch from Zhiquan, 1 patch from Cathy)? Offline synced with Cathy, she is personally ok. while let's see what others think about on how to prioritize those two things - bugfix and feature enhancement. So , Tony, Dave, any suggestion here on how we could move on? Thanks! [1] https://patchwork.kernel.org/project/intel-sgx/patch/20220519031137.245767-1-zhiquan1.li@xxxxxxxxx/ [2]: https://patchwork.kernel.org/project/intel-sgx/patch/20220520103904.1216-4-cathy.zhang@xxxxxxxxx/ >BR, Jarkko