On 2022/10/11 22:04, Dave Hansen wrote:
> On 9/19/22 23:39, Zhiquan Li wrote:
>> Today, if a guest accesses an SGX EPC page with memory failure,
>> the kernel behavior will kill the entire guest. This blast
>> radius is too large. It would be idea to kill only the SGX
>
>                                 ideal ^
>
>> application inside the guest.
>>
>> To fix this, send a SIGBUS to host userspace (like QEMU) which can
>> follow up by injecting a #MC to the guest.
>
> This doesn't make any sense to me. It's *ALREADY* sending a SIGBUS.
> So, whatever is making this better, it's not "send a SIGBUS" that's
> doing it.
>
> What does this patch actually do to reduce the blast radius?
>

Thanks for your attention, Dave.

This part comes from your earlier comments:
https://lore.kernel.org/linux-sgx/Yrf27fugD7lkyaek@xxxxxxxxxx/T/#m6d62670eb530fab178eefaaaf31a22c4475e818d

The key is that the SIGBUS should carry the code BUS_MCEERR_AR and the
virtual address of the virtual EPC page. The hypervisor can identify it
by that specific code and inject a #MC into the guest.

Can we change the statement like this?

    To fix this, send a SIGBUS with code BUS_MCEERR_AR and virtual
    address of virtual EPC page to host userspace (like QEMU) which can
    follow up by injecting a #MC to the guest.

>> SGX virtual EPC driver doesn't explicitly prevent virtual EPC instance
>> being shared by multiple VMs via fork(). However KVM doesn't support
>> running a VM across multiple mm structures, and the de facto userspace
>> hypervisor (Qemu) doesn't use fork() to create a new VM, so in practice
>> this should not happen.
>
> This is out of the blue. Why is this here?
>
> What happens if a hypervisor *DOES* fork()? What's the fallout?

This part originates from the discussion below:
https://lore.kernel.org/linux-sgx/52dc7f50b68c99cecb9e1c3383d9c6d88734cd67.camel@xxxxxxxxx/#t

It intends to answer the question:

    Do you think the processes sharing the same enclave need to be
    killed, even they had not touched the EPC page with hardware error?

Dave, do you mean it's not appropriate to be put here?

Best Regards,
Zhiquan
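
P.S. To make the BUS_MCEERR_AR point concrete, here is a minimal sketch of
how host userspace could tell this SIGBUS apart from an ordinary one. It is
only an illustration under my own assumptions: the helper name
is_action_required_mce() is hypothetical and is not taken from QEMU or from
this patch, and the actual #MC injection into the guest is not shown.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not QEMU code): returns true when the siginfo
 * describes a SIGBUS raised for an "action required" memory failure,
 * i.e. si_code == BUS_MCEERR_AR. On success, *fault_addr receives the
 * faulting virtual address from si_addr, which a hypervisor would need
 * to translate before injecting a #MC into the guest.
 */
static bool is_action_required_mce(const siginfo_t *info, void **fault_addr)
{
    assert(info != NULL && fault_addr != NULL);

    *fault_addr = NULL;

    /* Anything other than SIGBUS + BUS_MCEERR_AR is not our case. */
    if (info->si_signo != SIGBUS || info->si_code != BUS_MCEERR_AR)
        return false;

    *fault_addr = info->si_addr;
    return true;
}
```

A handler installed with sigaction() and SA_SIGINFO would pass its
siginfo_t pointer to such a helper and fall back to the default handling
when it returns false.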