Re: [PATCH v9 3/3] x86/sgx: Fine grained SGX MCA behavior for virtualization

Dave Hansen <dave.hansen@xxxxxxxxx> · Wed, 12 Oct 2022 07:36:03 -0700

On 10/11/22 22:09, Zhiquan Li wrote:
>>> application inside the guest.
>>>
>>> To fix this, send a SIGBUS to host userspace (like QEMU) which can
>>> follow up by injecting a #MC to the guest.
>>
>> This doesn't make any sense to me.  It's *ALREADY* sending a SIGBUS.
>> So, whatever is making this better, it's not "send a SIGBUS" that's
>> doing it.
>>
>> What does this patch actually do to reduce the blast radius?
> 
> Thanks for your attention, Dave.
> 
> This part comes from your comments previously:
> 
> https://lore.kernel.org/linux-sgx/Yrf27fugD7lkyaek@xxxxxxxxxx/T/#m6d62670eb530fab178eefaaaf31a22c4475e818d
> 
> The key is the SIGBUS should with code BUS_MCEERR_AR and virtual address
> of virtual EPC page. Hypervisor can identify it with the specific code
> and inject #MC to the guest.
> 
> Can we change the statement like this?
> 
>     To fix this, send a SIGBUS with code BUS_MCEERR_AR and virtual
>     address of virtual EPC page to host userspace (like QEMU) which can
>     follow up by injecting a #MC to the guest.

This is really just mechanically restating what the patch does.  It
doesn't help me understand how it achieves the goal.  I guess I'll just
go and write the changelog for you.  Here's what I was missing:

	There is already a signal-based ABI to tell userspace about
	machine checks.  But, SGX does not use that ABI.  Today, the
	kernel delivers a generic SIGBUS if a machine check occurs when
	accessing SGX memory.  Userspace can not differentiate that
	SIGBUS from <add example here>, so it is very unlikely to be
	able to recover from the signal and the app will die.

	To fix this, have the SGX machine check code generate a SIGBUS
	which leverages the existing BUS_MCEERR_AR ABI.  This enlightens
	userspace about why the SIGBUS was generated and gives it a
	chance of being able to handle the signal.

	QEMU, for instance, has code to handle these BUS_MCEERR_AR
	signals today.  Without this patch, QEMU will just die in the
	face of a generic SIGBUS, and take the whole VM with it.  With
	this patch <explain what QEMU actually does here>.

	In short, BUS_MCEERR_AR enables QEMU to reduce the blast radius
	down from the whole QEMU process to a single page.

This patch doesn't *actually* reduce the blast radius.  It enables QEMU
to do that.

>>> SGX virtual EPC driver doesn't explicitly prevent virtual EPC instance
>>> being shared by multiple VMs via fork().  However KVM doesn't support
>>> running a VM across multiple mm structures, and the de facto userspace
>>> hypervisor (Qemu) doesn't use fork() to create a new VM, so in practice
>>> this should not happen.
>>
>> This is out of the blue.  Why is this here?
>>
>> What happens if a hypervisor *DOES* fork()?  What's the fallout?
> 
> This part originates from below discussion:
> 
> https://lore.kernel.org/linux-sgx/52dc7f50b68c99cecb9e1c3383d9c6d88734cd67.camel@xxxxxxxxx/#t
> 
> It intents to answer the question:
> 
>     Do you think the processes sharing the same enclave need to be
>     killed, even they had not touched the EPC page with hardware error?
> 
> Dave, do you mean it's not appropriate to be put here?

It's actually a pretty important point, but it's still out of the blue.

You also didn't answer my question.