Re: [PATCH 0/4] x86/sgx: fine grained SGX MCA behavior

Zhiquan Li <zhiquan1.li@xxxxxxxxx> · Sat, 14 May 2022 13:39:28 +0800

On 2022/5/14 00:35, Luck, Tony wrote:
>>> Do you think the processes sharing the same enclave need to be killed,
>>> even if they have not touched the EPC page with the hardware error?
>>> Any ideas are welcome.
>>
>> I do not think the patch set is going in the wrong direction. This discussion
>> was just missing from the cover letter.

OK, I will add this point to the v2 cover letter and to patch 03.

> 
> I was under the impression that when an enclave page triggers a machine check
> the whole enclave is (somehow) marked bad, so that it couldn't be entered again.
> 
> Killing other processes with the same enclave mapped would perhaps be overkill,
> but they are going to find that the enclave is "dead" next time they try to use it.

Thanks for your clarification, Tony.
You reminded me to check Intel SDM vol.3, 38.15.1 Interactions with MCA Events:

"All architecturally visible machine check events (#MC and CMCI) that are detected
while inside an enclave cause an asynchronous enclave exit.
Any machine check exception (#MC) that occurs after Intel SGX is first enabled
causes Intel SGX to be disabled, (CPUID.SGX_Leaf.0:EAX[SGX1] == 0). It cannot be
enabled until after the next reset."
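
To make the CPUID condition in that quote concrete, here is a minimal
user-space sketch of how the SGX1 bit could be read. It is only an
illustration, not part of the patch set: the helper names sgx1_reported()
and sgx1_available() are hypothetical, and it assumes the SGX leaf is CPUID
leaf 0x12 with SGX1 reported in sub-leaf 0, EAX bit 0, and that a compiler
providing <cpuid.h> with __get_cpuid_count() (GCC or Clang) is used.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: CPUID leaf 0x12 (SGX), sub-leaf 0, EAX bit 0 = SGX1. */
#define SGX_CPUID_LEAF      0x12u
#define SGX_LEAF0_EAX_SGX1  (1u << 0)

/*
 * Hypothetical helper: decode an EAX value as returned by
 * CPUID.(EAX=12H, ECX=0). Returns true if the SGX1 bit is set.
 */
static inline bool sgx1_reported(uint32_t eax)
{
	return (eax & SGX_LEAF0_EAX_SGX1) != 0;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/*
 * Hypothetical helper: query the running CPU. Returns false when the
 * leaf is above the maximum supported leaf or when the SGX1 bit is
 * clear, which is the state the SDM text describes after an #MC.
 */
static inline bool sgx1_available(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(SGX_CPUID_LEAF, 0, &eax, &ebx, &ecx, &edx))
		return false;
	return sgx1_reported(eax);
}
#endif
```

A process that still has the enclave mapped could call something like
sgx1_available() before trying to enter it again. A complete check would
presumably also look at the SGX feature flag in CPUID leaf 7 first, which
this sketch omits.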

So I suppose the current behavior is gentle enough: other processes that have
the same enclave mapped should discard it if they really need to use the enclave
again. If we expect those processes to be killed early, that is worth a separate
patch set to achieve.

Best Regards,
Zhiquan

> 
> -Tony