Re: [RFC PATCH 1/4] x86/sgx: Do not free backing memory on ENCLS[ELDU] failure

Hi Dave,

On 4/28/2022 3:53 PM, Dave Hansen wrote:
> On 4/28/22 15:20, Reinette Chatre wrote:
>> Hi Dave,
>>
>> On 4/28/2022 2:30 PM, Dave Hansen wrote:
>>> On 4/28/22 13:11, Reinette Chatre wrote:
>>
>>> Are there any transient, recoverable errors that can come back from
>>> ELDU?  If so, this makes a lot of sense.  If not, then it doesn't make a
>>> lot of sense to preserve the swapped-out content because the enclave is
>>> going to die anyway.
>>
>> Good point.
>>
>> Theoretically ELDU could encounter a page fault while accessing the 
>> regions it needs to read from and write to. These faults are passed
>> through and the instruction would fault with a #PF, which is
>> propagated so that the page fault handler returns SIGBUS.
> 
> We don't have to worry about those, though, do we?  We're operating
> entirely on kernel mappings that won't cause #PF.

Indeed, yes, I do not see how an ELDU error or fault is recoverable.

> 
>> Even so, this flow also impacts the SGX2 flows that need to load pages from
>> the backing store. In this case the kernel would pass it as an error
>> (-EFAULT) to the runtime but it would not result in the
>> enclave being killed. If it was a #PF that caused the issue then
>> perhaps theoretically the SGX2 instruction has a chance of succeeding
>> if the runtime attempts it again? 
> 
> How are the SGX2 flows different than what we have now?

SGX2 uses the same flow as the page fault handler to load the
page into the enclave. The only difference is that the SGX2 flow omits the
VMA permission checks. See:
https://lore.kernel.org/lkml/db3a14f2d2df7678dec23375d48c96b603f8cfb5.1649878359.git.reinette.chatre@xxxxxxxxx/

As per the trace printed in the WARN, the issue being investigated is
triggered by the ELDU run as part of the page fault handler, not by
the SGX2 flows.

> 
> I also looked a little deeper at this transient failure problem.  The
> ELDU documentation also mentions a possible error code of:
> 
> 	SGX_EPC_PAGE_CONFLICT
> 
> It *looks* like there can be conflicts on the SECS page as well as the
> EPC page being explicitly accessed.  Is that a possible problem here?

I went down this path myself. SGX_EPC_PAGE_CONFLICT is an error code
reported by the newer ELDUC; the ELDU used in the current code would
indeed #GP in this case. The SDM describes ELDUC as "This leaf function
behaves like ELDU but with improved conflict handling for
oversubscription", which really does seem relevant to the test that
triggers this issue.

I stopped pursuing this because, from what I understand, if
SGX_EPC_PAGE_CONFLICT is encountered with commit 08999b2489b4 ("x86/sgx:
Free backing memory after faulting the enclave page") applied, then it
should also be encountered without it. Yet the issue is not present with
08999b2489b4 removed, so a page conflict seems an unlikely cause. I am
thus currently investigating based on the assumption that the #GP is
caused by a MAC verification problem. I may be wrong here too and need
more information, since the SDM documents two seemingly related errors:
#GP(0) -> If the instruction fails to verify MAC.
SGX_MAC_COMPARE_FAIL -> If the MAC check fails.


Reinette
