On Wed, May 11, 2022 at 01:29:59PM -0500, Haitao Huang wrote: > Hi Jarkko > > On Wed, 11 May 2022 05:26:15 -0500, Jarkko Sakkinen <jarkko@xxxxxxxxxx> > wrote: > > > On Tue, May 10, 2022 at 12:36:19PM +1200, Kai Huang wrote: > > > On Mon, 2022-05-09 at 10:17 -0700, Reinette Chatre wrote: > > > > Hi Jarkko, > > > > > > > > On 5/7/2022 10:25 AM, Jarkko Sakkinen wrote: > > > > > On Thu, Apr 28, 2022 at 04:49:00PM -0700, Reinette Chatre wrote: > > > > > > > I also looked a little deeper at this transient failure > > > problem. The > > > > > > > ELDU documentation also mentions a possible error code of: > > > > > > > > > > > > > > SGX_EPC_PAGE_CONFLICT > > > > > > > > > > > > > > It *looks* like there can be conflicts on the SECS page as > > > well as the > > > > > > > EPC page being explicitly accessed. Is that a possible > > > problem here? > > > > > > > > > > > > I went down this path myself. SGX_EPC_PAGE_CONFLICT is an > > > error code > > > > > > supported by newer ELDUC - the ELDU used in current code would > > > indeed > > > > > > #GP in this case. The SDM text describing ELDUC as "This leaf > > > function > > > > > > behaves like ELDU but with improved conflict handling for > > > oversubscription" > > > > > > really does seem relevant to the test that triggers this issue. > > > > > > > > > > > > I stopped pursuing this because from what I understand if > > > > > > SGX_EPC_PAGE_CONFLICT is encountered with commit 08999b2489b4 > > > ("x86/sgx: > > > > > > Free backing memory after faulting the enclave page") then it > > > should > > > > > > also be encountered without it. The issue is not present with > > > > > > 08999b2489b4 ("x86/sgx: Free backing memory after faulting the > > > > > > enclave page") removed. I am thus currently investigating based on > > > > > > the assumption that the #GP is encountered because of MAC > > > > > > verification problem. I may be wrong here also and need more > > > information > > > > > > since the SDM documents two seemingly related errors: > > > > > > #GP(0) -> If the instruction fails to verify MAC. > > > > > > SGX_MAC_COMPARE_FAIL -> If the MAC check fails. > > > > > > > > > > This part puzzles me in the pseudo-code. > > > > > > > > > > The version is read first: > > > > > > > > > > TMP_VER := DS:RDX[63:0]; > > > > > > > > > > Then there's MAC calculation, comparison, and finally this check: > > > > > > > > > > (* Check version before committing *) > > > > > IF (DS:RDX ≠ 0) > > > > > THEN #GP(0); > > > > > ELSE > > > > > DS:RDX := TMP_VER; > > > > > FI; > > > > > > > > > > For me it is a mystery what does zero the slot and in what condition > > > > > it would be non-zero. Perhaps the #GP refers anyway to this check? > > > > > > > We discussed this internally, and agree this part of pseudo code needs be > corrected/clarified. > > Here is what we think was going on when ELDU invoked with PCMD of all zeros: > ELDU would check if the PCMD.SECINFO.FLAGS.PT is 0 which indicates that the > page being loaded is a PT_SECS, and the PAGEINFO.SECS is not zero, then the > instruction will #GP(0). Thus, ELDU is behaving correctly – it is an > omission in the SDM pseudocode. > > The version checking code above also need be clarified because the VA slot > would be cleared at this point and TMP_VER should be zero. > > We will follow up with the architect once he is back from sabbatical and > make needed adjustment for SDM. OK, great to hear, thank you! BR, Jarkko