On Tue, May 10, 2022 at 12:36:19PM +1200, Kai Huang wrote: > On Mon, 2022-05-09 at 10:17 -0700, Reinette Chatre wrote: > > Hi Jarkko, > > > > On 5/7/2022 10:25 AM, Jarkko Sakkinen wrote: > > > On Thu, Apr 28, 2022 at 04:49:00PM -0700, Reinette Chatre wrote: > > > > > I also looked a little deeper at this transient failure problem. The > > > > > ELDU documentation also mentions a possible error code of: > > > > > > > > > > SGX_EPC_PAGE_CONFLICT > > > > > > > > > > It *looks* like there can be conflicts on the SECS page as well as the > > > > > EPC page being explicitly accessed. Is that a possible problem here? > > > > > > > > I went down this path myself. SGX_EPC_PAGE_CONFLICT is an error code > > > > supported by newer ELDUC - the ELDU used in current code would indeed > > > > #GP in this case. The SDM text describing ELDUC as "This leaf function > > > > behaves like ELDU but with improved conflict handling for oversubscription" > > > > really does seem relevant to the test that triggers this issue. > > > > > > > > I stopped pursuing this because from what I understand if > > > > SGX_EPC_PAGE_CONFLICT is encountered with commit 08999b2489b4 ("x86/sgx: > > > > Free backing memory after faulting the enclave page") then it should > > > > also be encountered without it. The issue is not present with > > > > 08999b2489b4 ("x86/sgx: Free backing memory after faulting the > > > > enclave page") removed. I am thus currently investigating based on > > > > the assumption that the #GP is encountered because of MAC > > > > verification problem. I may be wrong here also and need more information > > > > since the SDM documents two seemingly related errors: > > > > #GP(0) -> If the instruction fails to verify MAC. > > > > SGX_MAC_COMPARE_FAIL -> If the MAC check fails. > > > > > > This part puzzles me in the pseudo-code. > > > > > > The version is read first: > > > > > > TMP_VER := DS:RDX[63:0]; > > > > > > Then there's MAC calculation, comparison, and finally this check: > > > > > > (* Check version before committing *) > > > IF (DS:RDX ≠ 0) > > > THEN #GP(0); > > > ELSE > > > DS:RDX := TMP_VER; > > > FI; > > > > > > For me it is a mystery what does zero the slot and in what condition > > > it would be non-zero. Perhaps the #GP refers anyway to this check? > > > > RDX contains the VA slot information and that appears to be correct > > in these scenarios. The issue is the PCMD data pointed to by the > > PAGEINFO.PCMD (link to PAGEINFO found in RBX) is all zeroes. > > > > There are two scenarios under which the PCMD data could be zeroes. They > > are documented in: > > https://lore.kernel.org/linux-sgx/8157fa40-8d02-8819-e1b6-fd2d8863fb56@xxxxxxxxx/ > > https://lore.kernel.org/linux-sgx/da387afc-e666-45d0-1e99-066d8c4aab03@xxxxxxxxx/ > > > > I understand that context may be lost by pointing you to various emails > > in this thread - I'll wrap up all learnings when I submit the new version > > of this series today. > > > > Hi Reinette, > > Regardless the root cause of this problem, I agree with Jarkko above pseudo-code > in the spec is quite confusing. I can try to get it clarified from Intel > internally if you want. It is :-) Yeah, it would be great if it could be made a bit more punctual! BR, Jarkko