On Mon, 22 Mar 2021 23:37:26 +0100 Borislav Petkov wrote: > On Tue, Mar 23, 2021 at 11:06:43AM +1300, Kai Huang wrote: > > This path is called by host SGX driver only, so yes this leaking is done by > > host enclaves only. > > Yes, so I was told. > > > This patch is purpose is to break EREMOVE out of sgx_free_epc_page() so virtual > > EPC code can use sgx_free_epc_page(), and handle EREMOVE logic differently. > > This patch doesn't have intention to bring functional change. I changed the > > error msg because Dave said it worth to give a message saying EPC page is > > leaked, and I thought this minor change won't break anything. > > I read that already - you don't have to repeat it. > > > btw, currently virtual EPC code (patch 5) handles in similar way: There's one > > EREMOVE error is expected and virtual EPC code can handle, but for other > > errors, it means kernel bug, and virtual EPC code gives a WARN(), and that EPC > > page is leaked too: > > > > + WARN_ONCE(ret != SGX_CHILD_PRESENT, > > + "EREMOVE (EPC page 0x%lx): unexpected error: %d\n", > > + sgx_get_epc_phys_addr(epc_page), ret); > > + return ret; > > > > So to me they are just WARN() to catch kernel bug. > > You don't care about users, do you? Because when that error happens, > they won't come crying to you to fix it. > > Lemme save you some trouble: we don't take half-baked code into the > kernel until stuff has been discussed and analyzed properly. So instead > of trying to downplay this, try answering my questions. > > Here's another one: when does EREMOVE fail? > > /me goes and looks it up > > "The instruction fails if the operand is not properly aligned or does > not refer to an EPC page or the page is in use by another thread, or > other threads are running in the enclave to which the page belongs. In > addition the instruction fails if the operand refers to an SECS with > associations." > > And I guess those conditions will become more in the future. > > Now, let's play. I'm the cloud admin and you're cloud OS customer > support. I say: > > "I got this scary error message while running enclaves on my server > > "EREMOVE returned ... . EPC page leaked. Reboot required to retrieve leaked pages." > > but I cannot reboot that machine because there are guests running on it > and I'm getting paid for those guests and I might get sued if I do?" > > Your turn, go wild. I suppose admin can migrate those VMs, and then engineers can analyse the root cause of such failure, and then fix it.