On 10/25/2023 4:58 PM, Huang, Kai wrote:
> On Wed, 2023-10-25 at 07:31 -0700, Hansen, Dave wrote:
>> On 10/19/23 19:53, Haitao Huang wrote:
>>> In the EAUG on page fault path, VM_FAULT_OOM is returned when the
>>> Enclave Page Cache (EPC) runs out. This may trigger unneeded OOM kill
>>> that will not free any EPCs. Return VM_FAULT_SIGBUS instead.

This commit message does not seem accurate to me. From what I can tell
VM_FAULT_SIGBUS is indeed returned when EPC runs out. What is addressed
with this patch is the error returned when kernel (not EPC) memory runs
out.

>> So, when picking an error code and we look the documentation for the
>> bits, we see:
>>
>>> * @VM_FAULT_OOM: Out Of Memory
>>> * @VM_FAULT_SIGBUS: Bad access
>>
>> So if anything we'll need a bit more changelog where you explain how
>> running out of enclave memory is more "Bad access" than "Out Of Memory".
>> Because on the surface this patch looks wrong.
>>
>> But that's just a naming thing. What *behavior* is bad here? With the
>> old code, what happens? With the new code, what happens? Why is the
>> old better than the new?
>
> I think Haitao meant if we return OOM, the core-MM fault handler will believe
> the fault couldn't be handled because of running out of memory, and then it
> could invoke the OOM killer which might select an unrelated victim who might
> have no EPC at all.

Since the issue is that system is out of kernel memory the resolution
may need to look further than owners with EPC memory.

...

>
> (Also, currently the non-EAUG code path (ELDU) in sgx_vma_fault() also returns
> SIGBUS if it fails to allocate EPC, so making EAUG code path return SIGBUS also
> matches the ELDU path.)
>

These errors all seem related to EPC memory to me, not kernel memory.

Reinette