On Thu, 2023-10-26 at 09:01 -0700, Chatre, Reinette wrote: > > On 10/25/2023 4:58 PM, Huang, Kai wrote: > > On Wed, 2023-10-25 at 07:31 -0700, Hansen, Dave wrote: > > > On 10/19/23 19:53, Haitao Huang wrote: > > > > In the EAUG on page fault path, VM_FAULT_OOM is returned when the > > > > Enclave Page Cache (EPC) runs out. This may trigger unneeded OOM kill > > > > that will not free any EPCs. Return VM_FAULT_SIGBUS instead. > > This commit message does not seem accurate to me. From what I can tell > VM_FAULT_SIGBUS is indeed returned when EPC runs out. What is addressed > with this patch is the error returned when kernel (not EPC) memory runs > out. > > > > So, when picking an error code and we look the documentation for the > > > bits, we see: > > > > > > > * @VM_FAULT_OOM: Out Of Memory > > > > * @VM_FAULT_SIGBUS: Bad access > > > > > > So if anything we'll need a bit more changelog where you explain how > > > running out of enclave memory is more "Bad access" than "Out Of Memory". > > > Because on the surface this patch looks wrong. > > > > > > But that's just a naming thing. What *behavior* is bad here? With the > > > old code, what happens? With the new code, what happens? Why is the > > > old better than the new? > > > > I think Haitao meant if we return OOM, the core-MM fault handler will believe > > the fault couldn't be handled because of running out of memory, and then it > > could invoke the OOM killer which might select an unrelated victim who might > > have no EPC at all. > > Since the issue is that system is out of kernel memory the resolution may need to > look further than owners with EPC memory. Oh right, I didn't look into the sgx_encl_page_alloc(): encl_page = kzalloc(sizeof(*encl_page), GFP_KERNEL); if (!encl_page) return ERR_PTR(-ENOMEM); > > ... > > > > > (Also, currently the non-EAUG code path (ELDU) in sgx_vma_fault() also returns > > SIGBUS if it fails to allocate EPC, so making EAUG code path return SIGBUS also > > matches the ELDU path.) > > > > These errors all seem related to EPC memory to me, not kernel memory. Right.