On Fri, Mar 19, 2021 at 10:01:41PM +1300, Kai Huang wrote: > On Fri, 19 Mar 2021 09:45:23 +0100 Borislav Petkov wrote: > > On Fri, Mar 19, 2021 at 05:06:02PM +1300, Kai Huang wrote: > > > Below kernel bug happened when running simple SGX application when EPC > > > is under pressure. The root cause is with commit 5b8719504e3a > > > ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()"), > > > __sgx_alloc_epc_page() returns NULL when there's no free EPC page can be > > > allocated, while old behavior was it always returned ERR_PTR(-ENOMEM) in > > > such case. > > > > > > Fix by directly returning the page if __sgx_alloc_epc_page_from_node() > > > allocates a valid page in fallback to non-local allocation, and always > > > returning ERR_PTR(-ENOMEM) if no EPC page can be allocated. > > > > > > [ 253.474764] BUG: kernel NULL pointer dereference, address: 0000000000000008 > > > [ 253.500101] #PF: supervisor write access in kernel mode > > > [ 253.525462] #PF: error_code(0x0002) - not-present page > > > ... > > > [ 254.102041] Call Trace: > > > [ 254.126699] sgx_ioc_enclave_add_pages+0x241/0x770 > > > [ 254.151305] sgx_ioctl+0x194/0x4b0 > > > [ 254.174976] ? handle_mm_fault+0xd0/0x260 > > > [ 254.198470] ? do_user_addr_fault+0x1ef/0x570 > > > [ 254.221827] __x64_sys_ioctl+0x91/0xc0 > > > [ 254.244546] do_syscall_64+0x38/0x90 > > > [ 254.266728] entry_SYSCALL_64_after_hwframe+0x44/0xae > > > [ 254.289232] RIP: 0033:0x7fdc4cf4031b > > > ... > > > [ 254.711480] CR2: 0000000000000008 > > > [ 254.735494] ---[ end trace 970dce6d4cdf7f64 ]--- > > > [ 254.759915] RIP: 0010:sgx_alloc_epc_page+0x46/0x152 > > > ... > > > > > > Fixes: 5b8719504e3a("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()") > > > Signed-off-by: Kai Huang <kai.huang@xxxxxxxxx> > > > --- > > > arch/x86/kernel/cpu/sgx/main.c | 4 ++-- > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > I was on the verge whether to merge that into the original patch since > > it is the top patch on the branch or create a new one but opted for > > former because this way it won't break bisection and people won't have > > to pay attention whether there's a fix patch to the NUMA patch too, in > > case they wanna backport and whatnot. > > Sure. > > [...] > > > + > > + /* Fall back to the non-local NUMA nodes: */ > > + while (true) { > > + nid = next_node_in(nid, sgx_numa_mask); > > + if (nid == nid_of_current) > > + break; > > > > - page = __sgx_alloc_epc_page_from_section(section); > > + page = __sgx_alloc_epc_page_from_node(nid); > > if (page) > > return page; > > } > > It seems "return ERR_PTR(-ENOMEM)" is missing at the bottom of this function? I understood Boris' comment that the fixes would not be squashed. /Jarkko