On Thu Sep 5, 2024 at 11:08 AM EEST, Aaron Lu wrote: > When the current node doesn't have an EPC section configured by firmware > and all other EPC sections are used up, CPU can get stuck inside the > while loop that looks for an available EPC page from remote nodes > indefinitely, leading to a soft lockup. Note how nid_of_current will > never be equal to nid in that while loop because nid_of_current is not > set in sgx_numa_mask. > > Also worth mentioning is that it's perfectly fine for the firmware not > to setup an EPC section on a node. While setting up an EPC section on > each node can enhance performance, it is not a requirement for > functionality. > > Rework the loop to start and end on *a* node that has SGX memory. This > avoids the deadlock looking for the current SGX-lacking node to show up > in the loop when it never will. > > Fixes: 901ddbb9ecf5 ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()") > Reported-by: "Molina Sabido, Gerardo" <gerardo.molina.sabido@xxxxxxxxx> > Tested-by: Zhimin Luo <zhimin.luo@xxxxxxxxx> > Reviewed-by: Kai Huang <kai.huang@xxxxxxxxx> > Acked-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> > Signed-off-by: Aaron Lu <aaron.lu@xxxxxxxxx> > --- > arch/x86/kernel/cpu/sgx/main.c | 27 ++++++++++++++------------- > 1 file changed, 14 insertions(+), 13 deletions(-) > > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c > index 1a000acd933a..694fcf7a5e3a 100644 > --- a/arch/x86/kernel/cpu/sgx/main.c > +++ b/arch/x86/kernel/cpu/sgx/main.c > @@ -475,24 +475,25 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void) > { > struct sgx_epc_page *page; > int nid_of_current = numa_node_id(); > - int nid = nid_of_current; > + int nid_start, nid; > > - if (node_isset(nid_of_current, sgx_numa_mask)) { > - page = __sgx_alloc_epc_page_from_node(nid_of_current); > - if (page) > - return page; > - } > - > - /* Fall back to the non-local NUMA nodes: */ > - while (true) { > - nid = next_node_in(nid, sgx_numa_mask); > - if (nid == nid_of_current) > - break; > + /* > + * Try local node first. If it doesn't have an EPC section, > + * fall back to the non-local NUMA nodes. > + */ > + if (node_isset(nid_of_current, sgx_numa_mask)) > + nid_start = nid_of_current; > + else > + nid_start = next_node_in(nid_of_current, sgx_numa_mask); > > + nid = nid_start; > + do { > page = __sgx_alloc_epc_page_from_node(nid); > if (page) > return page; > - } > + > + nid = next_node_in(nid, sgx_numa_mask); > + } while (nid != nid_start); > > return ERR_PTR(-ENOMEM); > } Looks good to me: Reviewed-by: Jarkko Sakkinen <jarkko@xxxxxxxxxx> BR, Jarkko