On Fri Aug 30, 2024 at 9:14 AM EEST, Aaron Lu wrote:
> On Thu, Aug 29, 2024 at 07:44:13PM +0300, Jarkko Sakkinen wrote:
> > On Thu Aug 29, 2024 at 5:38 AM EEST, Aaron Lu wrote:
> > > When current node doesn't have a EPC section configured by firmware and
> > > all other EPC sections memory are used up, CPU can stuck inside the
> > > while loop in __sgx_alloc_epc_page() forever and soft lockup will happen.
> > > Note how nid_of_current will never equal to nid in that while loop because
> >                                                      ~~~~
> >
> > Oh *that* while loop ;-) Please be more specific.
>
> What about:
> Note how nid_of_current will never be equal to nid in the while loop that
> searches an available EPC page from remote nodes because nid_of_current is
> not set in sgx_numa_mask.

That would work, I think!

> > > nid_of_current is not set in sgx_numa_mask.
> > >
> > > Also worth mentioning is that it's perfectly fine for firmware to not
> > > seup an EPC section on a node. Setting an EPC section on each node can
> > > be good for performance but that's not a requirement functionality wise.
> >
> > This lacks any description of what is done to __sgx_alloc_epc_page().
>
> Will add what Dave suggested on how the problem is fixed to the changelog.

Great. I think the code change is correct, given these additions.
I'll look at the next version as a whole, but with high probability I can
ack it as long as the commit message has these updates.

> > >
> > > Fixes: 901ddbb9ecf5 ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()")
> > > Reported-by: Zhimin Luo <zhimin.luo@xxxxxxxxx>
> > > Tested-by: Zhimin Luo <zhimin.luo@xxxxxxxxx>
> > > Signed-off-by: Aaron Lu <aaron.lu@xxxxxxxxx>
>
> Thanks,
> Aaron

BR, Jarkko
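
For readers following the thread, the lockup discussed above can be illustrated with a small user-space sketch. This is not the kernel code and not necessarily the fix in the patch: next_node_in_mask(), walk_original(), walk_bounded(), MAX_NODES and the step limit are all hypothetical stand-ins, and every node is assumed to be out of free EPC pages. It only shows why a wrap-around walk over the nodes that have EPC sections never returns to a starting node that is not in the mask, and one possible way to bound the walk.

```c
#include <assert.h>

#define MAX_NODES 8 /* hypothetical node count for this simulation */

/*
 * Stand-in for a "next node in mask, with wrap-around" helper:
 * returns the next set bit after nid, wrapping past the last node.
 */
static int next_node_in_mask(int nid, unsigned int mask)
{
	assert((mask & ((1u << MAX_NODES) - 1)) != 0);

	for (int i = 1; i <= MAX_NODES; i++) {
		int cand = (nid + i) % MAX_NODES;

		if (mask & (1u << cand))
			return cand;
	}
	return nid; /* not reached while the mask is non-empty */
}

/*
 * Walk shaped like the loop described in the thread: stop once the
 * walk wraps back to nid_of_current. Returns the number of steps
 * taken, or -1 if step_limit is hit, which stands for "spins forever".
 */
static int walk_original(int nid_of_current, unsigned int epc_mask,
			 int step_limit)
{
	int nid = nid_of_current;

	for (int steps = 1; steps <= step_limit; steps++) {
		nid = next_node_in_mask(nid, epc_mask);
		if (nid == nid_of_current)
			return steps; /* wrapped around: give up */
		/* the allocation attempt on nid is assumed to fail here */
	}
	return -1;
}

/*
 * One possible bounded variant (an assumption, not the actual patch):
 * start from a node that is guaranteed to be in the mask, so the
 * wrap-around comparison can succeed.
 */
static int walk_bounded(int nid_of_current, unsigned int epc_mask,
			int step_limit)
{
	int nid_start = (epc_mask & (1u << nid_of_current)) ?
			nid_of_current :
			next_node_in_mask(nid_of_current, epc_mask);
	int nid = nid_start;
	int steps = 0;

	do {
		if (++steps > step_limit)
			return -1;
		/* the allocation attempt on nid is assumed to fail here */
		nid = next_node_in_mask(nid, epc_mask);
	} while (nid != nid_start);

	return steps;
}
```

With EPC sections only on nodes 0 and 1 (mask 0x3) and the caller running on node 2, walk_original() keeps cycling between nodes 0 and 1 until the step limit is reached, while walk_bounded() visits each EPC node once and stops.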