On Thu, Aug 29, 2024 at 08:17:53AM -0700, Dave Hansen wrote:
> Generally, I think it's a bad idea to refer to function names in
> subjects. This, for instance would be much more informative:
>
> 	x86/sgx: Fix deadlock in SGX NUMA node search

Indeed, will use this as subject, thanks.

> On 8/28/24 19:38, Aaron Lu wrote:
> > When current node doesn't have a EPC section configured by firmware and
> > all other EPC sections memory are used up, CPU can stuck inside the
> > while loop in __sgx_alloc_epc_page() forever and soft lockup will happen.
> > Note how nid_of_current will never equal to nid in that while loop because
> > nid_of_current is not set in sgx_numa_mask.
> >
> > Also worth mentioning is that it's perfectly fine for firmware to not
> > seup an EPC section on a node. Setting an EPC section on each node can
> > be good for performance but that's not a requirement functionality wise.
>
> The changelog is a little rough, but I think Kai gave some good
> suggestions. The other thing you can do is dump the text in chatgpt (or
> whatever) and have it fix your grammar. It actually does a pretty
> decent job.

Thanks for the suggestion.

>
> Also, you didn't say _how_ you fixed this. That needs to be in here.
> Something along the lines of:
>
> 	Rework the loop to start and end on *a* node that has SGX
> 	memory. This avoids the deadlock looking for the current SGX-
> 	lacking node to show up in the loop when it never will.

Will add this to the changelog, thanks for the write-up.

>
> The code looks fine, so feel free to add:
>
> Acked-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>

Thanks.

>
> Also, I do think we should probably add some kind of sanity warning to
> the SGX code in another patch. If a node on an SGX system has CPUs and
> memory, it's very likely it will also have some EPC. It can be
> something soft like a pr_info(), but I think it would be nice to have.

I think there are systems with a valid reason not to set up an EPC section
per node, e.g.
an 8-socket system with SNC=2: there would be a total of 16 nodes, and it
is not possible to have one EPC section per node because the upper limit
on EPC sections is 8. I'm not sure a warning is appropriate here; what do
you think?
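
For reference, here is a minimal userspace sketch of the problem and of the
reworked loop as described above. It is not the kernel code: the node mask is
a plain bitmask, allocation is assumed to always fail (all EPC used up), and
next_node_in_mask(), old_search(), new_search() and MAX_NODES are hypothetical
names made up for this illustration. The old search gets an iteration cap so
that the endless loop can be observed instead of hanging.

```c
#define MAX_NODES 16

/*
 * Hypothetical stand-in for the kernel's next_node_in(): return the next
 * node after 'nid' that is set in 'mask', wrapping around. Returns -1 if
 * the mask is empty.
 */
static int next_node_in_mask(int nid, unsigned int mask)
{
	for (int i = 1; i <= MAX_NODES; i++) {
		int candidate = (nid + i) % MAX_NODES;

		if (mask & (1u << candidate))
			return candidate;
	}
	return -1;
}

/*
 * Old behaviour: walk the mask until we come back to the current node.
 * If 'current' is not in 'mask', that never happens. Returns the number
 * of steps taken, or -1 once 'cap' steps passed without terminating.
 */
static int old_search(int current, unsigned int mask, int cap)
{
	int nid = current;

	for (int steps = 1; steps <= cap; steps++) {
		nid = next_node_in_mask(nid, mask);
		if (nid == current)
			return steps;
		/* Allocation from 'nid' is assumed to fail here. */
	}
	return -1;
}

/*
 * Reworked behaviour: start and end on a node that is in the mask, so
 * the walk always terminates. Returns the number of nodes tried.
 */
static int new_search(int current, unsigned int mask)
{
	int start = next_node_in_mask(current, mask);
	int nid = start;
	int tried = 0;

	if (start < 0)
		return 0; /* no node has EPC at all */

	do {
		tried++; /* Allocation from 'nid' is assumed to fail here. */
		nid = next_node_in_mask(nid, mask);
	} while (nid != start);

	return tried;
}
```

With EPC on nodes 0 and 2 only (mask 0x5) and the task running on node 1,
old_search() never sees nid come back to 1 and only stops at the cap, while
new_search() tries both EPC nodes once and gives up.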