Xie XiuQi <xiexiuqi@xxxxxxxxxx> writes:

> Hi Lorenzo, Punit,
>
>
> On 2018/6/20 0:32, Lorenzo Pieralisi wrote:
>> On Tue, Jun 19, 2018 at 04:35:40PM +0100, Punit Agrawal wrote:
>>> Michal Hocko <mhocko@xxxxxxxxxx> writes:
>>>
>>>> On Tue 19-06-18 15:54:26, Punit Agrawal wrote:
>>>> [...]
>>>>> In terms of $SUBJECT, I wonder if it's worth taking the original patch
>>>>> as a temporary fix (it'll also be easier to backport) while we work on
>>>>> fixing these other issues and enabling memoryless nodes.
>>>>
>>>> Well, x86 already does that but copying this antipatern is not really
>>>> nice. So it is good as a quick fix but it would be definitely much
>>>> better to have a robust fix. Who knows how many other places might hit
>>>> this. You certainly do not want to add a hack like this all over...
>>>
>>> Completely agree! I was only suggesting it as a temporary measure,
>>> especially as it looked like a proper fix might be invasive.
>>>
>>> Another fix might be to change the node specific allocation to node
>>> agnostic allocations. It isn't clear why the allocation is being
>>> requested from a specific node. I think Lorenzo suggested this in one of
>>> the threads.
>>
>> I think that code was just copypasted but it is better to fix the
>> underlying issue.
>>
>>> I've started putting together a set fixing the issues identified in this
>>> thread. It should give a better idea on the best course of action.
>>
>> On ACPI ARM64, this diff should do if I read the code correctly, it
>> should be (famous last words) just a matter of mapping PXMs to nodes for
>> every SRAT GICC entry, feel free to pick it up if it works.
>>
>> Yes, we can take the original patch just because it is safer for an -rc
>> cycle even though if the patch below would do delaying the fix for a
>> couple of -rc (to get it tested across ACPI ARM64 NUMA platforms) is
>> not a disaster.
>
> I tested this patch on my arm board, it works.

I am assuming you tried the patch without enabling support for
memory-less nodes.

The patch de-couples the onlining of NUMA nodes (as parsed from SRAT)
from the NR_CPUS restriction. When it comes to building zonelists, the
node referenced by the PCI controller also has its zonelists
initialised.

So it looks like a fallback node is set up even if we don't have
memory-less nodes enabled. I need to stare some more at the code to see
why we need memory-less nodes at all then ...
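
To make the de-coupling concrete, here is a toy user-space model. It is
not the kernel code and not the patch itself: every name in it
(toy_gicc_entry, toy_numa_state, TOY_NR_CPUS, toy_parse_coupled,
toy_parse_decoupled) is hypothetical, and it assumes that the earlier
behaviour dropped a whole SRAT GICC entry whenever its CPU fell outside
the NR_CPUS limit. Under that assumption, a node whose CPUs all sit
beyond the limit is never marked as parsed in the coupled variant, while
the de-coupled variant records the node first and only then decides
whether the CPU can be bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS   2   /* hypothetical CPU limit, stands in for NR_CPUS */
#define TOY_MAX_NODES 4   /* hypothetical node limit */

/* Hypothetical stand-in for one SRAT GICC affinity entry. */
struct toy_gicc_entry {
	int cpu;   /* logical CPU the entry describes */
	int node;  /* node its proximity domain maps to */
};

/* Hypothetical parse result: which nodes were seen, and CPU bindings. */
struct toy_numa_state {
	bool node_parsed[TOY_MAX_NODES];
	int cpu_to_node[TOY_NR_CPUS];
};

void toy_state_init(struct toy_numa_state *s)
{
	assert(s != NULL);
	for (int n = 0; n < TOY_MAX_NODES; n++)
		s->node_parsed[n] = false;
	for (int c = 0; c < TOY_NR_CPUS; c++)
		s->cpu_to_node[c] = -1;  /* -1 means "no node bound" */
}

static bool toy_node_valid(int node)
{
	return node >= 0 && node < TOY_MAX_NODES;
}

static bool toy_cpu_valid(int cpu)
{
	return cpu >= 0 && cpu < TOY_NR_CPUS;
}

/* Coupled: an entry beyond the CPU limit is dropped, node and all. */
void toy_parse_coupled(struct toy_numa_state *s,
		       const struct toy_gicc_entry *e, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (!toy_node_valid(e[i].node) || !toy_cpu_valid(e[i].cpu))
			continue;
		s->cpu_to_node[e[i].cpu] = e[i].node;
		s->node_parsed[e[i].node] = true;
	}
}

/* De-coupled: the node is always recorded; the CPU only if it fits. */
void toy_parse_decoupled(struct toy_numa_state *s,
			 const struct toy_gicc_entry *e, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (!toy_node_valid(e[i].node))
			continue;
		s->node_parsed[e[i].node] = true;
		if (toy_cpu_valid(e[i].cpu))
			s->cpu_to_node[e[i].cpu] = e[i].node;
	}
}
```

With four entries (CPUs 0 and 1 on node 0, CPUs 2 and 3 on node 1) and a
limit of two CPUs, the coupled parser never sees node 1, whereas the
de-coupled one marks both nodes as parsed while still binding only CPUs
0 and 1.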