On 8/30/2021 5:46 PM, Bharata B Rao wrote:
> From: Krupa Ramakrishnan <krupa.ramakrishnan@xxxxxxx>
>
> In build_zonelists(), when the fallback list is built for the nodes,
> the node load gets reinitialized during each iteration. This results
> in nodes with same distances occupying the same slot in different
> node fallback lists rather than appearing in the intended round-robin
> manner. This results in one node getting picked for allocation
> more compared to other nodes with the same distance.
>
> As an example, consider a 4 node system with the following distance
> matrix.
>
> Node 0  1  2  3
> ----------------
> 0    10 12 32 32
> 1    12 10 32 32
> 2    32 32 10 12
> 3    32 32 12 10
>
> For this case, the node fallback list gets built like this:
>
> Node  Fallback list
> ---------------------
> 0     0 1 2 3
> 1     1 0 3 2
> 2     2 3 0 1
> 3     3 2 0 1 <-- Unexpected fallback order

FWIW, for a dual-socket 8 node system with the following distance matrix,

node   0   1   2   3   4   5   6   7
  0:  10  12  12  12  32  32  32  32
  1:  12  10  12  12  32  32  32  32
  2:  12  12  10  12  32  32  32  32
  3:  12  12  12  10  32  32  32  32
  4:  32  32  32  32  10  12  12  12
  5:  32  32  32  32  12  10  12  12
  6:  32  32  32  32  12  12  10  12
  7:  32  32  32  32  12  12  12  10

the fallback list looks like this:

Before
=======
Fallback order for Node 0: 0 1 2 3 4 5 6 7
Fallback order for Node 1: 1 2 3 0 5 6 7 4
Fallback order for Node 2: 2 3 0 1 6 7 4 5
Fallback order for Node 3: 3 0 1 2 7 4 5 6
Fallback order for Node 4: 4 5 6 7 0 1 2 3
Fallback order for Node 5: 5 6 7 4 0 1 2 3
Fallback order for Node 6: 6 7 4 5 0 1 2 3
Fallback order for Node 7: 7 4 5 6 0 1 2 3

After the fix
==============
Fallback order for Node 0: 0 1 2 3 4 5 6 7
Fallback order for Node 1: 1 2 3 0 5 6 7 4
Fallback order for Node 2: 2 3 0 1 6 7 4 5
Fallback order for Node 3: 3 0 1 2 7 4 5 6
Fallback order for Node 4: 4 5 6 7 0 1 2 3
Fallback order for Node 5: 5 6 7 4 1 2 3 0
Fallback order for Node 6: 6 7 4 5 2 3 0 1
Fallback order for Node 7: 7 4 5 6 3 0 1 2

So the problem becomes more pronounced for bigger NUMA systems.

Regards,
Bharata.
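
For anyone who wants to experiment with other distance matrices, here is a rough user-space sketch of the ordering logic described in the patch. It is not the kernel code. The names (next_best_node(), build_fallback(), MAX_NODES, LOAD_SCALE) are hypothetical, every node is assumed to have memory and CPUs, and the "accumulate" branch assumes the fix amounts to adding to the node load rather than overwriting it.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES  8                            /* hypothetical cap for this sketch */
#define LOAD_SCALE (MAX_NODES * MAX_NODES + 1)  /* keeps load a pure tie-breaker */

/* Return the closest node not yet in the list, or -1 when all are used. */
static int next_best_node(int nr, const int dist[][MAX_NODES], int local,
			  bool used[], const int node_load[])
{
	int best = -1, min_val = INT_MAX;

	if (!used[local]) {		/* the local node always comes first */
		used[local] = true;
		return local;
	}
	for (int n = 0; n < nr; n++) {
		if (used[n])
			continue;
		/* Distance first, with a slight preference for the next node up. */
		int val = dist[local][n] + (n < local);
		/* Among equally distant nodes, prefer the less loaded one. */
		val = val * LOAD_SCALE + node_load[n];
		if (val < min_val) {
			min_val = val;
			best = n;
		}
	}
	if (best >= 0)
		used[best] = true;
	return best;
}

/* Fill order[i][] with the fallback list of node i, for nodes 0..nr-1. */
void build_fallback(int nr, const int dist[][MAX_NODES], bool accumulate,
		    int order[][MAX_NODES])
{
	int node_load[MAX_NODES];

	assert(nr > 0 && nr <= MAX_NODES);
	memset(node_load, 0, sizeof(node_load));	/* cleared once, not per node */

	for (int local = 0; local < nr; local++) {
		bool used[MAX_NODES] = { false };
		int load = nr, prev = local, count = 0, node;

		while ((node = next_best_node(nr, dist, local, used, node_load)) >= 0) {
			/* Penalize the first node of each new distance group. */
			if (dist[local][node] != dist[local][prev]) {
				if (accumulate)
					node_load[node] += load; /* assumed fix */
				else
					node_load[node] = load;  /* drops earlier penalty */
			}
			order[local][count++] = node;
			prev = node;
			load--;
		}
	}
}
```

Calling build_fallback() on the 8 node matrix with accumulate set to false should give the "Before" lists, and with accumulate set to true the "After the fix" lists, as far as I can tell from stepping through it by hand.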