Hi Andrew, Michal,

Thanks for the feedback.

The issue is that CPU-less nodes can lead to incorrect NUMA stats. For
example, NUMA_HIT may incorrectly increase for CPU-less nodes because the
current logic doesn't account for whether a node has CPUs.

Key changes:

- local_stat: CPU-less nodes can't be "local", so allocations are counted
  as NUMA_OTHER.
- preferred_zone: If the preferred zone is CPU-less, NUMA_HIT and
  NUMA_FOREIGN are not updated, since no CPU runs there.

This ensures more accurate stats, especially for cases like dev_dax and
cpuset.

Hope that clarifies things.

Thanks,
Dongjoo

On Wed, Oct 23, 2024 at 11:38:40PM +0200, Michal Hocko wrote:
> On Wed 23-10-24 13:41:21, Andrew Morton wrote:
> > On Wed, 23 Oct 2024 20:03:24 +0200 Michal Hocko <mhocko@xxxxxxxx> wrote:
> >
> > > On Wed 23-10-24 10:50:37, Dongjoo Seo wrote:
> > > > This patch corrects this issue by:
> > >
> > > What is this issue? Please describe the problem first,
> >
> > Actually, relocating the author's second-last paragraph to
> > top-of-changelog produced a decent result ;)
> >
> > > ideally describe
> > > the NUMA topology, workload and what kind of misaccounting happens
> > > (expected values vs. really reported values).
> >
> > I think the changelog covered this adequately?
> >
> > So with these changelog alterations I've queued this for 6.12-rcX with
> > a cc:stable. As far as I can tell this has been there since 2018.
> >
> > : In the case of memoryless node, when a process prefers a node with no
> > : memory (e.g., because it is running on a CPU local to that node), the
> > : kernel treats a nearby node with memory as the preferred node. As a
> > : result, such allocations do not increment the numa_foreign counter on the
> > : memoryless node, leading to skewed NUMA_HIT, NUMA_MISS, and NUMA_FOREIGN
> > : stats for the nearest node.
>
> I am sorry but I still do not understand that. Especially when I do
> look at the patch which would like to treat cpuless nodes specially.
> Let me be more specific. Why ...
>
> > -	if (zone_to_nid(z) != numa_node_id())
> > +	if (zone_to_nid(z) != numa_node_id() || z_is_cpuless)
> >  		local_stat = NUMA_OTHER;
> >
> > -	if (zone_to_nid(z) == zone_to_nid(preferred_zone))
> > +	if (zone_to_nid(z) == zone_to_nid(preferred_zone) && !z_is_cpuless)
> >  		__count_numa_events(z, NUMA_HIT, nr_account);
> >  	else {
> >  		__count_numa_events(z, NUMA_MISS, nr_account);
> > -		__count_numa_events(preferred_zone, NUMA_FOREIGN, nr_account);
> > +		if (!pref_is_cpuless)
> > +			__count_numa_events(preferred_zone, NUMA_FOREIGN, nr_account);
>
> ... a (well?) established meaning of local needs to be changed? Why
> preferred policy should have a different meaning for cpuless policies?
> Those are memory specific rather than cpu specific right?
>
> Quite some questions to have it in linux-next IMHO....
> --
> Michal Hocko
> SUSE Labs
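
To make the before/after accounting in the quoted hunk concrete, here is a
minimal userspace sketch, not kernel code. It assumes a hypothetical
struct node_model with a has_cpu flag standing in for whatever the patch
uses to derive z_is_cpuless and pref_is_cpuless, and it assumes the final
local_stat update (not shown in the quoted hunk) is applied to the node
that served the allocation. Node ids replace zone pointers, and all
helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Userspace model only. The counter names mirror the kernel's, but
 * struct node_model, has_cpu, and the account_*() helpers are
 * hypothetical and exist just for this illustration.
 */
enum numa_stat {
	NUMA_HIT, NUMA_MISS, NUMA_FOREIGN, NUMA_LOCAL, NUMA_OTHER,
	NR_NUMA_STATS
};

#define MAX_NODES 4

struct node_model {
	bool has_cpu;				/* false models a CPU-less node */
	unsigned long stat[NR_NUMA_STATS];
};

static struct node_model nodes[MAX_NODES];

static void reset_nodes(void)
{
	memset(nodes, 0, sizeof(nodes));
}

static void count_event(int nid, enum numa_stat item, unsigned long nr)
{
	assert(nid >= 0 && nid < MAX_NODES);
	nodes[nid].stat[item] += nr;
}

/*
 * z: node that served the allocation, pref: preferred node,
 * cpu_nid: node of the CPU doing the allocation.
 * Models the lines the hunk removes.
 */
static void account_before(int z, int pref, int cpu_nid, unsigned long nr)
{
	enum numa_stat local_stat = NUMA_LOCAL;

	if (z != cpu_nid)
		local_stat = NUMA_OTHER;

	if (z == pref)
		count_event(z, NUMA_HIT, nr);
	else {
		count_event(z, NUMA_MISS, nr);
		count_event(pref, NUMA_FOREIGN, nr);
	}
	/* Assumed: this update is outside the quoted hunk. */
	count_event(z, local_stat, nr);
}

/* Models the lines the hunk adds. */
static void account_after(int z, int pref, int cpu_nid, unsigned long nr)
{
	enum numa_stat local_stat = NUMA_LOCAL;
	bool z_is_cpuless, pref_is_cpuless;

	assert(z >= 0 && z < MAX_NODES && pref >= 0 && pref < MAX_NODES);
	z_is_cpuless = !nodes[z].has_cpu;
	pref_is_cpuless = !nodes[pref].has_cpu;

	if (z != cpu_nid || z_is_cpuless)
		local_stat = NUMA_OTHER;

	if (z == pref && !z_is_cpuless)
		count_event(z, NUMA_HIT, nr);
	else {
		count_event(z, NUMA_MISS, nr);
		if (!pref_is_cpuless)
			count_event(pref, NUMA_FOREIGN, nr);
	}
	/* Assumed: this update is outside the quoted hunk. */
	count_event(z, local_stat, nr);
}
```

With node 0 having CPUs and node 1 CPU-less, a task on node 0 that prefers
node 1 and gets its page from node 1 is counted as NUMA_HIT plus
NUMA_OTHER on node 1 by account_before(1, 1, 0, 1). The same call to
account_after() counts NUMA_MISS plus NUMA_OTHER on node 1 and leaves
NUMA_FOREIGN untouched, which is the change in meaning being asked about
above.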