Re: [PATCH v3 2/8] mm/mprotect: Remove NUMA_HUGE_PTE_UPDATES

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 15.07.24 21:21, Peter Xu wrote:
In 2013, commit 72403b4a0fbd ("mm: numa: return the number of base pages
altered by protection changes") introduced "numa_huge_pte_updates" vmstat
entry, trying to capture how many huge ptes (in reality, PMD thps at that
time) are marked by NUMA balancing.

This patch proposes to remove it for some reasons.

Firstly, the name is misleading. We can have more than one way to have a
"huge pte" at least nowadays, and that's also the major goal of this patch,
where it paves way for PUD handling in change protection code paths.

PUDs are coming not only for dax (which has already came and yet broken..),
but also for pfnmaps and hugetlb pages.  The name will simply stop making
sense when PUD will start to be involved in mprotect() world.

It'll also make it not reasonable either if we boost the counter for both
pmd/puds.  In short, current accounting won't be right when PUD comes, so
the scheme was only suitable at that point in time where PUD wasn't even
possible.

Secondly, the accounting was simply not right from the start as long as it
was also affected by other call sites besides NUMA.  mprotect() is one,
while userfaultfd-wp also leverages change protection path to modify
pgtables.  If it wants to do right it needs to check the caller but it
never did; at least mprotect() should be there even in 2013.

It gives me the impression that nobody is seriously using this field, and
it's also impossible to be serious.

It's weird and the implementation is ugly. The intention really was to only consider MM_CP_PROT_NUMA, but that apparently is not the case.

hugetlb/mprotect/... should have never been accounted.

[...]

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 73d791d1caad..53656227f70d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1313,7 +1313,6 @@ const char * const vmstat_text[] = {
#ifdef CONFIG_NUMA_BALANCING
  	"numa_pte_updates",
-	"numa_huge_pte_updates",
  	"numa_hint_faults",
  	"numa_hint_faults_local",
  	"numa_pages_migrated",

It's a user-visible update. I assume most tools should be prepared for this stat missing (just like handling !CONFIG_NUMA_BALANCING).

Apparently it's documented [1][2] for some distros:

"The amount of transparent huge pages that were marked for NUMA hinting faults. In combination with numa_pte_updates the total address space that was marked can be calculated."

And now I realize that change_prot_numa() would account these PMD updates as well in numa_pte_updates and I am confused about the SUSE documentation: "In combination with numa_pte_updates" doesn't really apply, right?

At this point I don't know what's right or wrong.

If we'd want to fix it instead, the right thing to do would be doing the accounting only with MM_CP_PROT_NUMA. But then, numa_pte_updates is also wrongly updated I believe :(


[1] https://documentation.suse.com/de-de/sles/12-SP5/html/SLES-all/cha-tuning-numactl.html [2] https://support.oracle.com/knowledge/Oracle%20Linux%20and%20Virtualization/2749259_1.html

--
Cheers,

David / dhildenb





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux