On 08.08.23 04:28, Xueshi Hu wrote:
On 8/7/23 23:15, David Hildenbrand wrote:
On 06.08.23 09:48, Xueshi Hu wrote:
There are currently three 'nr_hugepages' used to export the number of
huge
pages:
1. /proc/sys/vm/nr_hugepages
2. /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
3. /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
For consistency, all three 'nr_hugepages' should return the total number
of huge pages. When written, the number of persistent huge pages will be
adjusted to the specified value.
But, /proc/sys/vm/nr_hugepages returns the number of persistent huge
pages.
But that's documented behavior, no?
Documentation/admin-guide/mm/hugetlbpage.rst
``/proc/sys/vm/nr_hugepages`` indicates the current number of
"persistent" huge
pages in the kernel's huge page pool. "Persistent" huge pages will be
returned to the huge page pool when freed by a task. A user with root
privileges can dynamically allocate more or free some persistent huge pages
by increasing or decreasing the value of ``nr_hugepages``.
Actually, Documentation/admin-guide/mm/hugetlbpage.rst is contradictory
about the definition of /proc/sys/vm/nr_hugepages.
The documentation says:
- ``/proc/sys/vm/nr_hugepages`` indicates the current number of
"persistent" huge.
But, the documentation also says:
- The ``/proc`` interfaces discussed above have been retained for
backwards compatibility.
Yes. And why shouldn't the behavior of these toggles be retained for backwards compatibility?
- The ``nr_hugepages`` attribute returns the total number of huge pages on
the specified node. When this attribute is written, the number of
persistent huge pages on the parent node will be adjusted to the specified
value, if sufficient resources exist, regardless of the task's mempolicy
or cpuset constraints.
That is part of the "/sys/devices/system/node/node[0-9]*/hugepages/" docuemntation,
not "/proc/sys/vm/nr_hugepages" documentation.
It's in the "Per Node Hugepages Attributes" section.
If I am not wrong, that documentation -- including the usage of the "persistent"
term -- were introduced in 2009 already:
commit 267b4c281b4a43c8f3d965c791d3a7fd62448733
Author: Lee Schermerhorn <lee.schermerhorn@xxxxxx>
Date: Mon Dec 14 17:58:30 2009 -0800
hugetlb: update hugetlb documentation for NUMA controls
Update the kernel huge tlb documentation to describe the numa memory
policy based huge page management. Additionaly, the patch includes a fair
amount of rework to improve consistency, eliminate duplication and set the
context for documenting the memory policy interaction.
So this has been documented behavior for a long time.
So, I create the patch 4 to make the documentation more clear.
If such subtle inconsistencies result in unexpected behavior, it can be
challenging for a system administrator to detect.
Your patch is changing the traditional, documented behavior to then change
the documentation to match the new implementation.
How can you be sure nobody relies on exactly that traditional, well documented
behavior?
Why change the behavior of an interface that is kept for backwards compatibility,
that might still be in use under the assumptions that the behavior is exactly
like that (and for at least the last 14 years)?
--
Cheers,
David / dhildenb