On Tue, Feb 20, 2024 at 4:46 PM Sourav Panda <souravpanda@xxxxxxxxxx> wrote:
>
> Adds two new per-node fields, namely nr_memmap and nr_memmap_boot,
> to /sys/devices/system/node/nodeN/vmstat and a global Memmap field
> to /proc/meminfo. This information can be used by users to see how
> much memory is being used by per-page metadata, which can vary
> depending on build configuration, machine architecture, and system
> use.
>
> Per-page metadata is the amount of memory that Linux needs in order to
> manage memory at the page granularity. The majority of such memory is
> used by the "struct page" and "page_ext" data structures. In contrast
> to most other memory consumption statistics, per-page metadata might
> not be included in MemTotal. For example, MemTotal does not include
> memblock allocations but does include buddy allocations. In this patch,
> the exported field nr_memmap in /sys/devices/system/node/nodeN/vmstat
> exclusively tracks buddy allocations, while nr_memmap_boot exclusively
> tracks memblock allocations. Furthermore, Memmap in /proc/meminfo
> exclusively tracks buddy allocations, allowing it to be compared
> against MemTotal.
>
> This memory depends on build configuration, machine architecture, and
> the way the system is used:
>
> - Build configuration may add extra fields to "struct page" and
>   enable or disable "page_ext".
> - Machine architecture defines the base page size, for example 4K on
>   x86, 8K on SPARC, 64K (optionally) on ARM64, etc. The per-page
>   metadata overhead is smaller on machines with larger page sizes.
> - System use can change the per-page overhead through vmemmap
>   optimizations with hugetlb pages and emulated pmem devdax pages.
>   Also, boot parameters can determine whether page_ext needs to be
>   allocated. This memory can be inside or outside MemTotal depending
>   on whether the memory was hot-plugged, present at boot, or whether
>   hugetlb memory was returned to the system.
>
> Utility for userspace:
>
> Application optimization: Depending on the kernel version and command
> line options, the kernel relinquishes a different number of pages
> (that contain struct pages) when a hugetlb page is reserved (e.g., 0,
> 6 or 7 for a 2MB hugepage). A userspace application may want to know
> the exact savings achieved through page metadata deallocation without
> dealing with the intricacies of the kernel.
>
> Observability: Struct page overhead can only be calculated on paper at
> boot time (e.g., 1.5% of machine capacity). Beyond boot, once hugepages
> are reserved or memory is hotplugged, the computation becomes complex.
> Per-page metrics help explain part of the system memory overhead, which
> in turn helps guide memory optimizations and memory cgroup sizing.
>
> Debugging: Tracking the changes in, or the absolute value of, struct
> page memory can help detect anomalies, as it can be correlated with
> other metrics on the machine (e.g., MemTotal, number of huge pages,
> etc.).
>
> page_ext overheads: Some kernel features, such as page_owner and
> page_table_check, that use page_ext can be optionally enabled via
> kernel parameters. Having the total per-page metadata information
> helps users precisely measure their impact.
>
> Suggested-by: Pasha Tatashin <pasha.tatashin@xxxxxxxxxx>
> Signed-off-by: Sourav Panda <souravpanda@xxxxxxxxxx>

Reviewed-by: Pasha Tatashin <pasha.tatashin@xxxxxxxxxx>
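
For anyone wanting to consume these counters from userspace, here is a
minimal sketch (not part of the patch; it assumes the names and units
land exactly as described above, i.e. nr_memmap/nr_memmap_boot reported
in pages in the per-node vmstat files and Memmap reported in kB in
/proc/meminfo):

import glob

def read_meminfo_field(name):
    # /proc/meminfo lines look like "Memmap:          262144 kB"
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(name + ":"):
                return int(line.split()[1])  # value in kB
    return None

def read_node_vmstat_field(node_path, name):
    # per-node vmstat lines look like "nr_memmap 65536" (values in pages)
    with open(node_path + "/vmstat") as f:
        for line in f:
            key, value = line.split()
            if key == name:
                return int(value)
    return None

memmap_kb = read_meminfo_field("Memmap")
print("Memmap (buddy-allocated page metadata): %s kB" % memmap_kb)

for node in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    buddy = read_node_vmstat_field(node, "nr_memmap")      # buddy, in MemTotal
    boot = read_node_vmstat_field(node, "nr_memmap_boot")  # memblock, outside MemTotal
    print("%s: nr_memmap=%s pages, nr_memmap_boot=%s pages"
          % (node, buddy, boot))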