On Wed, Mar 13, 2024 at 6:40 PM Pasha Tatashin <pasha.tatashin@xxxxxxxxxx> wrote:
>
> On Tue, Feb 20, 2024 at 4:46 PM Sourav Panda <souravpanda@xxxxxxxxxx> wrote:
> >
> > Adds two new per-node fields, namely nr_memmap and nr_memmap_boot,
> > to /sys/devices/system/node/nodeN/vmstat and a global Memmap field
> > to /proc/meminfo. This information can be used by users to see how
> > much memory is being used by per-page metadata, which can vary
> > depending on build configuration, machine architecture, and system
> > use.
> >
> > Per-page metadata is the amount of memory that Linux needs in order to
> > manage memory at the page granularity. The majority of such memory is
> > used by "struct page" and "page_ext" data structures. In contrast to
> > most other memory consumption statistics, per-page metadata might not
> > be included in MemTotal. For example, MemTotal does not include memblock
> > allocations but includes buddy allocations. In this patch, exported
> > field nr_memmap in /sys/devices/system/node/nodeN/vmstat would
> > exclusively track buddy allocations while nr_memmap_boot would
> > exclusively track memblock allocations. Furthermore, Memmap in
> > /proc/meminfo would exclusively track buddy allocations allowing it to
> > be compared against MemTotal.
> >
> > This memory depends on build configurations, machine architectures, and
> > the way system is used:
> >
> > Build configuration may include extra fields into "struct page",
> > and enable / disable "page_ext"
> > Machine architecture defines base page sizes. For example 4K x86,
> > 8K SPARC, 64K ARM64 (optionally), etc. The per-page metadata
> > overhead is smaller on machines with larger page sizes.
> > System use can change per-page overhead by using vmemmap
> > optimizations with hugetlb pages, and emulated pmem devdax pages.
> > Also, boot parameters can determine whether page_ext is needed
> > to be allocated. This memory can be part of MemTotal or be outside
> > MemTotal depending on whether the memory was hot-plugged, booted with,
> > or hugetlb memory was returned back to the system.
> >
> > Utility for userspace:
> >
> > Application Optimization: Depending on the kernel version and command
> > line options, the kernel would relinquish a different number of pages
> > (that contain struct pages) when a hugetlb page is reserved (e.g., 0, 6
> > or 7 for a 2MB hugepage). The userspace application would want to know
> > the exact savings achieved through page metadata deallocation without
> > dealing with the intricacies of the kernel.
> >
> > Observability: Struct page overhead can only be calculated on-paper at
> > boot time (e.g., 1.5% machine capacity). Beyond boot once hugepages are
> > reserved or memory is hotplugged, the computation becomes complex.
> > Per-page metrics will help explain part of the system memory overhead,
> > which shall help guide memory optimizations and memory cgroup sizing.
> >
> > Debugging: Tracking the changes or absolute value in struct pages can
> > help detect anomalies as they can be correlated with other metrics in
> > the machine (e.g., memtotal, number of huge pages, etc).
> >
> > page_ext overheads: Some kernel features such as page_owner
> > page_table_check that use page_ext can be optionally enabled via kernel
> > parameters. Having the total per-page metadata information helps users
> > precisely measure impact.

Hi Andrew,

Can you please give this patch another look, does it require more reviews
before you can take it in?

Thank you,
Pasha
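
For anyone following the thread, here is a rough sketch of how userspace
might consume the proposed field, comparing Memmap against MemTotal as the
changelog suggests. It assumes a kernel with this patch applied, so that
/proc/meminfo actually carries a "Memmap:" line; the helper names
(parse_meminfo, memmap_share, read_meminfo) and the sample numbers are
made up for illustration and are not measurements from a real machine.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field_name: value} dict.

    Values are the leading integer on each line (kB for most fields).
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memmap_share(fields):
    """Return Memmap / MemTotal, or None if either field is unavailable.

    "Memmap" is the field proposed by the patch; on kernels without it
    this returns None rather than guessing.
    """
    memmap = fields.get("Memmap")
    total = fields.get("MemTotal")
    if memmap is None or not total:
        return None
    return memmap / total


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())


# Illustrative input only: values chosen so the ratio is 1/64.
SAMPLE = "MemTotal:       16384000 kB\nMemmap:           256000 kB\n"

if __name__ == "__main__":
    share = memmap_share(parse_meminfo(SAMPLE))
    print("sample Memmap share of MemTotal:", share)
```

On a patched system, memmap_share(read_meminfo()) would give the live
ratio. Since Memmap as described tracks only buddy allocations, the
memblock side (nr_memmap_boot in the per-node vmstat files) would have to
be read separately.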