On Thu, Nov 2, 2023 at 6:07 PM Pasha Tatashin <pasha.tatashin@xxxxxxxxxx> wrote:
>
> On Thu, Nov 2, 2023 at 4:22 PM Wei Xu <weixugc@xxxxxxxxxx> wrote:
> >
> > On Thu, Nov 2, 2023 at 11:34 AM Pasha Tatashin
> > <pasha.tatashin@xxxxxxxxxx> wrote:
> > >
> > > > > > I could have sworn that I pointed that out in a previous version and
> > > > > > requested to document that special case in the patch description. :)
> > > > >
> > > > > Sounds, good we will document that parts of per-page may not be part
> > > > > of MemTotal.
> > > >
> > > > But this still doesn't answer how we can use the new PageMetadata
> > > > field to help break down the runtime kernel overhead within MemUsed
> > > > (MemTotal - MemFree).
> > >
> > > I am not sure it matters to the end users: they look at PageMetadata
> > > with or without Page Owner, page_table_check, HugeTLB and it shows
> > > exactly how much per-page overhead changed. Where the kernel allocated
> > > that memory is not that important to the end user as long as that
> > > memory became available to them.
> > >
> > > In addition, it is still possible to estimate the actual memblock part
> > > of Per-page metadata by looking at /proc/zoneinfo:
> > >
> > > Memblock reserved per-page metadata: "present_pages - managed_pages"
> >
> > This assumes that all reserved memblocks are per-page metadata. As I
>
> Right after boot, when all Per-page metadata is still from memblocks,
> we could determine what part of the zone reserved memory is not
> per-page, and use it later in our calculations.
>
> > mentioned earlier, it is not a robust approach.
> > > If there is something big that we will allocate in that range, we
> > > should probably also export it in some form.
> > >
> > > If this field does not fit in /proc/meminfo due to not fully being
> > > part of MemTotal, we could just keep it under nodeN/, as a separate
> > > file, as suggested by Greg.
> > >
> > > However, I think it is useful enough to have an easy system wide view
> > > for Per-page metadata.
> >
> > It is fine to have this as a separate, informational sysfs file under
> > nodeN/, outside of meminfo. I just don't think as in the current
> > implementation (where PageMetadata is a mixture of buddy and memblock
> > allocations), it can help with the use case that motivates this
> > change, i.e. to improve the breakdown of the kernel overhead.
> > > > > > > are allocated), so what would be the best way to export page metadata
> > > > > > > without redefining MemTotal? Keep the new field in /proc/meminfo but
> > > > > > > be ok that it is not part of MemTotal or do two counters? If we do two
> > > > > > > counters, we will still need to keep one that is a buddy allocator in
> > > > > > > /proc/meminfo and the other one somewhere outside?
> > > > > >
> > > >
> > > > I think the simplest thing to do now is to only report the buddy
> > > > allocations of per-page metadata in meminfo. The meaning of the new
> > >
> > > This will cause PageMetadata to be 0 on 99% of the systems, and
> > > essentially become useless to the vast majority of users.
> >
> > I don't think it is a major issue. There are other fields (e.g. Zswap)
> > in meminfo that remain 0 when the feature is not used.
>
> Since we are going to use two independent interfaces
> /proc/meminfo/PageMetadata and nodeN/page_metadata (in a separate file
> as requested by Greg) How about if in /proc/meminfo we provide only
> the buddy allocator part, and in nodeN/page_metadata we provide the
> total per-page overhead in the given node that include memblock
> reserves, and buddy allocator memory?

What we want is the system-wide breakdown of kernel memory usage. For
this use case, it works to have the new PageMetadata counter in
/proc/meminfo report only buddy-allocated per-page metadata.

> Pasha