On Mon, Aug 10, 2015 at 05:37:54PM -0700, David Rientjes wrote:
> On Fri, 7 Aug 2015, Naoya Horiguchi wrote:
>
> > Currently smaps reports many zero fields for vma(VM_HUGETLB), which is
> > inconvenient when we want to know per-task or per-vma base hugetlb usage.
> > This patch enables these fields by introducing smaps_hugetlb_range().
> >
> > before patch:
> >
> >   Size:              20480 kB
> >   Rss:                   0 kB
> >   Pss:                   0 kB
> >   Shared_Clean:          0 kB
> >   Shared_Dirty:          0 kB
> >   Private_Clean:         0 kB
> >   Private_Dirty:         0 kB
> >   Referenced:            0 kB
> >   Anonymous:             0 kB
> >   AnonHugePages:         0 kB
> >   Swap:                  0 kB
> >   KernelPageSize:     2048 kB
> >   MMUPageSize:        2048 kB
> >   Locked:                0 kB
> >   VmFlags: rd wr mr mw me de ht
> >
> > after patch:
> >
> >   Size:              20480 kB
> >   Rss:               18432 kB
> >   Pss:               18432 kB
> >   Shared_Clean:          0 kB
> >   Shared_Dirty:          0 kB
> >   Private_Clean:         0 kB
> >   Private_Dirty:     18432 kB
> >   Referenced:        18432 kB
> >   Anonymous:         18432 kB
> >   AnonHugePages:         0 kB
> >   Swap:                  0 kB
> >   KernelPageSize:     2048 kB
> >   MMUPageSize:        2048 kB
> >   Locked:                0 kB
> >   VmFlags: rd wr mr mw me de ht
> >
>
> I think this will lead to breakage, unfortunately, specifically for users
> who are concerned with resource management.
>
> An example: we use memcg hierarchies to charge memory for individual jobs,
> specific users, and system overhead.  Memcg is a cgroup, so this is done
> for an aggregate of processes, and we often have to monitor their memory
> usage.  Each process isn't assigned to its own memcg, and I don't believe
> common users of memcg assign individual processes to their own memcgs.
>
> When a memcg is out of memory, we need to track the memory usage of
> processes attached to its memcg hierarchy to determine what is unexpected,
> either as a result of a new rollout or because of a memory leak.  To do
> that, we use the rss exported by smaps that is now changed with this
> patch.  By using smaps rather than /proc/pid/status, we can report where
> memory usage is unexpected.
>
> This would cause our process that manages all memcgs on our systems to
> break.  Perhaps I haven't been as convincing in my previous messages of
> this, but it's quite an obvious userspace regression.

OK, this version assumes that userspace distinguishes vma(VM_HUGETLB) by the
"VmFlags" field, which is unrealistic. So I'll keep all existing fields
untouched and introduce separate hugetlb usage info instead.

> This memory was not included in rss originally because memory in the
> hugetlb persistent pool is always resident.  Unmapping the memory does not
> free memory.  For this reason, hugetlb memory has always been treated as
> its own type of memory.

Right, so it might be better not to use the word "RSS" for hugetlb;
something like "HugetlbPages:" seems better to me.

Thanks,
Naoya Horiguchi

> It would have been arguable back when hugetlbfs was introduced whether it
> should be included.  I'm afraid the ship has sailed on that since a decade
> has passed and it would cause userspace to break if existing metrics are
> used that already have clearly defined semantics.
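
For illustration only (this is not code from the thread), here is a minimal sketch of the kind of userspace accounting discussed above: summing the per-VMA "Rss:" values from smaps text while keeping mappings that carry the "ht" VmFlags entry separate. The function name summarize_smaps and the result keys are hypothetical, and the parser assumes the usual smaps layout of a hex address-range header followed by "Key: value kB" lines.

```python
import re

# A VMA header line looks like "start-end perms offset dev inode [path]".
_HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s")


def summarize_smaps(text):
    """Sum Rss (kB) over all VMAs, split by whether VmFlags contains "ht"."""
    vmas = []
    current = None
    for line in text.splitlines():
        if _HEADER.match(line):
            # Start of a new VMA record.
            current = {"Size": 0, "Rss": 0, "flags": []}
            vmas.append(current)
            continue
        if current is None or ":" not in line:
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "VmFlags":
            current["flags"] = value.split()
        elif key in ("Size", "Rss"):
            # Values are printed as "<number> kB".
            current[key] = int(value.split()[0])

    totals = {"normal_rss_kb": 0, "hugetlb_rss_kb": 0, "hugetlb_size_kb": 0}
    for vma in vmas:
        if "ht" in vma["flags"]:
            totals["hugetlb_rss_kb"] += vma["Rss"]
            totals["hugetlb_size_kb"] += vma["Size"]
        else:
            totals["normal_rss_kb"] += vma["Rss"]
    return totals
```

A monitor that simply adds up every "Rss:" line without the VmFlags check would see its totals jump once hugetlb mappings start reporting non-zero Rss, which is the regression being argued about; a separate field avoids requiring every consumer to add such a check.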