On Fri, May 8, 2020 at 6:38 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Fri, May 08, 2020 at 06:25:14AM -0700, Shakeel Butt wrote:
> > On Fri, May 8, 2020 at 3:34 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> > >
> > > On Fri, May 8, 2020 at 4:49 AM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
> > > >
> > > > One way to measure the efficiency of memory reclaim is to look at the
> > > > ratio (pgscan+pgrefill)/pgsteal. However at the moment these stats are
> > > > not updated consistently at the system level and the ratio of these are
> > > > not very meaningful. The pgsteal and pgscan are updated for only global
> > > > reclaim while pgrefill gets updated for global as well as cgroup
> > > > reclaim.
> > > >
> > >
> > > Hi Shakeel,
> > >
> > > We always use pgscan and pgsteal for monitoring the system level
> > > memory pressure, for example, by using sysstat(sar) or some other
> > > monitor tools.
>
> I'm in the same boat. It's useful to have activity that happens purely
> due to machine capacity rather than localized activity that happens
> due to the limits throughout the cgroup tree.
>
> > Don't you need pgrefill in addition to pgscan and pgsteal to get the
> > full picture of the reclaim activity?
>
> I actually almost never look at pgrefill.
>

Nowadays we are looking at reclaim cost on high utilization
machines/devices and noticed that the rmap walk takes more than 60/70%
of the CPU cost of reclaim. The kernel does rmap walks in
shrink_active_list and shrink_page_list, and pgscan and pgrefill are
good approximations of the number of rmap walks during reclaim.

> > > But with this change, these two counters include the memcg pressure as
> > > well. It is not easy to know whether the pgscan and pgsteal are caused
> > > by system level pressure or only some specific memcgs reaching their
> > > memory limit.
> > >
> > > How about adding cgroup_reclaim() to pgrefill as well ?
> > >
> > >
> > I am looking for all the reclaim activity on the system. Adding
> > !cgroup_reclaim to pgrefill will skip the cgroup reclaim activity.
> > Maybe adding pgsteal_cgroup and pgscan_cgroup would be better.
>
> How would you feel about adding memory.stat at the root cgroup level?
>

Actually I would prefer adding memory.stat at the root cgroup level,
since, as you noted below, more use-cases would benefit from it.

> There are subtle differences between /proc/vmstat and memory.stat, and
> cgroup-aware code that wants to watch the full hierarchy currently has
> to know about these intricacies and translate semantics back and forth.
>
> Generally having the fully recursive memory.stat at the root level
> could help a broader range of usecases.

Thanks for the feedback. I will send the patch with the additional
motivation.
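
For reference, the ratio from the original patch description,
(pgscan+pgrefill)/pgsteal, could be computed from userspace roughly as
in the sketch below. This is only an illustration: the helper names
parse_counters and reclaim_cost_ratio are hypothetical, the SAMPLE
values are made up rather than measured, and the sketch assumes the
counters appear as "name value" lines, split into *_kswapd and *_direct
variants in /proc/vmstat and as single pgscan/pgsteal entries in
memory.stat. As discussed above, the result is only meaningful when all
three counters cover the same reclaim scope.

```python
# Made-up counter values in /proc/vmstat layout, for illustration only.
SAMPLE = """\
pgscan_kswapd 300
pgscan_direct 100
pgsteal_kswapd 150
pgsteal_direct 50
pgrefill 200
"""


def parse_counters(text):
    """Parse 'name value' lines into a dict, skipping anything else."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def reclaim_cost_ratio(counters):
    """Return (pgscan + pgrefill) / pgsteal, or None if pgsteal is zero."""
    if "pgscan" in counters:
        # memory.stat layout (assumed): one aggregated counter each.
        pgscan = counters["pgscan"]
        pgsteal = counters.get("pgsteal", 0)
    else:
        # /proc/vmstat layout (assumed): kswapd and direct reclaim split.
        pgscan = (counters.get("pgscan_kswapd", 0)
                  + counters.get("pgscan_direct", 0))
        pgsteal = (counters.get("pgsteal_kswapd", 0)
                   + counters.get("pgsteal_direct", 0))
    pgrefill = counters.get("pgrefill", 0)
    if pgsteal == 0:
        return None  # nothing reclaimed, so the ratio is undefined
    return (pgscan + pgrefill) / pgsteal


ratio = reclaim_cost_ratio(parse_counters(SAMPLE))
print(ratio)
```

On a live system the same helpers could be fed the contents of
/proc/vmstat or of a cgroup's memory.stat. The counters are cumulative,
so sampling twice and taking the ratio of the deltas would describe the
interval between the samples rather than everything since boot.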