On Wed, Feb 13, 2019 at 4:47 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote: > > On Tue 12-02-19 14:45:42, Andrew Morton wrote: > [...] > > From: Chris Down <chris@xxxxxxxxxxxxxx> > > Subject: mm, memcg: consider subtrees in memory.events > > > > memory.stat and other files already consider subtrees in their output, and > > we should too in order to not present an inconsistent interface. > > > > The current situation is fairly confusing, because people interacting with > > cgroups expect hierarchical behaviour in the vein of memory.stat, > > cgroup.events, and other files. For example, this causes confusion when > > debugging reclaim events under low, as currently these always read "0" at > > non-leaf memcg nodes, which frequently causes people to misdiagnose breach > > behaviour. The same confusion applies to other counters in this file when > > debugging issues. > > > > Aggregation is done at write time instead of at read-time since these > > counters aren't hot (unlike memory.stat which is per-page, so it does it > > at read time), and it makes sense to bundle this with the file I think the above statement (memory.stat read-time aggregation) need to be fixed due to the latest changes. > > notifications. > > > > After this patch, events are propagated up the hierarchy: > > > > [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events > > low 0 > > high 0 > > max 0 > > oom 0 > > oom_kill 0 > > [root@ktst ~]# systemd-run -p MemoryMax=1 true > > Running as unit: run-r251162a189fb4562b9dabfdc9b0422f5.service > > [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events > > low 0 > > high 0 > > max 7 > > oom 1 > > oom_kill 1 > > > > As this is a change in behaviour, this can be reverted to the old > > behaviour by mounting with the `memory_localevents' flag set. However, we > > use the new behaviour by default as there's a lack of evidence that there > > are any current users of memory.events that would find this change > > undesirable. > > > > Link: http://lkml.kernel.org/r/20190208224419.GA24772@xxxxxxxxxxxxxx > > Signed-off-by: Chris Down <chris@xxxxxxxxxxxxxx> > > Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx> Reviewed-by: Shakeel Butt <shakeelb@xxxxxxxxxx> However can we please have memory.events.local merged along with this one? > > Cc: Michal Hocko <mhocko@xxxxxxxxxx> > > Cc: Tejun Heo <tj@xxxxxxxxxx> > > Cc: Roman Gushchin <guro@xxxxxx> > > Cc: Dennis Zhou <dennis@xxxxxxxxxx> > > Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > > FTR: As I've already said here [1] I can live with this change as long > as there is a larger consensus among cgroup v2 users. So let's give this > some more time before merging to see whether there is such a consensus. > > [1] http://lkml.kernel.org/r/20190201102515.GK11599@xxxxxxxxxxxxxx > > > --- > > > > Documentation/admin-guide/cgroup-v2.rst | 9 +++++++++ > > include/linux/cgroup-defs.h | 5 +++++ > > include/linux/memcontrol.h | 10 ++++++++-- > > kernel/cgroup/cgroup.c | 16 ++++++++++++++-- > > 4 files changed, 36 insertions(+), 4 deletions(-) > > > > --- a/Documentation/admin-guide/cgroup-v2.rst~mm-consider-subtrees-in-memoryevents > > +++ a/Documentation/admin-guide/cgroup-v2.rst > > @@ -177,6 +177,15 @@ cgroup v2 currently supports the followi > > ignored on non-init namespace mounts. Please refer to the > > Delegation section for details. > > > > + memory_localevents > > + > > + Only populate memory.events with data for the current cgroup, > > + and not any subtrees. This is legacy behaviour, the default > > + behaviour without this option is to include subtree counts. > > + This option is system wide and can only be set on mount or > > + modified through remount from the init namespace. The mount > > + option is ignored on non-init namespace mounts. > > + > > > > Organizing Processes and Threads > > -------------------------------- > > --- a/include/linux/cgroup-defs.h~mm-consider-subtrees-in-memoryevents > > +++ a/include/linux/cgroup-defs.h > > @@ -83,6 +83,11 @@ enum { > > * Enable cpuset controller in v1 cgroup to use v2 behavior. > > */ > > CGRP_ROOT_CPUSET_V2_MODE = (1 << 4), > > + > > + /* > > + * Enable legacy local memory.events. > > + */ > > + CGRP_ROOT_MEMORY_LOCAL_EVENTS = (1 << 5), > > }; > > > > /* cftype->flags */ > > --- a/include/linux/memcontrol.h~mm-consider-subtrees-in-memoryevents > > +++ a/include/linux/memcontrol.h > > @@ -789,8 +789,14 @@ static inline void count_memcg_event_mm( > > static inline void memcg_memory_event(struct mem_cgroup *memcg, > > enum memcg_memory_event event) > > { > > - atomic_long_inc(&memcg->memory_events[event]); > > - cgroup_file_notify(&memcg->events_file); > > + do { > > + atomic_long_inc(&memcg->memory_events[event]); > > + cgroup_file_notify(&memcg->events_file); > > + > > + if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS) > > + break; > > + } while ((memcg = parent_mem_cgroup(memcg)) && > > + !mem_cgroup_is_root(memcg)); > > } > > > > static inline void memcg_memory_event_mm(struct mm_struct *mm, > > --- a/kernel/cgroup/cgroup.c~mm-consider-subtrees-in-memoryevents > > +++ a/kernel/cgroup/cgroup.c > > @@ -1775,11 +1775,13 @@ int cgroup_show_path(struct seq_file *sf > > > > enum cgroup2_param { > > Opt_nsdelegate, > > + Opt_memory_localevents, > > nr__cgroup2_params > > }; > > > > static const struct fs_parameter_spec cgroup2_param_specs[] = { > > - fsparam_flag ("nsdelegate", Opt_nsdelegate), > > + fsparam_flag("nsdelegate", Opt_nsdelegate), > > + fsparam_flag("memory_localevents", Opt_memory_localevents), > > {} > > }; > > > > @@ -1802,6 +1804,9 @@ static int cgroup2_parse_param(struct fs > > case Opt_nsdelegate: > > ctx->flags |= CGRP_ROOT_NS_DELEGATE; > > return 0; > > + case Opt_memory_localevents: > > + ctx->flags |= CGRP_ROOT_MEMORY_LOCAL_EVENTS; > > + return 0; > > } > > return -EINVAL; > > } > > @@ -1813,6 +1818,11 @@ static void apply_cgroup_root_flags(unsi > > cgrp_dfl_root.flags |= CGRP_ROOT_NS_DELEGATE; > > else > > cgrp_dfl_root.flags &= ~CGRP_ROOT_NS_DELEGATE; > > + > > + if (root_flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS) > > + cgrp_dfl_root.flags |= CGRP_ROOT_MEMORY_LOCAL_EVENTS; > > + else > > + cgrp_dfl_root.flags &= ~CGRP_ROOT_MEMORY_LOCAL_EVENTS; > > } > > } > > > > @@ -1820,6 +1830,8 @@ static int cgroup_show_options(struct se > > { > > if (cgrp_dfl_root.flags & CGRP_ROOT_NS_DELEGATE) > > seq_puts(seq, ",nsdelegate"); > > + if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS) > > + seq_puts(seq, ",memory_localevents"); > > return 0; > > } > > > > @@ -6207,7 +6219,7 @@ static struct kobj_attribute cgroup_dele > > static ssize_t features_show(struct kobject *kobj, struct kobj_attribute *attr, > > char *buf) > > { > > - return snprintf(buf, PAGE_SIZE, "nsdelegate\n"); > > + return snprintf(buf, PAGE_SIZE, "nsdelegate\nmemory_localevents\n"); > > } > > static struct kobj_attribute cgroup_features_attr = __ATTR_RO(features); > > > > _ > > > > Patches currently in -mm which might be from chris@xxxxxxxxxxxxxx are > > > > mm-create-mem_cgroup_from_seq.patch > > mm-extract-memcg-maxable-seq_file-logic-to-seq_show_memcg_tunable.patch > > mm-proportional-memorylowmin-reclaim.patch > > mm-proportional-memorylowmin-reclaim-fix.patch > > mm-memcontrol-expose-thp-events-on-a-per-memcg-basis.patch > > mm-memcontrol-expose-thp-events-on-a-per-memcg-basis-fix-2.patch > > mm-make-memoryemin-the-baseline-for-utilisation-determination.patch > > mm-rename-ambiguously-named-memorystat-counters-and-functions.patch > > mm-consider-subtrees-in-memoryevents.patch > > -- > Michal Hocko > SUSE Labs