On Fri, Jan 4, 2019 at 4:21 PM Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx> wrote:
>
> We have some usecases which create and remove memcgs very frequently,
> and the tasks in the memcg may just access the files which are unlikely
> accessed by anyone else. So, we prefer force_empty the memcg before
> rmdir'ing it to reclaim the page cache so that they don't get
> accumulated to incur unnecessary memory pressure. Since the memory
> pressure may incur direct reclaim to harm some latency sensitive
> applications.
>
> Force empty would help out such usecase, however force empty reclaims
> memory synchronously when writing to memory.force_empty. It may take
> some time to return and the afterwards operations are blocked by it.
> Although this can be done in background, some usecases may need create
> new memcg with the same name right after the old one is deleted. So,
> the creation might get blocked by the before reclaim/remove operation.
>
> Delaying memory reclaim in cgroup offline for such usecase sounds
> reasonable. Introduced a new interface, called wipe_on_offline for both
> default and legacy hierarchy, which does memory reclaim in css offline
> kworker.
>
> Writing to 1 would enable it, writing 0 would disable it.
>
> Suggested-by: Michal Hocko <mhocko@xxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Signed-off-by: Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx>
> ---
>  include/linux/memcontrol.h |  3 +++
>  mm/memcontrol.c            | 49 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 52 insertions(+)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 83ae11c..2f1258a 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -311,6 +311,9 @@ struct mem_cgroup {
>         struct list_head event_list;
>         spinlock_t event_list_lock;
>
> +       /* Reclaim as much as possible memory in offline kworker */
> +       bool wipe_on_offline;
> +
>         struct mem_cgroup_per_node *nodeinfo[0];
>         /* WARNING: nodeinfo must be the last member here */
>  };
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 75208a2..5a13c6b 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2918,6 +2918,35 @@ static ssize_t mem_cgroup_force_empty_write(struct kernfs_open_file *of,
>         return mem_cgroup_force_empty(memcg) ?: nbytes;
>  }
>
> +static int wipe_on_offline_show(struct seq_file *m, void *v)
> +{
> +       struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
> +
> +       seq_printf(m, "%lu\n", (unsigned long)memcg->wipe_on_offline);
> +
> +       return 0;
> +}
> +
> +static int wipe_on_offline_write(struct cgroup_subsys_state *css,
> +                                struct cftype *cft, u64 val)
> +{
> +       int ret = 0;
> +
> +       struct mem_cgroup *memcg = mem_cgroup_from_css(css);
> +
> +       if (mem_cgroup_is_root(memcg))
> +               return -EINVAL;
> +
> +       if (val == 0)
> +               memcg->wipe_on_offline = false;
> +       else if (val == 1)
> +               memcg->wipe_on_offline = true;
> +       else
> +               ret = -EINVAL;
> +
> +       return ret;
> +}
> +
>  static u64 mem_cgroup_hierarchy_read(struct cgroup_subsys_state *css,
>                                      struct cftype *cft)
>  {
> @@ -4283,6 +4312,11 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
>                 .write = mem_cgroup_reset,
>                 .read_u64 = mem_cgroup_read_u64,
>         },
> +       {
> +               .name = "wipe_on_offline",

What about "force_empty_on_offline"?

> +               .seq_show = wipe_on_offline_show,
> +               .write_u64 = wipe_on_offline_write,
> +       },
>         { },    /* terminate */
>  };
>
> @@ -4569,6 +4603,15 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
>         page_counter_set_min(&memcg->memory, 0);
>         page_counter_set_low(&memcg->memory, 0);
>
> +       /*
> +        * Reclaim as much as possible memory when offlining.
> +        *
> +        * Do it after min/low is reset otherwise some memory might
> +        * be protected by min/low.
> +        */
> +       if (memcg->wipe_on_offline)
> +               mem_cgroup_force_empty(memcg);
> +

mem_cgroup_force_empty() also does drain_all_stock(), so, move
drain_all_stock() in mem_cgroup_css_offline() to the else of
'if (memcg->wipe_on_offline)'.

>         memcg_offline_kmem(memcg);
>         wb_memcg_offline(memcg);
>
> @@ -5694,6 +5737,12 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
>                 .seq_show = memory_oom_group_show,
>                 .write = memory_oom_group_write,
>         },
> +       {
> +               .name = "wipe_on_offline",
> +               .flags = CFTYPE_NOT_ON_ROOT,
> +               .seq_show = wipe_on_offline_show,
> +               .write_u64 = wipe_on_offline_write,
> +       },
>         { }     /* terminate */
>  };
>
> --
> 1.8.3.1
>