> On Nov 27, 2019, at 11:17 PM, Yafang Shao <laoar.shao@xxxxxxxxx> wrote: > > When there are no more processes in a memcg (e.g., due to OOM > group), we can still have file pages in the page cache. > > If these pages are protected by memory.min, they can't be reclaimed. > Especially if there won't be another process in this memcg and the memcg > is kept online, we do want to drop these pages from the page cache. > > By dropping these page caches we can avoid reclaimers (e.g., kswapd or > direct) to scan and reclaim pages from all memcgs in the system - > because the reclaimers will try to fairly reclaim pages from all memcgs > in the system when under memory pressure. > > By setting the hard limit of such a memcg to 0, we allow to drop the > page cache of such memcgs. Unfortunately, this may invoke the OOM killer > and generate a lot of output. The OOM output is not expected by an admin > who wants to drop these caches and knows that there are no processes in > this memcg anymore. > > Therefore, if a memcg is not populated, we should not invoke the OOM > killer - there is nothing to kill. The next time a new process is > started in the memcg and the "max" is still below usage, the OOM killer > will be invoked and the new process will be killed. > > [ Above commit log is contributed by David ] > > What's worse about this issue is that when there're killable tasks and the > OOM killer killed the last task, and what will happen then ? As nr_reclaims > is already 0 and drained is alreay true, the OOM killer will try to kill > nothing (because he knows he has killed the last task), what's a stupid > behavior. > > Someone may worry that the admins may not going to see that the memcg was > OOM due to the limit change. But this is not a issue, because the admins > changes the limit and then the admins must check the result of his change > - by checking memory.{max, current, stat} he can get all he wants. > > Cc: David Hildenbrand <david@xxxxxxxxxx> > Nacked-by: Michal Hocko <mhocko@xxxxxxxx> > Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx> Surely too big a turkey to swallow ? — unprofessional wording and carrying a NACK from one of the maintainers. > > --- > Changes since v2: Refresh the subject and commit log. The original > subject is "mm, memcg: avoid oom if cgroup is not populated" > --- > mm/memcontrol.c | 15 +++++++++++++-- > 1 file changed, 13 insertions(+), 2 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 1c4c08b..e936f1b 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -6139,9 +6139,20 @@ static ssize_t memory_max_write(struct kernfs_open_file *of, > continue; > } > > - memcg_memory_event(memcg, MEMCG_OOM); > - if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0)) > + /* If there's no procesess, we don't need to invoke the OOM > + * killer. Then next time when you try to start a process > + * in this memcg, the max may still bellow usage, and then > + * this OOM killer will be invoked. This can be considered > + * as lazy OOM, that is we have been always doing in the > + * kernel. Pls. Michal, that is really consistency. > + */ > + if (cgroup_is_populated(memcg->css.cgroup)) { > + memcg_memory_event(memcg, MEMCG_OOM); > + if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0)) > + break; > + } else { > break; > + } > } > > memcg_wb_domain_size_changed(memcg); > -- > 1.8.3.1 > >