Re: [PATCH v3] mm, memcg: fix the stupid OOM killer when shrinking memcg hard limit

Qian Cai <cai@xxxxxx> · Wed, 27 Nov 2019 23:28:39 -0500

> On Nov 27, 2019, at 11:17 PM, Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> 
> When there are no more processes in a memcg (e.g., due to OOM
> group), we can still have file pages in the page cache.
> 
> If these pages are protected by memory.min, they can't be reclaimed.
> Especially if there won't be another process in this memcg and the memcg
> is kept online, we do want to drop these pages from the page cache.
> 
> By dropping these page caches we can avoid reclaimers (e.g., kswapd or
> direct) to scan and reclaim pages from all memcgs in the system -
> because the reclaimers will try to fairly reclaim pages from all memcgs
> in the system when under memory pressure.
> 
> By setting the hard limit of such a memcg to 0, we allow to drop the
> page cache of such memcgs. Unfortunately, this may invoke the OOM killer
> and generate a lot of output. The OOM output is not expected by an admin
> who wants to drop these caches and knows that there are no processes in
> this memcg anymore.
> 
> Therefore, if a memcg is not populated, we should not invoke the OOM
> killer - there is nothing to kill. The next time a new process is
> started in the memcg and the "max" is still below usage, the OOM killer
> will be invoked and the new process will be killed.
> 
> [ Above commit log is contributed by David ]
> 
> What's worse about this issue is that when there're killable tasks and the
> OOM killer killed the last task, and what will happen then ? As nr_reclaims
> is already 0 and drained is alreay true, the OOM killer will try to kill
> nothing (because he knows he has killed the last task), what's a stupid
> behavior.
> 
> Someone may worry that the admins may not going to see that the memcg was
> OOM due to the limit change. But this is not a issue, because the admins
> changes the limit and then the admins must check the result of his change
> - by checking memory.{max, current, stat} he can get all he wants.
> 
> Cc: David Hildenbrand <david@xxxxxxxxxx>
> Nacked-by: Michal Hocko <mhocko@xxxxxxxx>
> Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>

Surely too big a turkey to swallow ? — unprofessional wording and carrying a NACK
from one of the maintainers.

> 
> ---
> Changes since v2: Refresh the subject and commit log. The original
> subject is "mm, memcg: avoid oom if cgroup is not populated"
> ---
> mm/memcontrol.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 1c4c08b..e936f1b 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6139,9 +6139,20 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
> 			continue;
> 		}
> 
> -		memcg_memory_event(memcg, MEMCG_OOM);
> -		if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
> +		/* If there's no procesess, we don't need to invoke the OOM
> +		 * killer. Then next time when you try to start a process
> +		 * in this memcg, the max may still bellow usage, and then
> +		 * this OOM killer will be invoked. This can be considered
> +		 * as lazy OOM, that is we have been always doing in the
> +		 * kernel. Pls. Michal, that is really consistency.
> +		 */
> +		if (cgroup_is_populated(memcg->css.cgroup)) {
> +			memcg_memory_event(memcg, MEMCG_OOM);
> +			if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
> +				break;
> +		} else  {
> 			break;
> +		}
> 	}
> 
> 	memcg_wb_domain_size_changed(memcg);
> -- 
> 1.8.3.1
> 
>