Re: [patch] mm, oom: prevent soft lockup on memcg oom for UP systems

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Tue, 17 Mar 2020 12:18:51 +0900

David Rientjes wrote:
> On Sat, 14 Mar 2020, Tetsuo Handa wrote:
> > If current thread is an OOM victim, schedule_timeout_killable(1) will give other
> > threads (including the OOM reaper kernel thread) CPU time to run, by leaving
> > try_charge() path due to should_force_charge() == true and reaching do_exit() path
> > instead of returning to userspace code doing "for (;;);".
> > 
> > Unless the problem is that current thread cannot reach should_force_charge() check,
> > schedule_timeout_killable(1) should work.
> > 
> 
> No need to yield if current is the oom victim, allowing the oom reaper to 
> run when it may not actually be able to free memory is not required.  It 
> increases the likelihood that some other process schedules and is unable 
> to yield back due to the memcg oom condition such that the victim doesn't 
> get a chance to run again.
> 
> This happens because the victim is allowed to overcharge but other 
> processes attached to an oom memcg hierarchy simply fail the charge.  We 
> are then reliant on all memory chargers in the kernel to yield if their 
> charges fail due to oom.  It's the only way to allow the victim to 
> eventually run.
> 
> So the only change that I would make to your patch is to do this in 
> mem_cgroup_out_of_memory() instead:
> 
> 	if (!fatal_signal_pending(current))
> 		schedule_timeout_killable(1);
> 
> So we don't have this reliance on all other memory chargers to yield when 
> their charge fails and there is no delay for victims themselves.

I see. You want below functions for environments where current thread can
fail to resume execution for long if current thread once reschedules (e.g.
UP kernel, many threads contended on one CPU).

/*
 * Give other threads CPU time, unless current thread was already killed.
 * Used when we prefer killed threads to continue execution (in a hope that
 * killed threads terminate quickly) over giving other threads CPU time.
 */
signed long __sched schedule_timeout_killable_expedited(signed long timeout)
{
	if (unlikely(fatal_signal_pending(current)))
		return timeout;
	return schedule_timeout_killable(timeout);
}

/*
 * Latency reduction via explicit rescheduling in places that are safe,
 * but becomes no-op if current thread was already killed. Used when we
 * prefer killed threads to continue execution (in a hope that killed
 * threads terminate quickly) over giving other threads CPU time.
 */
int cond_resched_expedited(void)
{
	if (unlikely(fatal_signal_pending(current)))
		return 0;
	return cond_resched();
}

> 
>  [ I'll still propose my change that adds cond_resched() to 
>    shrink_node_memcgs() because we can see need_resched set for a 
>    prolonged period of time without scheduling. ]

As long as there is schedule_timeout_killable(), I'm fine with adding
cond_resched() in other places.

> 
> If you agree, I'll propose your patch with a changelog that indicates it 
> can fix the soft lockup issue for UP and can likely get a tested-by for 
> it.
> 

Please go ahead.