Re: [patch] mm, oom: prevent soft lockup on memcg oom for UP systems

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Thu, 12 Mar 2020 07:04:20 +0900

On 2020/03/12 4:38, David Rientjes wrote:
> On Wed, 11 Mar 2020, Tetsuo Handa wrote:
> 
>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>> --- a/mm/vmscan.c
>>>>> +++ b/mm/vmscan.c
>>>>> @@ -2637,6 +2637,8 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
>>>>>  		unsigned long reclaimed;
>>>>>  		unsigned long scanned;
>>>>>  
>>>>> +		cond_resched();
>>>>> +
>>>>
>>>> Is this safe for CONFIG_PREEMPTION case? If current thread has realtime priority,
>>>> can we guarantee that the OOM victim (well, the OOM reaper kernel thread rather
>>>> than the OOM victim ?) gets scheduled?
>>>>
>>>
>>> I think it's the best we can do that immediately solves the issue unless 
>>> you have another idea in mind?
>>
>> "schedule_timeout_killable(1) outside of oom_lock" or "the OOM reaper grabs oom_lock
>> so that allocating threads guarantee that the OOM reaper gets scheduled" or "direct OOM
>> reaping so that allocating threads guarantee that some memory is reclaimed".
>>
> 
> The cond_resched() here is needed if the iteration is lengthy depending on 
> the number of descendant memcgs already.

No. cond_resched() here will become no-op if CONFIG_PREEMPTION=y and current
thread has realtime priority.

> 
> schedule_timeout_killable(1) does not make any guarantees that current 
> will be scheduled after the victim or oom_reaper on UP systems.

The point of schedule_timeout_*(1) is to guarantee that current thread
will yield CPU to other threads even if CONFIG_PREEMPTION=y and current
thread has realtime priority case. There is no guarantee that current
thread will be rescheduled immediately after a sleep is irrelevant.

> 
> If you have an alternate patch to try, we can test it.  But since this 
> cond_resched() is needed anyway, I'm not sure it will change the result.

schedule_timeout_killable(1) is an alternate patch to try; I don't think
that this cond_resched() is needed anyway.

> 
>>>
>>>>>  		switch (mem_cgroup_protected(target_memcg, memcg)) {
>>>>>  		case MEMCG_PROT_MIN:
>>>>>  			/*
>>>>>
>>>>
>>
>>