Re: [PATCH] mm/oom: added option 'oom_dump_task_cmdline'

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Wed, 20 Feb 2019 19:00:19 +0900

On 2019/02/20 17:37, Bujnak, Stepan wrote:
>>> @@ -404,9 +406,18 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
>>>       pr_info("[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name\n");
>>>       rcu_read_lock();
>>>       for_each_process(p) {
>>> +             char *name, *cmd = NULL;
>>> +
>>>               if (oom_unkillable_task(p, memcg, nodemask))
>>>                       continue;
>>>
>>> +             /*
>>> +              * This needs to be done before calling find_lock_task_mm()
>>> +              * since both grab a task lock which would result in deadlock.
>>> +              */
>>> +             if (sysctl_oom_dump_task_cmdline)
>>> +                     cmd = kstrdup_quotable_cmdline(p, GFP_KERNEL);
>>> +
>>>               task = find_lock_task_mm(p);
>>>               if (!task) {
>>>                       /*
>> You are trying to allocate from the OOM context. That is a big no no.
>> Not to mention that this is deadlock prone because get_cmdline needs
>> mmap_sem and the allocating context migh hold the lock already. So the
>> patch is simply wrong.
>>
> 
> Thanks for the notes. I understand how allocating from OOM context
> is a problem. However I still believe that this would be helpful
> for debugging OOM kills since task->comm is often not descriptive
> enough. Would it help if instead of calling kstrdup_quotable_cmdline()
> which allocates the buffer on heap I called get_cmdline() directly
> passing it stack-allocated buffer of certain size e.g. 256?

You made triple errors. First is that doing GFP_KERNEL allocation inside
rcu_read_lock()/rcu_read_unlock() is not permitted. Second is that doing
GFP_KERNEL allocation with oom_lock held is not permitted. Third is that
somebody might be already holding p->mm->mmap_sem for write when
get_cmdline() tries to hold it for read. That is, your patch can't work
(even if you update your patch to use static buffer).