Re: [PATCH] mm, oom: Tolerate processes sharing mm with different view of oom_score_adj.

On 2019/01/16 20:09, Michal Hocko wrote:
> On Wed 16-01-19 19:55:21, Tetsuo Handa wrote:
>> This patch reverts both commit 44a70adec910d692 ("mm, oom_adj: make sure
>> processes sharing mm have same view of oom_score_adj") and commit
>> 97fd49c2355ffded ("mm, oom: kill all tasks sharing the mm") in order to
>> close a race and reduce the latency at __set_oom_adj(), and reduces the
>> warning at __oom_kill_process() in order to minimize the latency.
>>
>> Commit 36324a990cf578b5 ("oom: clear TIF_MEMDIE after oom_reaper managed
>> to unmap the address space") introduced the worst case mentioned in
>> 44a70adec910d692. But since the OOM killer skips mm with MMF_OOM_SKIP set,
>> only administrators can trigger the worst case.
>>
>> Since 44a70adec910d692 did not take latency into account, we can hold RCU
>> for minutes and trigger RCU stall warnings by calling printk() on many
>> thousands of thread groups. Even without calling printk(), the latency is
>> mentioned by Yong-Taek Lee [1]. And I noticed that 44a70adec910d692 is
>> racy, and trying to fix the race will require a global lock which is too
>> costly for rare events.
>>
>> If the worst case in 44a70adec910d692 happens, it is an administrator's
>> request. Therefore, tolerate the worst case and speed up __set_oom_adj().
> 
> I really do not think we care about latency. I consider the overall API
> sanity much more important. Besides that the original report you are
> referring to was never explained/shown to represent real world usecase.
> oom_score_adj is not really an interface to be tweaked in hot paths.

I do care about the latency. Holding RCU for more than 2 minutes is insane.

----------
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sched.h>
#include <sys/mman.h>
#include <signal.h>

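#define STACKSIZE 8192

/* Each child only sleeps, so it keeps sharing the parent's mm. */
static int child(void *unused)
{
        pause();
        return 0;
}

int main(int argc, char *argv[])
{
        int fd = open("/proc/self/oom_score_adj", O_WRONLY);
        int i;
        /* One stack reused by every child; the children only call pause(). */
        char *stack = mmap(NULL, STACKSIZE, PROT_WRITE | PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);

        /* CLONE_VM without CLONE_THREAD: up to 32768 thread groups sharing one mm. */
        for (i = 0; i < 8192 * 4; i++)
                if (clone(child, stack + STACKSIZE, CLONE_VM, NULL) == -1)
                        break;
        /* With 44a70adec910d692 applied, this write updates every process sharing the mm. */
        write(fd, "0\n", 2);
        /* Clean up: signal every process in our process group, including ourselves. */
        kill(0, SIGSEGV);
        return 0;
}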
----------

> 
> I can be convinced otherwise but that really requires some _real_
> usecase with an explanation why there is no other way. Until then
> 
> Nacked-by: Michal Hocko <mhocko@xxxxxxxx>



