Re: [PATCH] mm,oom: Don't call schedule_timeout_killable() with oom_lock held.

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Wed, 23 May 2018 19:24:48 +0900




Michal Hocko wrote:
> > I don't understand why you are talking about PF_WQ_WORKER case.
> 
> Because that seems to be the reason to have it there as per your
> comment.

OK. Then, I will fold below change into my patch.

        if (did_some_progress) {
                no_progress_loops = 0;
 +              /*
-+               * This schedule_timeout_*() serves as a guaranteed sleep for
-+               * PF_WQ_WORKER threads when __zone_watermark_ok() == false.
++               * Try to give the OOM killer/reaper/victims some time for
++               * releasing memory.
 +               */
 +              if (!tsk_is_oom_victim(current))
 +                      schedule_timeout_uninterruptible(1);

But Roman, my patch conflicts with your "mm, oom: cgroup-aware OOM killer" patch
in linux-next. And it seems to me that your patch contains a bug which leads to
premature memory allocation failure explained below.

@@ -1029,6 +1050,7 @@ bool out_of_memory(struct oom_control *oc)
 {
        unsigned long freed = 0;
        enum oom_constraint constraint = CONSTRAINT_NONE;
+       bool delay = false; /* if set, delay next allocation attempt */

        if (oom_killer_disabled)
                return false;
@@ -1073,27 +1095,39 @@ bool out_of_memory(struct oom_control *oc)
            current->mm && !oom_unkillable_task(current, NULL, oc->nodemask) &&
            current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
                get_task_struct(current);
-               oc->chosen = current;
+               oc->chosen_task = current;
                oom_kill_process(oc, "Out of memory (oom_kill_allocating_task)");
                return true;
        }

+       if (mem_cgroup_select_oom_victim(oc)) {

/* mem_cgroup_select_oom_victim() returns true if select_victim_memcg() made
   oc->chosen_memcg != NULL.
   select_victim_memcg() makes oc->chosen_memcg = INFLIGHT_VICTIM if there is
   inflight memcg. But oc->chosen_task remains NULL because it did not call
   oom_evaluate_task(), didn't it? (And if it called oom_evaluate_task(),
   put_task_struct() is missing here.) */

+               if (oom_kill_memcg_victim(oc)) {

/* oom_kill_memcg_victim() returns true if oc->chosen_memcg == INFLIGHT_VICTIM. */

+                       delay = true;
+                       goto out;
+               }
+       }
+
        select_bad_process(oc);
        /* Found nothing?!?! Either we hang forever, or we panic. */
-       if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
+       if (!oc->chosen_task && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
                dump_header(oc, NULL);
                panic("Out of memory and no killable processes...\n");
        }
-       if (oc->chosen && oc->chosen != (void *)-1UL) {
+       if (oc->chosen_task && oc->chosen_task != (void *)-1UL) {
                oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" :
                                 "Memory cgroup out of memory");
-               /*
-                * Give the killed process a good chance to exit before trying
-                * to allocate memory again.
-                */
-               schedule_timeout_killable(1);
+               delay = true;
        }
-       return !!oc->chosen;
+
+out:
+       /*
+        * Give the killed process a good chance to exit before trying
+        * to allocate memory again.
+        */
+       if (delay)
+               schedule_timeout_killable(1);
+

/* out_of_memory() returns false because oc->chosen_task remains NULL. */

+       return !!oc->chosen_task;
 }

Can we apply my patch prior to your "mm, oom: cgroup-aware OOM killer" patch
(which eliminates "delay" and "out:" from your patch) so that people can easily
backport my patch? Or, do you want to apply a fix (which eliminates "delay" and
"out:" from linux-next) prior to my patch?