On 19.10.2021 17:13, Michal Hocko wrote: > On Tue 19-10-21 16:26:50, Vasily Averin wrote: >> On 19.10.2021 15:04, Michal Hocko wrote: >>> On Tue 19-10-21 13:54:42, Michal Hocko wrote: >>>> On Tue 19-10-21 13:30:06, Vasily Averin wrote: >>>>> On 19.10.2021 11:49, Michal Hocko wrote: >>>>>> On Tue 19-10-21 09:30:18, Vasily Averin wrote: >>>>>> [...] >>>>>>> With my patch ("memcg: prohibit unconditional exceeding the limit of dying tasks") try_charge_memcg() can fail: >>>>>>> a) due to fatal signal >>>>>>> b) when mem_cgroup_oom -> mem_cgroup_out_of_memory -> out_of_memory() returns false (when select_bad_process() found nothing) >>>>>>> >>>>>>> To handle a) we can follow to your suggestion and skip excution of out_of_memory() in pagefault_out_of memory() >>>>>>> To handle b) we can go to retry: if mem_cgroup_oom() return OOM_FAILED. >>>>> >>>>>> How is b) possible without current being killed? Do we allow remote >>>>>> charging? >>>>> >>>>> out_of_memory for memcg_oom >>>>> select_bad_process >>>>> mem_cgroup_scan_tasks >>>>> oom_evaluate_task >>>>> oom_badness >>>>> >>>>> /* >>>>> * Do not even consider tasks which are explicitly marked oom >>>>> * unkillable or have been already oom reaped or the are in >>>>> * the middle of vfork >>>>> */ >>>>> adj = (long)p->signal->oom_score_adj; >>>>> if (adj == OOM_SCORE_ADJ_MIN || >>>>> test_bit(MMF_OOM_SKIP, &p->mm->flags) || >>>>> in_vfork(p)) { >>>>> task_unlock(p); >>>>> return LONG_MIN; >>>>> } >>>>> >>>>> This time we handle userspace page fault, so we cannot be kenrel thread, >>>>> and cannot be in_vfork(). >>>>> However task can be marked as oom unkillable, >>>>> i.e. have p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN >>>> >>>> You are right. I am not sure there is a way out of this though. The task >>>> can only retry for ever in this case. There is nothing actionable here. >>>> We cannot kill the task and there is no other way to release the memory. >>> >>> Btw. don't we force the charge in that case? >> >> We should force charge for allocation from inside page fault handler, >> to prevent endless cycle in retried page faults. >> However we should not do it for allocations from task context, >> to prevent memcg-limited vmalloc-eaters from to consume all host memory. > > I don't see a big difference between those two. Because the #PF could > result into the very same situation depleting all the memory by > overcharging. A different behavior just leads to a confusion and > unexpected behavior. E.g. in the past we only triggered memcg OOM killer > from the #PF path and failed the charge otherwise. That is something > different but it shows problems we haven't anticipated and had user > visible problems. See 29ef680ae7c2 ("memcg, oom: move out_of_memory back > to the charge path"). In this case I think we should fail this allocation. It's better do not allow overcharge, neither in #PF not in regular allocations. However this failure will trigger false global OOM in pagefault_out_of_memory(), and we need to find some way to prevent it. Thank you, Vasily Averin