On 9/21/21 9:55 PM, Andrew Morton wrote:
> On Mon, 20 Sep 2021 13:59:35 +0300 Vasily Averin <vvs@xxxxxxxxxxxxx> wrote:
>
>> On 9/20/21 4:22 AM, Tetsuo Handa wrote:
>>> On 2021/09/20 8:31, Andrew Morton wrote:
>>>> On Fri, 17 Sep 2021 11:06:49 +0300 Vasily Averin <vvs@xxxxxxxxxxxxx> wrote:
>>>>
>>>>> A huge vmalloc allocation on a heavily loaded node can lead to a global
>>>>> memory shortage. A task calling vmalloc can have the worst badness
>>>>> and be chosen by the OOM-killer; however, a received fatal signal and
>>>>> the OOM victim mark do not interrupt the allocation cycle. Vmalloc will
>>>>> continue allocating pages over and over again, exacerbating the crisis
>>>>> and consuming the memory freed up by other killed tasks.
>>>>>
>>>>> This patch allows the OOM-killer to break the vmalloc cycle, making OOM
>>>>> handling more effective and helping to avoid a host panic.
>>>>>
>>>>> Unfortunately, it is not 100% safe. A previous attempt to break the
>>>>> vmalloc cycle was reverted by commit b8c8a338f75e ("Revert "vmalloc:
>>>>> back off when the current task is killed"") because some vmalloc
>>>>> callers did not handle failures properly. The issues found were
>>>>> resolved; however, there may be other similar places.
>>>>
>>>> Well that was lame of us.
>>>>
>>>> I believe that at least one of the kernel testbots can utilize fault
>>>> injection. If we were to wire up vmalloc (as we have done with slab
>>>> and pagealloc) then this will help to locate such buggy vmalloc callers.
>>
>> Andrew, could you please clarify how we can do it?
>> Do you mean we can use the existing allocation fault injection
>> infrastructure to trigger this kind of issue? Unfortunately, I found no
>> way to achieve that. It allows emulating single faults with a small
>> probability; however, that is not enough: we need to completely disable
>> all vmalloc allocations.
>
> I don't see why there's a problem? You're saying "there might still be
> vmalloc() callers which don't correctly handle allocation failures",
> yes?
>
> I'm suggesting that we use fault injection to cause a small proportion
> of vmalloc() calls to artificially fail, so such buggy callers will
> eventually be found and fixed. Why does such a scheme require that
> *all* vmalloc() calls fail?

Let me explain.

1) It is not trivial to use the current allocation fault injection to
make a small proportion of vmalloc() calls fail artificially.

vmalloc
 __vmalloc_node
  __vmalloc_node_range
   __vmalloc_area_node
    vm_area_alloc_pages

vm_area_alloc_pages() uses the new __alloc_pages_bulk() interface,
requesting up to 100 pages per iteration of its loop.
__alloc_pages_bulk() can be interrupted by allocation fault injection;
however, in that case vm_area_alloc_pages() falls back to the old-style
page allocation loop. In the general case that fallback completes the
allocation successfully, and vmalloc itself does not fail.

To make vmalloc fail, we need alloc_pages_bulk_array_node() and
alloc_pages_node() to fail together.

2) Failing a single vmalloc call is not enough. Let me remind you that we
want to emulate the reaction to a fatal signal. I am afraid a dying task
can execute a quite complex rollback procedure. This rollback can call
another vmalloc, and that call will fail again on the
fatal_signal_pending() check. To emulate this behavior with fault
injection, we need to make all subsequent vmalloc calls of our victim,
the pseudo-"dying" task, fail as well.

I doubt either of these goals can be reached with the current allocation
fault injection subsystem; I do not understand how to configure it
accordingly.

Thank you,
	Vasily Averin
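To illustrate point 1, here is a small user-space model of the allocation path; it is not kernel code. sim_alloc_bulk(), sim_alloc_single() and the fail_bulk/fail_single switches are hypothetical stand-ins for alloc_pages_bulk_array_node(), alloc_pages_node() and the fault-injection decision. The model assumes the structure described in point 1 (bulk requests of up to 100 pages, then a one-page-at-a-time fallback) and shows that the "vmalloc" only fails when both paths are made to fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fault-injection switches (assumption: in the kernel this
 * decision would come from the fault-injection framework, not globals). */
static bool fail_bulk;    /* bulk allocator returns 0 pages */
static bool fail_single;  /* single-page allocator returns failure */

/* Stand-in for alloc_pages_bulk_array_node(): number of pages obtained. */
static size_t sim_alloc_bulk(size_t nr)
{
	return fail_bulk ? 0 : nr;
}

/* Stand-in for alloc_pages_node(): true on success. */
static bool sim_alloc_single(void)
{
	return !fail_single;
}

/* Model of vm_area_alloc_pages(): bulk chunks of up to 100 pages first,
 * then fall back to one page at a time for whatever is still missing. */
static size_t sim_vm_area_alloc_pages(size_t nr_pages)
{
	size_t allocated = 0;

	assert(nr_pages > 0);
	while (allocated < nr_pages) {
		size_t chunk = nr_pages - allocated;
		size_t got;

		if (chunk > 100)
			chunk = 100;
		got = sim_alloc_bulk(chunk);
		allocated += got;
		if (got < chunk)
			break;	/* bulk path came up short: use the fallback */
	}

	while (allocated < nr_pages) {
		if (!sim_alloc_single())
			break;
		allocated++;
	}
	return allocated;
}

/* The modelled vmalloc fails only if the page array is incomplete. */
static bool sim_vmalloc_ok(size_t nr_pages)
{
	return sim_vm_area_alloc_pages(nr_pages) == nr_pages;
}
```

With fail_bulk set alone, every request still succeeds through the fallback loop; only setting both switches makes sim_vmalloc_ok() return false, which mirrors why injecting faults into the bulk allocator by itself does not exercise vmalloc failure handling.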