On 2018/05/09 21:00, Petr Mladek wrote:
>>>> But we first need a real reason. Right now it looks to me like
>>>> we have "a solution" to a problem which we have never witnessed.
>>>
>>> I am trying to find a "simple" and generic solution for the problem
>>> reported by Tejun:
>> [..]
>>> 1. Console is IPMI emulated serial console. Super slow. Also
>>>    netconsole is in use.
>>> 2. System runs out of memory, OOM triggers.
>>> 3. OOM handler is printing out OOM debug info.
>>> 4. While trying to emit the messages for netconsole, the network stack
>>>    / driver tries to allocate memory and then fail, which in turn
>>>    triggers allocation failure or other warning messages. printk was
>>>    already flushing, so the messages are queued on the ring.
>>> 5. OOM handler keeps flushing but 4 repeats and the queue is never
>>>    shrinking. Because OOM handler is trapped in printk flushing, it
>>>    never manages to free memory and no one else can enter OOM path
>>>    either, so the system is trapped in this state.
>>> </paste>
>
> IMHO, we do not need to chase down this particular problem. It was
> already "solved" by the commit 400e22499dd92613821 ("mm: don't warn
> about allocations which stall for too long").

Does the memory allocation done by the network stack / driver while
trying to emit the messages include the __GFP_DIRECT_RECLAIM flag
(e.g. GFP_KERNEL)? Commit 400e22499dd92613821 handles only memory
allocations with the __GFP_DIRECT_RECLAIM flag. If the memory
allocation done while trying to emit the messages does not include
the __GFP_DIRECT_RECLAIM flag (e.g. GFP_ATOMIC / GFP_NOWAIT), doesn't
this particular problem still exist?
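
To make the flag question concrete, here is a minimal userspace sketch of
how the three masks mentioned above are composed in kernels of this era.
The bit values are illustrative assumptions (the real ones are defined in
include/linux/gfp.h and change between versions), and may_direct_reclaim()
is a hypothetical helper name; the kernel's own gfpflags_allow_blocking()
performs the same test.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace sketch only. The names mirror include/linux/gfp.h, but the
 * bit values below are made up for illustration.
 */
typedef unsigned int gfp_t;

#define __GFP_HIGH            0x01u
#define __GFP_IO              0x02u
#define __GFP_FS              0x04u
#define __GFP_ATOMIC          0x08u
#define __GFP_DIRECT_RECLAIM  0x10u
#define __GFP_KSWAPD_RECLAIM  0x20u
#define __GFP_RECLAIM         (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)

/* Composition of the masks as in kernels around the time of this thread. */
#define GFP_KERNEL  (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
#define GFP_ATOMIC  (__GFP_HIGH | __GFP_ATOMIC | __GFP_KSWAPD_RECLAIM)
#define GFP_NOWAIT  (__GFP_KSWAPD_RECLAIM)

/* Hypothetical helper: true if the caller may enter direct reclaim. */
static bool may_direct_reclaim(gfp_t gfp_mask)
{
	return (gfp_mask & __GFP_DIRECT_RECLAIM) != 0;
}
```

With these definitions, may_direct_reclaim(GFP_KERNEL) is true, while
may_direct_reclaim(GFP_ATOMIC) and may_direct_reclaim(GFP_NOWAIT) are
both false, which is the case the question above is about.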