On Tue, 2019-09-03 at 20:53 +0200, Michal Hocko wrote:
> On Tue 03-09-19 11:42:22, Qian Cai wrote:
> > On Tue, 2019-09-03 at 15:22 +0200, Michal Hocko wrote:
> > > On Fri 30-08-19 18:15:22, Eric Dumazet wrote:
> > > > If there is a risk of flooding the syslog, we should fix this generically
> > > > in mm layer, not adding hundred of __GFP_NOWARN all over the places.
> > >
> > > We do already ratelimit in warn_alloc. If it isn't sufficient then we
> > > can think of a different parameters. Or maybe it is the ratelimiting
> > > which doesn't work here. Hard to tell and something to explore.
> >
> > The time-based ratelimit won't work for skb_build() as when a system under
> > memory pressure, and the CPU is fast and IO is so slow, it could take a long
> > time to swap and trigger OOM.
>
> I really do not understand what does OOM and swapping have to do with
> the ratelimiting here. The sole purpose of the ratelimit is to reduce
> the amount of warnings to be printed. Slow IO might have an effect on
> when the OOM killer is invoked but atomic allocations are not directly
> dependent on IO.

When there is heavy memory pressure, the system is trying hard to reclaim
memory to get back above the watermark. However, the IO is slow to page out
while the memory pressure keeps draining the atomic reserves, so some of those
build_skb() allocations will eventually fail.

Only with fast IO will it finish swapping sooner and then invoke the OOM
killer to end the memory pressure.

>
> > I suppose what happens is those skb_build() allocations are from softirq, and
> > once one of them failed, it calls printk() which generates more interrupts.
> > Hence, the infinite loop.
>
> Please elaborate more.
>

If you look at the original report, the dump_stack() of the failed allocation
is,

 <IRQ>
 warn_alloc.cold.43+0x8a/0x148
 __alloc_pages_nodemask+0x1a5c/0x1bb0
 alloc_pages_current+0x9c/0x110
 allocate_slab+0x34a/0x11f0
 new_slab+0x46/0x70
 ___slab_alloc+0x604/0x950
 __slab_alloc+0x12/0x20
 kmem_cache_alloc+0x32a/0x400
 __build_skb+0x23/0x60
 build_skb+0x1a/0xb0
 igb_clean_rx_irq+0xafc/0x1010 [igb]
 igb_poll+0x4bb/0xe30 [igb]
 net_rx_action+0x244/0x7a0
 __do_softirq+0x1a0/0x60a
 irq_exit+0xb5/0xd0
 do_IRQ+0x81/0x170
 common_interrupt+0xf/0xf
 </IRQ>

Since it has no __GFP_NOWARN to begin with, it will call,

printk
  vprintk_default
    vprintk_emit
      wake_up_klogd
        irq_work_queue
          __irq_work_queue_local
            arch_irq_work_raise
              apic->send_IPI_self(IRQ_WORK_VECTOR)
                smp_irq_work_interrupt
                  exiting_irq
                    irq_exit

and end up processing pending net_rx_action softirqs again, of which there are
plenty because the machine is reached via ssh etc.
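
To make the ratelimit point a bit more concrete, here is a minimal userspace
sketch of an interval/burst style limiter. It is only a simplified model and
not the kernel's ratelimit code; the struct, the function names and the
numbers used with it are all made up for illustration. It shows that such a
limiter bounds the number of messages per window, not the total, so if the
failures keep coming for a long time the warnings keep coming too.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a time-based ratelimit (not kernel code). */
struct ratelimit {
	uint64_t interval;	/* window length, in ticks */
	unsigned int burst;	/* messages allowed per window */
	uint64_t begin;		/* tick at which the current window started */
	unsigned int printed;	/* messages let through in the current window */
};

/* Return 1 if a message may be printed at tick 'now', 0 if suppressed. */
int ratelimit_allow(struct ratelimit *rs, uint64_t now)
{
	/* Open a new window once the old one has expired. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return 1;
	}
	return 0;
}

/*
 * Count how many warnings get through when 'per_tick' allocation
 * failures happen on every tick for 'duration' ticks.
 */
uint64_t count_printed(uint64_t interval, unsigned int burst,
		       uint64_t duration, unsigned int per_tick)
{
	struct ratelimit rs = { interval, burst, 0, 0 };
	uint64_t printed = 0;

	assert(interval > 0);
	for (uint64_t t = 0; t < duration; t++)
		for (unsigned int i = 0; i < per_tick; i++)
			printed += ratelimit_allow(&rs, t);
	return printed;
}
```

With an assumed window of 5 ticks and a burst of 10, count_printed(5, 10, 5,
100) lets 10 messages through, while count_printed(5, 10, 1000, 100) lets
2000 through: the rate is capped, but the total grows with how long the
pressure lasts.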