On 8/30/19 5:25 PM, Qian Cai wrote:
> On Fri, 2019-08-30 at 17:11 +0200, Eric Dumazet wrote:
>>
>> On 8/30/19 4:57 PM, Qian Cai wrote:
>>> When running heavy memory pressure workloads, the system is throwing
>>> endless warnings below due to the allocation could fail from
>>> __build_skb(), and the volume of this call could be huge which may
>>> generate a lot of serial console output and cosumes all CPUs as
>>> warn_alloc() could be expensive by calling dump_stack() and then
>>> show_mem().
>>>
>>> Fix it by silencing the warning in this call site. Also, it seems
>>> unnecessary to even print a warning at all if the allocation failed in
>>> __build_skb(), as it may just retransmit the packet and retry.
>>>

Well, __GFP_NOWARN would save me from explaining this warning to users
many times. OTOH it's usually an indication that min_free_kbytes should
be raised to better cope with network traffic.

>>
>> Same patches are showing up there and there from time to time.
>>
>> Why is this particular spot interesting, against all others not adding
>> __GFP_NOWARN ?

This one is interesting in that it's a GFP_ATOMIC allocation triggered by
incoming packets, and it has a fallback mechanism. I don't recall any
others that are as notorious.

>> Are we going to have hundred of patches adding __GFP_NOWARN at various points,
>> or should we get something generic to not flood the syslog in case of memory
>> pressure ?
>>
>
> From my testing which uses LTP oom* tests. There are only 3 places need to be
> patched. The other two are in IOMMU code for both Intel and AMD. The place is
> particular interesting because it could cause the system with floating serial
> console output for days without making progress in OOM. I suppose it ends up in
> a looping condition that warn_alloc() would end up generating more calls into
> __build_skb() via ksoftirqd.

Regardless of this particular allocation, if the reporting itself makes the conditions so much worse, then at least some kind of general ratelimit would make sense indeed.
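
To illustrate what I mean by a general ratelimit, here is a minimal
userspace sketch of an interval-plus-burst limiter. It is only an
illustration of the idea, not the kernel's ratelimit code and not a
proposed patch: struct warn_ratelimit, warn_ratelimit_allow() and the
caller-supplied millisecond clock are all hypothetical names and
assumptions of mine.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: allow at most 'burst' reports per 'interval_ms'
 * window and count the ones that were suppressed. All names here are
 * illustrative and do not exist in the kernel.
 */
struct warn_ratelimit {
	uint64_t interval_ms;   /* length of one window */
	unsigned int burst;     /* reports allowed per window */
	uint64_t window_start;  /* timestamp at which the current window began */
	unsigned int printed;   /* reports emitted in the current window */
	unsigned int missed;    /* reports suppressed in the current window */
	bool started;           /* false until the first call */
};

/* Returns true if the caller may emit its (expensive) report now. */
static bool warn_ratelimit_allow(struct warn_ratelimit *rl, uint64_t now_ms)
{
	/* Open a new window on first use or once the interval has elapsed. */
	if (!rl->started || now_ms - rl->window_start >= rl->interval_ms) {
		rl->window_start = now_ms;
		rl->printed = 0;
		rl->missed = 0;
		rl->started = true;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}

	/* Over budget: skip the stack dump and memory report, just count it. */
	rl->missed++;
	return false;
}
```

A caller would check warn_ratelimit_allow() before doing the expensive
dump, so that under sustained pressure the cost of reporting stays
bounded per window instead of growing with the number of failed
allocations.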