On Thu, 2019-09-05 at 10:32 +0200, Eric Dumazet wrote: > > On 9/4/19 10:42 PM, Qian Cai wrote: > > > To summary, those look to me are all good long-term improvement that would > > reduce the likelihood of this kind of livelock in general especially for > > other > > unknown allocations that happen while processing softirqs, but it is still > > up to > > the air if it fixes it 100% in all situations as printk() is going to take > > more > > time and could deal with console hardware that involve irq_exit() anyway. > > > > On the other hand, adding __GPF_NOWARN in the build_skb() allocation will > > fix > > this known NET_TX_SOFTIRQ case which is common when softirqd involved at > > least > > in short-term. It even have a benefit to reduce the overall warn_alloc() > > noise > > out there. > > > > I can resubmit with an update changelog. Does it make any sense? > > It does not make sense. > > We have thousands other GFP_ATOMIC allocations in the networking stacks. Instead of repeatedly make generalize statements, could you enlighten me with some concrete examples that have the similar properties which would trigger a livelock, - guaranteed GFP_ATOMIC allocations when processing softirq batches. - the allocation has a fallback mechanism that is unnecessary to warn a failure. I thought "skb" is a special-case here as every packet sent or received is handled using this data structure. > > Soon you will have to send more and more patches adding __GFP_NOWARN once > your workloads/tests can hit all these various points. I doubt so. > > It is really time to fix this problem generically, instead of having > to review hundreds of patches. > > This was my initial feedback really, nothing really has changed since. I feel like you may not follow the thread closely. There are more details uncovered in the last few days and narrowed down to the culprits. > > The ability to send a warning with a stack trace, holding the cpu > for many milliseconds should not be decided case by case, otherwise > every call points will decide to opt-out from the harmful warnings. That is not really the reasons anymore why I asked to add a __GPF_NOWARN here.