On Tue, 2024-10-29 at 08:55 -0700, Ben Greear wrote:
> > 
> > Not really? It should only get here from two places: userspace
> > (serialized, so you're not going to get to this point with two threads
> > from there), and the "queue no longer full" logic I mentioned above.
> > Oh, maybe technically a third at the beginning after allocating a new
> > queue.
> 
> How is user-space serialized here? The comments in the code seem to
> assume that multiple threads/whatever calling into this is expected
> (i.e., the whole 3-state atomic counter).

Well, in mac80211 it's certainly serialized per iTXQ, and since we don't
have LLTX (yet), also through a single netdev queue.

> > I guess I could sort of see a scenario where
> > 
> >  - queues got full
> >  - queues got not full
> >  - we kick this logic via "queue not full"
> >  - while this is running, userspace TX permanently bumps
> >    tx_request from 1 to 2, this decrements it again, etc.
> 
> Considering GSO and KASAN slowness and a highly loaded system, perhaps
> under memory pressure too, maybe the upper stack could feed the txq
> fast enough that something is always bumping tx_requests to 2 before
> the inner loop can finish?

Maybe? But at some point the socket buffers are full too. You can't
queue packets indefinitely.

> > What thread is the soft lockup in that you see?
> 
> I believe this below is the culprit. Other threads are blocked trying
> to grab the xmit lock on the netdev and sock locks on tcp socket(s).

That seems odd though - it's locked up at a different place, and also
coming in from userspace - but that's serialized against other
userspace, and the "queue not full" thread can really only run once?

OTOH, if it's deadlocked at that level, you could get interrupted at
any level below ... I guess it's still possible that you end up pushing
a packet down, it gets transmitted, space becomes available, and then
we run this again to push more packets, all while the original loop
didn't finish? But that seems pretty unlikely?

Hard to tell what's going on.

johannes
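
[Editor's note: for illustration, below is a minimal, hypothetical
userspace sketch of the kind of 3-state tx_request counter and drain
loop being discussed in this thread. The names tx_request, tx_kick and
tx_one_packet are invented for the sketch and are not the actual driver
code; the point is only to show how the drain loop is re-armed by a
1-to-2 bump of the counter, and why it cannot exit if other contexts
manage to bump the counter on every single pass.]

/*
 * Hypothetical userspace model of a 3-state TX kick counter.
 * Counter states: 0 = idle, 1 = a drain pass is running,
 * 2 = another drain was requested while one is already running.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int tx_request;		/* 0, 1 or 2, as described above */

/* Stand-in for "push one frame"; returns false when there is nothing
 * left to send (or the hardware queue is full). */
static bool tx_one_packet(void)
{
	return false;
}

static void tx_kick(void)
{
	int old = atomic_load(&tx_request);

	/* Bump the counter, saturating at 2. */
	do {
		if (old >= 2)
			return;		/* re-run already requested */
	} while (!atomic_compare_exchange_weak(&tx_request, &old, old + 1));

	if (old != 0)
		return;			/* somebody else owns the drain loop */

	/*
	 * Drain loop: after each pass, drop the counter; if it was bumped
	 * from 1 to 2 while we were draining, go around again.  If other
	 * contexts bump it on every single pass, this loop never
	 * terminates; that is the soft-lockup scenario discussed above.
	 */
	do {
		while (tx_one_packet())
			;
	} while (atomic_fetch_sub(&tx_request, 1) != 1);
}

int main(void)
{
	tx_kick();
	printf("tx_request after kick: %d\n", atomic_load(&tx_request));
	return 0;
}

[In this structure the only exit condition is a decrement that brings
the counter back to 0, i.e. a full pass that completes without any new
kick having arrived in the meantime, which is why the discussion above
turns on whether producers can realistically keep re-kicking it on
every pass.]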