Hi Eric, all, I've been debugging (much thanks to bpftrace) TCP stalls on wifi, in particular on iwlwifi. What happens, essentially, is that we transmit large aggregates (63 packets of 7.5k A-MSDU size each, for something on the order of 500kB per PPDU). Theoretically we can have ~240 A-MSDUs on our hardware queues, and the hardware aggregates them into up to 63 to send as a single PPDU. At HE rates (160 MHz, high rates) such a large PPDU takes less than 2ms to transmit. I'm seeing around 1400 Mbps TCP throughput (a bit more than 1800 UDP), but I'm expecting more. A bit more than 1800 for UDP is about the max I can expect on this AP (it only does 8k A-MSDU size), but I'd think TCP then shouldn't be so much less (and our Windows drivers gets >1600). What I see is that occasionally - and this doesn't happen all the time but probably enough to matter - we reclaim a few of those large aggregates and free the transmit SKBs, and then we try to pull from mac80211's TXQs but they're empty. At this point - we've just freed 400+k of data, I'd expect TCP to immediately push more, but it doesn't happen. I sometimes see another set of reclaims emptying the queue entirely (literally down to 0 packets on the queue) and it then takes another millisecond or two for TCP to start pushing packets again. Once that happens, I also observe that TCP stops pushing large TSO packets and goes down to sometimes less than a single A-MSDU (i.e. ~7.5k) in a TSO, perhaps even an MTU-sized frame - didn't check this, only the # of frames we make out of this. If you have any thoughts on this, I'd appreciate it. Something I've been wondering is if our TSO implementation causes issues, but apart from higher CPU usage I see no real difference if I turned it off. I thought so because it splits up the SKBs into those A- MSDU sized chunks using skb_gso_segment() and then splits them down into MTU-sized all packed together into an A-MSDU using the hardware engine. But that means we release a bunch of A-MSDU-sized SKBs back to the TCP stack when they transmitted. Another thought I had was our broken NAPI, but this is TX traffic so the only RX thing is sync, and I'm currently still using kernel 5.4 anyway. Thanks, johannes