On Fri, Nov 23, 2018 at 08:45:54PM +0100, Johannes Berg wrote:
> On Tue, 2018-11-20 at 15:20 -0600, Daniel Santos wrote:
> 
> > I believe I have the answer as to why we're getting frames after we
> > stop the queue. I had a little chat about this in #kernelnewbies and
> > some other devs believe it is intentional.
> > 
> > There is a race in ieee80211_tx_frags (net/mac80211/tx.c) between
> > releasing local->queue_stop_reason_lock and calling the driver's tx
> > until we lock queue->tx_lock in rt2x00queue_write_tx_frame -- in
> > between these two points the thread can be preempted. So while we
> > stop the queue in one thread, there can be 20 other threads in
> > between that have already checked that the queue was running and
> > have a frame buffer sitting on their stacks.
> 
> Not 20, only 1 per netdev queue. I suspect that means just 1 per
> hardware queue, but I don't know how rt2x00 maps netdev queues to
> hardware queues. If you have lots of netdevs, that might actually be
> 20, but I suspect that's not what's going on here.
> 
> > I think it could be fixed with the below patch, but I'm being told
> > that it doesn't need it and that the driver should just *quietly*
> > drop the frame:
> 
> [snip patch]
> 
> > Could anybody illuminate for me why this isn't done? I know that
> > there has to be a point where we realize there are too many items in
> > the queue and we can't keep up, but this would seem to be a terrible
> > way to do that. Is it one of those things where it isn't worth the
> > performance degradation? Any insights would be most appreciated!! :)
> 
> There's just not much point, and doing the locking here will mean it's
> across _all_ queues, whereas if you have multiple hardware queues you
> wouldn't really need it.
> 
> Note that with internal TXQs with fq/codel support etc. we actually
> have the fq->lock and it's global, and it turns out that's still a
> better deal than actually doing parallel TX. So this may not matter so
> much.
> 
> In any case, while this might solve the problem for the specific case
> you're thinking about, as soon as you have something else starting or
> stopping queues from outside the TX path it still wouldn't actually
> help.
> 
> By the way, one thing that can likely exacerbate the problem is
> fragmentation, once you have that you're more likely to trigger lots
> of frames in close succession.
> 
> What I would suggest doing in the driver is actually stop the queues
> once a *threshold* is reached, rather than being full. Say if you have
> 128 entries in the HW queue, you can stop once you reach 120 (you
> probably don't really need the other 8 places). If you get a 121st
> frame, then you can still put it on the queue (until you filled 128),
> but you can also stop the queue again (stopping queues is idempotent).

That's what rt2x00 does, but we have only 64 entries on the tx queues,
so the threshold is small. In:

https://bugzilla.kernel.org/show_bug.cgi?id=82751

I proposed increasing the queue size to 256 and hence making the
threshold bigger. However, I was concerned about bufferbloat and asked
users to test how this affects latency. I never got the test
results :-(

Thanks
Stanislaw
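
For illustration, a driver TX path following Johannes's threshold
suggestion might look roughly like the sketch below. This is not
rt2x00 (or any real driver's) code: struct my_dev / struct my_queue,
my_tx(), my_tx_done() and the 128/120 numbers are made up for the
example; only the mac80211 calls (ieee80211_stop_queue(),
ieee80211_wake_queue(), ieee80211_free_txskb()) and the usual locking
primitives are real kernel API.

	/*
	 * Hypothetical TX path showing threshold-based queue stopping.
	 * Only the mac80211/kernel calls are real; everything named
	 * my_* is invented for this sketch.
	 */
	#include <net/mac80211.h>

	#define MY_QUEUE_ENTRIES   128	/* total HW descriptors per queue */
	#define MY_QUEUE_THRESHOLD 120	/* stop the mac80211 queue here */

	struct my_queue {
		spinlock_t lock;
		unsigned int used;	/* descriptors currently in flight */
	};

	struct my_dev {
		struct ieee80211_hw *hw;
		struct my_queue txq[4];	/* assumes hw->queues == 4 */
	};

	static void my_tx(struct ieee80211_hw *hw,
			  struct ieee80211_tx_control *control,
			  struct sk_buff *skb)
	{
		struct my_dev *dev = hw->priv;
		unsigned int qid = skb_get_queue_mapping(skb);
		struct my_queue *q = &dev->txq[qid];
		unsigned long flags;

		spin_lock_irqsave(&q->lock, flags);

		if (q->used >= MY_QUEUE_ENTRIES) {
			/* Truly full: a frame that raced past the stop is dropped. */
			spin_unlock_irqrestore(&q->lock, flags);
			ieee80211_free_txskb(hw, skb);
			return;
		}

		/* ... program the HW descriptor for skb here ... */
		q->used++;

		/*
		 * Stop at the threshold, not when full, so frames already past
		 * mac80211's "is the queue running?" check still fit. Stopping
		 * an already-stopped queue is harmless (idempotent).
		 */
		if (q->used >= MY_QUEUE_THRESHOLD)
			ieee80211_stop_queue(hw, qid);

		spin_unlock_irqrestore(&q->lock, flags);
	}

	/* Called from the TX-done path once a descriptor is reclaimed. */
	static void my_tx_done(struct my_dev *dev, unsigned int qid)
	{
		struct my_queue *q = &dev->txq[qid];
		unsigned long flags;

		spin_lock_irqsave(&q->lock, flags);
		q->used--;
		if (q->used < MY_QUEUE_THRESHOLD)
			ieee80211_wake_queue(dev->hw, qid);
		spin_unlock_irqrestore(&q->lock, flags);
	}

The point of the pattern is that the queue is stopped while there is
still headroom, so the handful of frames that raced past the stop in
ieee80211_tx_frags can still be placed on the hardware queue instead
of being dropped, and stopping the queue again for them costs nothing.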