On Mon, Dec 5, 2016 at 10:57 PM, Johannes Berg <johannes@xxxxxxxxxxxxxxxx> wrote:
>> > The premise seems fairly reasonable, although I'm a little worried
>> > that if so much new traffic is coming in we never finish the scan
>> > suspend? Actually, the queues are still stopped, so it's only
>> > management frames that can come in, so that should be ok?
>> >
>>
>> Actually it will finish scan eventually and back to SCAN_DECISION
>> state but almost 20~30 seconds elapsed. The local->scanning should be
>> cleared after all channels been scanned. However, from the debug
>> messages I added in ieee80211_iface_work(), it still returns when
>> check local->scanning and the DELBA still has no chance to be
>> transferred then stuck again at the next scan state machine. Supposed
>> to be another scan request issued but I don't know who's the killer.
>> Except to find who issue the next scan request, BA session have no
>> chance to reset in this long scan period (>20s) still need to be
>> taken care.
>
> No no, you misunderstood. My question is more like:
>
> Where is this traffic coming from, since netdev queues should be
> stopped?
>
> And then, if there's so much traffic coming in that we can take 20-30
> seconds to send it out, could we - with the change - get stuck forever?
>

My test scenario is just a simple ping (not a greedy ping). When it
stalls, you see a single ping packet retried on the air 3 times without
an ack from the AP, and after that no more tx QoS data packets are
observed on the air. The log shows that the tx BA session has expired
and a DELBA is queued but never sent out, so the STA wants to terminate
the current BA session but can't. The air capture (pcap) also shows that
it can't handle the incoming ADDBA "Action" frame. The Block Ack state
machine simply stops until the whole sw scan is done, which in my case
takes >300 seconds. So this patch just tries to find a chance to send
out the DELBA and process the BA-related packets in
ieee80211_iface_work().
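
To make the stall concrete, here is a minimal userspace model in C. It
is only a sketch, not kernel code and not the patch itself: every name
in it (model_iface, work_item, iface_work_bail_on_scan,
iface_work_ba_during_scan) is hypothetical, and the `scanning` field
merely stands in for a non-zero local->scanning. The first function
mimics the early return described above; the second shows one possible
relaxation, assuming BA-related work can be told apart from the rest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only. None of these names are mac80211 symbols. */
enum work_kind {
    WORK_BA_TEARDOWN,   /* stands in for a queued DELBA */
    WORK_BA_ACTION,     /* stands in for a received ADDBA action frame */
    WORK_OTHER          /* any other deferred interface work */
};

struct work_item {
    enum work_kind kind;
    bool done;
};

struct model_iface {
    bool scanning;              /* models a non-zero local->scanning */
    struct work_item *items;
    size_t n_items;
};

static bool is_ba_work(enum work_kind kind)
{
    return kind == WORK_BA_TEARDOWN || kind == WORK_BA_ACTION;
}

/* Early-return behaviour: nothing is handled while scanning, so queued
 * BA work waits until the scan is over. Returns the number handled. */
static size_t iface_work_bail_on_scan(struct model_iface *ifc)
{
    size_t handled = 0;

    assert(ifc != NULL);
    if (ifc->scanning)
        return 0;

    for (size_t i = 0; i < ifc->n_items; i++) {
        if (!ifc->items[i].done) {
            ifc->items[i].done = true;
            handled++;
        }
    }
    return handled;
}

/* Assumed relaxation (not the posted patch): while scanning, still
 * handle BA-related items and leave everything else pending. */
static size_t iface_work_ba_during_scan(struct model_iface *ifc)
{
    size_t handled = 0;

    assert(ifc != NULL);
    for (size_t i = 0; i < ifc->n_items; i++) {
        struct work_item *it = &ifc->items[i];

        if (it->done)
            continue;
        if (ifc->scanning && !is_ba_work(it->kind))
            continue;
        it->done = true;
        handled++;
    }
    return handled;
}
```

With `scanning` set and a teardown item queued, the first function
handles nothing and the second handles only the BA items. Whether doing
so is safe with respect to the channel the scan is currently on is
outside what this model can show.
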

As for getting stuck forever: the patch is meant to avoid exactly that.
I ran 30000+ pings and observed no stall.

>> You're right. I just want to clear_bit and set_bit in this case,
>> sorry for that confusing. Or any better suggestion?
>
> We seem to be using set_bit/clear_bit so that seems reasonable, unless
> you can prove that all of those are under the lock and we can remove
> the atomics entirely ... Not that it matters hugely, we don't scan all
> the time after all!
>

I agree. I will modify this patch and send it again for approval.

> johannes