On Tue, Nov 29, 2016 at 8:59 PM, Johannes Berg <johannes@xxxxxxxxxxxxxxxx> wrote: > On Wed, 2016-11-23 at 15:59 +0800, Chris Chiu wrote: >> ieee80211_iface_work() will check if sw scanning is in progress >> before handling block ack session. In our case, the RTL8821AE >> operate in station mode, when tx session expired, DELBA packet >> stuck during sw scanning and so do other data packets. >> >> ieee80211_scan_state_decision() will take lots of time in >> SCAN_SUSPEND/SCAN_RESUME state due to !tx_empty or bad_latency. >> Then the sw scanning mostly take > 20 seconds to finish or even >> worse in our case RTL8821AE have 37 channels for 2G+5G to scan >> and tx stalls ~300 seconds. AP side still thinks the connection >> is alive because it still receives the QoS NULL packet from STA. >> So the link state will never change but actually no data tx/rx >> during this long time. >> >> This commit tries to send out packet in SCAN_SUSPEND state so the >> sw scanning can complete more efficiently and less affect on Block >> Ack session handling. Verified on RTL8821AE for > 30000 pings and >> no Tx BA session stuck observed. > > The premise seems fairly reasonable, although I'm a little worried that > if so much new traffic is coming in we never finish the scan suspend? > Actually, the queues are still stopped, so it's only management frames > that can come in, so that should be ok? > Actually it will finish scan eventually and back to SCAN_DECISION state but almost 20~30 seconds elapsed. The local->scanning should be cleared after all channels been scanned. However, from the debug messages I added in ieee80211_iface_work(), it still returns when check local->scanning and the DELBA still has no chance to be transferred then stuck again at the next scan state machine. Supposed to be another scan request issued but I don't know who's the killer. Except to find who issue the next scan request, BA session have no chance to reset in this long scan period (>20s) still need to be taken care. >> + test_and_clear_bit(SCAN_SUSPEND_SCANNING, &local->scanning); >> > > That makes no sense, you're not checking the return value, just clear > the bit without test. > >> @@ -844,6 +846,8 @@ static void ieee80211_scan_state_suspend(struct >> ieee80211_local *local, >> /* disable PS */ >> ieee80211_offchannel_return(local); >> >> + __set_bit(SCAN_SUSPEND_SCANNING, &local->scanning); > > Why are you using the non-atomic version here, vs. the atomic one > above? > You're right. I just want to clear_bit and set_bit in this case, sorry for that confusing. Or any better suggestion? > johannes