Hi,

Apologies for the long CC list, it seems lots of people were either involved with the scanning code or could know how it should work :)

After having two bug reports about the BSS list/probe_resp variable in mac80211 and a very detailed analysis of the problem from Bill Moss, I think I've understood the problem now but I'm not sure how to proceed because I do not understand the reason for "Do not allow beacons to override data from probe responses".

I'll get into more detail of that code in a bit, but let me give a global overview first for those not intimately familiar with the code.

In the MLME/scan code, mac80211 keeps a list of BSS info structures (for each hardware that is registered to the system), each of the structures containing information on the BSS, see struct ieee80211_sta_bss in ieee80211_i.h. When we scan, we will always try to keep each of the BSS structs we have updated with the last information received, but there is code to prevent updating it from a beacon when we have previously received probe responses.

Now, this code has two mostly orthogonal problems.

The first one is that there is no actual expiration of BSS structs. Each BSS struct has a 'last_update' member that contains (in jiffies) the time this item was last updated. This means that we accumulate BSS information forever, but due to the 'last_update' only the last few items will be returned to the user on asking for a scan result. This obviously has problems since a rogue station could bombard us with fake probe responses and cause us to build a huge BSS list which is never again freed until the hardware is deregistered. This will need to be fixed, of course.

The second problem is the one Bill Moss and Vladimir Koutny were running into. As soon as we receive a probe response from a BSS, we never again update it from beacons, hence userspace can never find an AP during passive scanning if we've seen that BSSID before during an active scan.
In Vladimir Koutny's case this is due to a hardware bug (though I can't see why any hardware would behave that way), in Bill Moss's case I'm not entirely sure why he's not getting probe responses, maybe the AP is just not quick enough. In any case, the problem then is that last_update is updated only along with the remaining information in a BSS struct, which is quite correct.

The question, however, is: why are beacons not allowed to override probe response information? That is, why does this piece of code exist:

        if (sdata->vif.type != IEEE80211_IF_TYPE_IBSS &&
            bss->probe_resp && beacon) {
                /* STA mode:
                 * Do not allow beacon to override data from Probe Response. */
                ieee80211_rx_bss_put(dev, bss);
                return;
        }

I failed to find any explanation for it in the git archive (it was part of the first code drop from Jiri that made it into a wireless tree and I don't have any older git archives handy) nor in my email, though I vaguely remember there was a reason for it (or I may have made it up).

Bill Moss reports that removing this piece of code (and then the probe_resp variable completely since it'll be write-only) fixes the problem, and it's obvious why. The only thing I'm not sure about is whether that will introduce regressions because probe responses can contain better information than beacons.

The only reason I can find for this would be an AP that has multiple SSIDs with different security settings on each and announces those only in the probe responses (and otherwise uses a hidden SSID), but we only handle that case now, and that handling was added much later than this code, which was already present. However, that handling actually seems to have another bug: if you have such an AP, will you find two BSSes, one with a hidden and one with a visible SSID?

Any hints about this code or the desired behaviour would be greatly appreciated.

Thanks,
johannes
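
For those who want to see the second problem in isolation, here is a minimal standalone sketch of how the check interacts with last_update. It is not the mac80211 code: struct bss_model, bss_model_update(), bss_model_is_fresh(), the 'guard' switch, BSS_MODEL_MAX_AGE and the plain tick counter are all hypothetical names and values made up for illustration, and the model assumes probe_resp is set by a probe response and never cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two fields of struct ieee80211_sta_bss
 * that matter here; this is not the real structure. */
struct bss_model {
	bool probe_resp;           /* set once a probe response was seen */
	unsigned long last_update; /* time of last accepted update, in ticks */
};

/* Assumed freshness window in ticks; the real value is not modelled. */
#define BSS_MODEL_MAX_AGE 10UL

/*
 * Apply a received frame at time 'now'. With 'guard' set, this mirrors
 * the STA-mode check quoted above: a beacon is dropped once a probe
 * response has been recorded. Returns true if the entry was updated.
 */
bool bss_model_update(struct bss_model *bss, bool beacon, bool guard,
		      unsigned long now)
{
	assert(bss != NULL);

	if (guard && bss->probe_resp && beacon)
		return false; /* last_update is left untouched */

	if (!beacon)
		bss->probe_resp = true;
	bss->last_update = now;
	return true;
}

/* Would this entry still be reported in scan results at time 'now'? */
bool bss_model_is_fresh(const struct bss_model *bss, unsigned long now)
{
	assert(bss != NULL);
	return now - bss->last_update <= BSS_MODEL_MAX_AGE;
}
```

With the guard enabled, one probe response followed only by beacons leaves last_update stuck at the probe response time, so the entry eventually ages out of the results even though the AP is still beaconing. With the guard disabled, the same beacons keep the entry fresh, which matches what Bill Moss reports.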