Hi James,

> First off, everything described here is using mac80211_hwsim. I have
> not tested if any of this happens on physical hardware or not.
>
> Commit 01afc6fed seems to have changed the kernel behavior with regard
> to lost beacons. So much so that it completely breaks all roaming tests
> for IWD and (if kept this way) will require severe changes to the
> existing roaming logic we have used for quite a long time. Plus
> supporting older kernels AND this new behavior is going to be quite
> annoying to deal with.
>
> Before, the kernel would only send a lost beacon CQM event when it
> detected beacon loss. This allowed us to scan, find a suitable BSS to
> roam to, and then roam.
>
> Now it also sends Del Station, Deauthenticate, and Disconnect all
> immediately after a lost beacon, with the disconnect reason being
> DISASSOC_DUE_TO_INACTIVITY (4). We handle these extra events as we
> would at any other time, and fully disconnect, which prevents us from
> being able to roam quickly (as well as breaking tests).
>
> Looking at that commit nothing particular jumps out at me, but
> obviously those added flags are causing something else to send these
> extra events.
>
> Was this change actually intended to cause these extra events? And if
> so, why was it changed?

I don't think that was intentional; that commit was really meant only
to enable support for *powersave*. I suspect that the changes are
actually caused by adding REPORTS_TX_ACK_STATUS, which is in fact
necessary here.

But could it be that you're testing this in the wrong way? From your
description, it almost seems like you turn off the AP interface, and
roam after that? I'm not sure that's really realistic. If you wanted to
test the "a few beacons were lost" behaviour, then you'd really have to
lose a few beacons only (perhaps by adding something to wmediumd?), and
not drop the AP off the air entirely.
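
To illustrate what "lose a few beacons only" could mean in a test, here
is a rough sketch in Python. It is purely hypothetical: nothing like it
exists in wmediumd or hwsim today, and the names `is_beacon`,
`BeaconDropper` and `should_drop` are made up for this example. The only
hard fact it relies on is that a beacon is a management frame (type 0)
with subtype 8, so the first frame control byte is 0x80.

```python
def is_beacon(frame: bytes) -> bool:
    """Return True if the 802.11 frame is a beacon.

    Byte 0 of the frame control field holds the protocol version
    (bits 0-1), the type (bits 2-3) and the subtype (bits 4-7).
    A beacon has type 0 and subtype 8, i.e. 0x80 once the version
    bits are masked off.
    """
    return len(frame) >= 2 and (frame[0] & 0xFC) == 0x80


class BeaconDropper:
    """Hypothetical medium filter: drop a bounded burst of beacons.

    Everything that is not a beacon (nullfunc frames, ACKs, probe
    responses, data) always passes, so the AP stays reachable.
    """

    def __init__(self, drop_count: int) -> None:
        self.remaining = drop_count

    def should_drop(self, frame: bytes) -> bool:
        if self.remaining > 0 and is_beacon(frame):
            self.remaining -= 1
            return True
        return False
```

With something along these lines in the medium, the station would miss a
handful of beacons but could still reach the AP with other frames, which
is different from taking the AP interface down entirely.
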

If the AP is in fact completely unreachable, then I'm pretty sure real
hardware will behave just like hwsim here, albeit perhaps a bit slower,
though not by much. And then you'd have the same issue there.

The fact that hwsim behaved differently would likely have been just a
timing thing - it didn't advertise REPORTS_TX_ACK_STATUS, so we'd wait a
bit longer until deciding that the AP really was truly gone. If the ACK
status is reported we just send a (few?) quick nullfunc(s) and decide
that very quickly. But that's independent of hwsim or real hardware.

johannes