> All of this makes sense since you need to hold RTNL for all those state > changes/notifier chains, but it does lead to the first race/consistency > problem: if you query at just the right time you can see carrier being > on, however, if the carrier is actually removed again and the linkwatch > work didn't run yet, there might never be an event for the carrier on, > iow, you might have: > > netif_carrier_on() > query from userspace and see carrier on > netif_carrier_off() > linkwatch work runs and sends only carrier off event > > This is at least a bit confusing, but not really my main problem here, > though it did in fact happen to me as well, in a fashion. That is interesting. Copper Ethernet PHYs might have the opposite problem. The status bit about link is latching low. This means that if the link is lost and then very quickly comes back again, you always get to see the lost and then restored link. So maybe we have: netif_carrier_off() query from userspace and see carrier off netif_carrier_oon() linkwatch work runs and sends only carrier on event ??? Maybe we want linkwatch to keep the old and the new state. If there is a change reported while there is still work queued, which flips a bit back to its old state, it needs to block until the work is actually done? > Possible solution #2: > --------------------- > Another - more feasible - option might be to say OK, so the associated > event (and a few friends) are the problem, so we can queue those events > in cfg80211, and only release them on NETDEV_CHANGE notifier call. > That's from netdev_state_change() after dev_activate() in linkwatch > work. We'd want to pair it with netif_carrier_on() so we actually expect > the event to come, and maybe give netif_carrier_on() a return value > indicating that it scheduled - or else check using carrier_up_count > perhaps? Probably not an issue with 802.11, but sometimes drivers do odd things with the carrier because of the BMC. The BMC can have a side channel into the hosts NIC, which allows it to share the hosts PHY and RJ45 socket. So that the BMC can send/receive frames while the host NIC is admin down, the carrier might actually be up. And requests to down the carrier are ignored. As i said, probably irrelevant to 802.11, but if you try to make a generic solution, you might need to watch out for this. Andrew