Re: mac80211: 3.9.0+: Invalid WDS/flush state and non-connecting station.

On 05/08/2013 11:14 AM, Ben Greear wrote:
On 05/08/2013 10:58 AM, Johannes Berg wrote:
On Wed, 2013-05-08 at 09:18 -0700, Ben Greear wrote:

Ok, I reproduced this with yet more debugging printouts in the kernel.

The symptom is this:

The sme_state is SME_CONNECTED, so it bails out below before sending the
'connected' message to user-space.

Is your system being really really really slow and/or are threads
getting pre-empted a lot? This may seem like a bit of a stretch, but
it seems possible that this happens:

ieee80211_sta_rx_queued_mgmt() is running, possibly on one CPU, and is
somewhere between printing "associated" and calling
cfg80211_send_rx_assoc() (or in the call already, before taking the lock
though.)

Then your interface is set down at the same time, possibly on a
different CPU. Here's where the scenario gets stretched: clearly your
interface is getting set down over a minute later, and I don't see how
you could have stalled the other thread for that long.

But if you did, then that thread is still processing things while the
interface is going down. cfg80211 didn't know anything about the
association having completed, so it won't have disconnected, etc.

So far, I haven't found any other scenario, nor a solution.

It is not that slow or overloaded, at least most of the time. In
particular, I only had 20 virtual stations up on this system, not
doing much traffic, and it easily handles hundreds of stations.

And once it gets in this state, it stays there (overnight in this
case, with my app resetting the port via 'ip link set down' and
poking at wpa_supplicant every minute or so).

I was wondering: in the cfg80211_mlme_down method (or perhaps
some place similar), should we force the sme state to IDLE
with a big WARN_ON_ONCE or similar?

That way, if it does get stuck somehow, we can recover by
downing the interface and bringing it back up.
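
To make the idea above concrete, here is a minimal user-space sketch of
the proposed recovery check. It is only a model, not kernel code and not
a tested patch: the model_* types and model_mlme_down_recover() are
hypothetical stand-ins for wireless_dev, the SME state enum and the check
that would live in cfg80211_mlme_down(), and the warn-once logic only
imitates what WARN_ON_ONCE would do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel types; not the real cfg80211 ones. */
enum model_sme_state {
	MODEL_SME_IDLE,
	MODEL_SME_CONNECTING,
	MODEL_SME_CONNECTED,
};

struct model_wdev {
	enum model_sme_state sme_state;
	const void *current_bss;	/* NULL when no BSS is tracked */
};

/*
 * If the SME claims to be connected but there is no current BSS,
 * warn (only the first time) and force the state back to idle.
 * Returns true when the inconsistent state was seen and reset.
 */
static bool model_mlme_down_recover(struct model_wdev *wdev)
{
	static bool warned;

	assert(wdev != NULL);

	if (wdev->sme_state != MODEL_SME_CONNECTED || wdev->current_bss != NULL)
		return false;

	if (!warned) {
		warned = true;
		fprintf(stderr, "WARN: SME connected with no current BSS, forcing idle\n");
	}
	wdev->sme_state = MODEL_SME_IDLE;
	return true;
}
```

Calling model_mlme_down_recover() on a device that is in the connected
state with a NULL current_bss resets it to idle; any other combination
is left alone. Whether forcing the state like this is safe with respect
to the real locking is exactly the open question.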


Here's some more debug info..hit it again today:

I added this debug code (on top of all my other patches and 3.9.1+).

void cfg80211_mlme_down(struct cfg80211_registered_device *rdev,
			struct net_device *dev)
{
	struct wireless_dev *wdev = dev->ieee80211_ptr;
	struct cfg80211_deauth_request req;
	u8 bssid[ETH_ALEN];

	ASSERT_WDEV_LOCK(wdev);

	printk("mlme_down: %s: type: %i  sme_state: %i current-bss: %p\n",
	       dev->name, (int)(wdev->iftype), (int)(wdev->sme_state),
	       wdev->current_bss);

I see this printout for the stuck station (this is dmesg | grep sta74,
so it skips errors about other interfaces that are also hung).

I am guessing we should never be calling mlme_down with state
of CFG80211_SME_CONNECTED when bss is NULL?

I'm hoping I can get by with some sort of work-around patch
for the 3.9 kernel instead of trying to patch in your big
locking changes....


sta74: authenticate with 00:de:ad:1d:ea:00
sta74: send auth to 00:de:ad:1d:ea:00 (try 1/3)
sta74: authenticated
sta74: associate with 00:de:ad:1d:ea:00 (try 1/3)
sta74: RX AssocResp from 00:de:ad:1d:ea:00 (capab=0x1 status=0 aid=67)
IPv6: ADDRCONF(NETDEV_CHANGE): sta74: link becomes ready
sta74: associated
connect_result: sta74: type: 2  sme_state: 2
__cfg80211_disconnect: sta74: type: 2  sme_state: 2  conn-state: -1
mlme_down: sta74: type: 2  sme_state: 2 current-bss:           (null)
mlme_down: sta74: type: 2  sme_state: 2 current-bss:           (null)
sta74: Invalid WDS/flush state, type: 2  WDS: 5  flushed: 1
IPv6: ADDRCONF(NETDEV_UP): sta74: link is not ready
__cfg80211_disconnect: sta74: type: 2  sme_state: 2  conn-state: -1
mlme_down: sta74: type: 2  sme_state: 2 current-bss:           (null)
mlme_down: sta74: type: 2  sme_state: 2 current-bss:           (null)
IPv6: ADDRCONF(NETDEV_UP): sta74: link is not ready
sta74: authenticate with 00:de:ad:1d:ea:00
sta74: send auth to 00:de:ad:1d:ea:00 (try 1/3)
sta74: authenticated
sta74: associate with 00:de:ad:1d:ea:00 (try 1/3)
sta74: RX AssocResp from 00:de:ad:1d:ea:00 (capab=0x1 status=0 aid=67)
sta74: associated
connect_result: sta74: type: 2  sme_state: 2
IPv6: ADDRCONF(NETDEV_CHANGE): sta74: link becomes ready
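
For picking the suspicious lines out of a long dmesg capture, a small
user-space filter along these lines might help. It is a sketch that
assumes the exact "mlme_down:" line format produced by the debug printk
above, and that sme_state 2 is the connected state, as the log and the
discussion suggest. The helper name is_stuck_mlme_down_line() and the
LOG_SME_CONNECTED constant are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed numeric value of the connected SME state in the debug output. */
#define LOG_SME_CONNECTED 2

/*
 * Returns true if 'line' is an mlme_down debug line that reports the
 * connected SME state together with a NULL current BSS.
 */
static bool is_stuck_mlme_down_line(const char *line)
{
	char ifname[16];
	char bss[32];
	int type, sme_state;

	assert(line != NULL);

	/* Whitespace in the format matches any run of spaces in the log. */
	if (sscanf(line,
		   "mlme_down: %15[^:]: type: %d sme_state: %d current-bss: %31s",
		   ifname, &type, &sme_state, bss) != 4)
		return false;

	return sme_state == LOG_SME_CONNECTED && strcmp(bss, "(null)") == 0;
}
```

Feeding each dmesg line through is_stuck_mlme_down_line() flags the
mlme_down entries shown above, while the ordinary association messages
are ignored.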


For what it's worth, I don't recall ever seeing this problem
in 3.7, but it's way too rare to be able to bisect...

Thanks,
Ben


johannes





--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc  http://www.candelatech.com
