Search Linux Wireless

Re: [PATCH] mac80211: Fix kernel hang on ax200 firmware crash.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2020-06-10 at 13:40 -0700, greearb@xxxxxxxxxxxxxxx wrote:
> From: Ben Greear <greearb@xxxxxxxxxxxxxxx>
> 
> I backported out-of-tree ax200 driver from backport-iwlwifi to my
> 5.4 kernel so that I could run ax200 beside other radios (backports
> mac80211 otherwise is incompatible and other drivers will crash).
> 
> Always possible that upstream kernel doesn't suffer from exactly this
> case, but upstream ax200 is too unstable to even get this far, so...
> 
> The ax200 firmware crash often causes the kernel to deadlock due to the
> while (sta->sta_state == IEEE80211_STA_AUTHORIZED)
> loop in __sta_info_Destroy_part.  If sta_info_move_state does not
> make progress, then it will loop forever.  In my case, sta_info_move_state
> fails due to the sdata-in-driver check.

Interesting. I don't think I've seen this in our testing before.

> iwlwifi 0000:12:00.0: dma_pool_destroy iwlwifi:bc, 00000000d859bd4c busy

Ugh, yeah, as an aside - we still leak stuff there... need to dig into
that.

> Signed-off-by: Ben Greear <greearb@xxxxxxxxxxxxxxx>
> ---
>  net/mac80211/sta_info.c | 23 +++++++++++++++++++++--
>  1 file changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
> index e2a04fc..31a3856 100644
> --- a/net/mac80211/sta_info.c
> +++ b/net/mac80211/sta_info.c
> @@ -1092,6 +1092,7 @@ static void __sta_info_destroy_part2(struct sta_info *sta)
>  	struct ieee80211_sub_if_data *sdata = sta->sdata;
>  	struct station_info *sinfo;
>  	int ret;
> +	int count = 0;
>  
>  	/*
>  	 * NOTE: This assumes at least synchronize_net() was done
> @@ -1104,6 +1105,13 @@ static void __sta_info_destroy_part2(struct sta_info *sta)
>  	while (sta->sta_state == IEEE80211_STA_AUTHORIZED) {
>  		ret = sta_info_move_state(sta, IEEE80211_STA_ASSOC);
>  		WARN_ON_ONCE(ret);
> +		if (++count > 1000) {
> +			/* WTF, bail out so that at least we don't hang the system. */
> +			sdata_err(sdata, "Could not move state after 1000 tries, ret: %d  state: %d\n",
> +				  ret, sta->sta_state);
> +			WARN_ON_ONCE(1);
> +			break;
> +		}

I guess that should be

if (WARN_ON_ONCE()) ...


etc.

>  		int err = drv_sta_state(sta->local, sta->sdata, sta,
>  					sta->sta_state, new_state);
> -		if (err)
> -			return err;
> +		if (err == -EIO) {
> +			/* Sdata-not-in-driver, we are out of sync, but probably
> +			 * best to carry on instead of bailing here, at least maybe
> +			 * we can clean this up.
> +			 */

It _could_ be the driver itself returning -EIO, so why not check the
sdata-in-driver flag?


Anyway, that mostly looks good and would make mac80211 more robust, but
like I just said in the other patch I think you need to consider
mac80211 changes more from mac80211's POV, not from an arbitrary
driver's POV.

Really here that mostly applies to the commit log, which should probably
say something like

	mac80211: deadlock due to driver misbehaviour

or so, and then go on to explain what it does in *mac80211*, and show
the iwlwifi parts only as an *example*.

Thanks,
johannes




[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Wireless Personal Area Network]     [Linux Bluetooth]     [Wireless Regulations]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Hiking]     [MIPS Linux]     [ARM Linux]     [Linux RAID]

  Powered by Linux