On Wed, 2020-06-10 at 13:40 -0700, greearb@xxxxxxxxxxxxxxx wrote: > From: Ben Greear <greearb@xxxxxxxxxxxxxxx> > > I backported out-of-tree ax200 driver from backport-iwlwifi to my > 5.4 kernel so that I could run ax200 beside other radios (backports > mac80211 otherwise is incompatible and other drivers will crash). > > Always possible that upstream kernel doesn't suffer from exactly this > case, but upstream ax200 is too unstable to even get this far, so... > > The ax200 firmware crash often causes the kernel to deadlock due to the > while (sta->sta_state == IEEE80211_STA_AUTHORIZED) > loop in __sta_info_Destroy_part. If sta_info_move_state does not > make progress, then it will loop forever. In my case, sta_info_move_state > fails due to the sdata-in-driver check. Interesting. I don't think I've seen this in our testing before. > iwlwifi 0000:12:00.0: dma_pool_destroy iwlwifi:bc, 00000000d859bd4c busy Ugh, yeah, as an aside - we still leak stuff there... need to dig into that. > Signed-off-by: Ben Greear <greearb@xxxxxxxxxxxxxxx> > --- > net/mac80211/sta_info.c | 23 +++++++++++++++++++++-- > 1 file changed, 21 insertions(+), 2 deletions(-) > > diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c > index e2a04fc..31a3856 100644 > --- a/net/mac80211/sta_info.c > +++ b/net/mac80211/sta_info.c > @@ -1092,6 +1092,7 @@ static void __sta_info_destroy_part2(struct sta_info *sta) > struct ieee80211_sub_if_data *sdata = sta->sdata; > struct station_info *sinfo; > int ret; > + int count = 0; > > /* > * NOTE: This assumes at least synchronize_net() was done > @@ -1104,6 +1105,13 @@ static void __sta_info_destroy_part2(struct sta_info *sta) > while (sta->sta_state == IEEE80211_STA_AUTHORIZED) { > ret = sta_info_move_state(sta, IEEE80211_STA_ASSOC); > WARN_ON_ONCE(ret); > + if (++count > 1000) { > + /* WTF, bail out so that at least we don't hang the system. */ > + sdata_err(sdata, "Could not move state after 1000 tries, ret: %d state: %d\n", > + ret, sta->sta_state); > + WARN_ON_ONCE(1); > + break; > + } I guess that should be if (WARN_ON_ONCE()) ... etc. > int err = drv_sta_state(sta->local, sta->sdata, sta, > sta->sta_state, new_state); > - if (err) > - return err; > + if (err == -EIO) { > + /* Sdata-not-in-driver, we are out of sync, but probably > + * best to carry on instead of bailing here, at least maybe > + * we can clean this up. > + */ It _could_ be the driver itself returning -EIO, so why not check the sdata-in-driver flag? Anyway, that mostly looks good and would make mac80211 more robust, but like I just said in the other patch I think you need to consider mac80211 changes more from mac80211's POV, not from an arbitrary driver's POV. Really here that mostly applies to the commit log, which should probably say something like mac80211: deadlock due to driver misbehaviour or so, and then go on to explain what it does in *mac80211*, and show the iwlwifi parts only as an *example*. Thanks, johannes