Just a quick update: once again I blindly applied this patch, and it seems
to solve the problem. agg-rx.c still uses call_rcu, so I thought work.c and
agg-tx.c could use call_rcu as well. Thanks to the person who mailed this
patch! It still needs more testing.

On Wed, May 18, 2011 at 8:16 PM, Mohammed Shafi <shafi.wireless@xxxxxxxxx> wrote:
> On Wed, May 18, 2011 at 8:11 PM, Larry Finger <Larry.Finger@xxxxxxxxxxxx> wrote:
>> On 05/18/2011 08:04 AM, Mohammed Shafi wrote:
>>>
>>> On Wed, May 18, 2011 at 6:21 PM, Gertjan van Wingerde
>>> <gwingerde@xxxxxxxxx> wrote:
>>>>
>>>> On 05/18/11 14:41, Mohammed Shafi wrote:
>>>>>
>>>>> On Wed, May 18, 2011 at 5:26 PM, Walter Goldens
>>>>> <goldenstranger@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> A very peculiar bug.
>>>>>>>>
>>>>>>>> With compat-wireless from 16.05 a nasty bug started to
>>>>>>>> manifest itself. Right around association time, the
>>>>>>>> rt2800usb causes kernel panic. The system freezes and the
>>>>>>>> Caps Lock and Num Lock leds on the keyboard begin to flash.
>>>>>>>
>>>>>>> also ath9k, iwlagn.
>>>>>>>
>>>>>>>> Unfortunately there are no recoverable traces after
>>>>>>>> the system failure to aid this bug report or to indicate its
>>>>>>>> origin.
>>>>>>>>
>>>>>>>> I believe it may somehow be related to Ubuntu's
>>>>>>>> network-manager. If I turn off the network-manager service,
>>>>>>>> I can go into monitor mode for example, but if
>>>>>>>> network-manager is running and I plug my USB dongle, it
>>>>>>>> starts to associate, a second or two later the system is in
>>>>>>>> complete meltdown.
>>>>>>>
>>>>>>> same thing, monitor mode worked perfectly fine.
>>>>>>>
>>>>>>>> Nothing concrete, but a hunch is telling me this has
>>>>>>>> something to do with the association mechanism of the
>>>>>>>> rt2800usb. Compat-wireless from few days back exhibits no
>>>>>>>> such foul play.
>>>>>>>
>>>>>>> yes just right at the association complete freeze.
>>>>>>>
>>>>>>
>>>>>> That's strange. I wonder what's the connection with this bug and
>>>>>> network-manager. Because when I manually tried to associate, dmesg reported
>>>>>> the association attempt timed out.
>>>>>
>>>>> no even when we use iw dev connect command we can see the panic.
>>>>
>>>> Yeah, I've seen this freeze as well using one of the later
>>>> compat-wireless packages using just iw and wpa_supplicant to bring up the
>>>> card. This is on all sorts of rt2x00 supported devices.
>>>>
>>>> However, I don't believe this to be an rt2x00-specific bug, as exactly
>>>> the same rt2x00 sources inside a compat-wireless-2.6.39rc7 package do not
>>>> produce the freeze.
>>>>
>>>>> some expert suspected that there is a chance of kfree_rcu in
>>>>> compat-wireless may have caused the problem
>>>>>
>>>>
>>>> That's where my suspicion is as well, but I didn't have the time to
>>>> further investigate. Since my focus was on rt2x00 I used the
>>>> compat-wireless-2.6.39rc7 package to test my patches. I only did a quick
>>>> check, and the kfree_rcu compatibility fix that was done in compat-wireless
>>>> did seem to match the kfree_rcu code is present in linux-next, but maybe
>>>> there is an odd side-effect.
>>>
>>> I could not exactly remember this panic came just after kfree_rcu
>>> backported..
>>
>> This problem also occurs with rtl8192se from compat-wireless. When it was
>> reported to me, a photo of the console log was included (attached). The
>> crash is a NULL pointer in rcu_do_batch.clone.19 (I think - the photo
>> quality is minimal.).
>
> thanks!, same type of call trace which I had also obtained, hopefully there in
> http://pastebin.com/CZrSZrme
> http://pastebin.com/gwZJGDG4
>
>>
>> Larry
>>
>
> --
> shafi

--
shafi
diff --git b/net/mac80211/agg-tx.c a/net/mac80211/agg-tx.c
index 53defaf..63d852c 100644
--- b/net/mac80211/agg-tx.c
+++ a/net/mac80211/agg-tx.c
@@ -136,6 +136,14 @@ void ieee80211_send_bar(struct ieee80211_sub_if_data *sdata, u8 *ra, u16 tid, u1
 	ieee80211_tx_skb(sdata, skb);
 }
 
+static void kfree_tid_tx(struct rcu_head *rcu_head)
+{
+	struct tid_ampdu_tx *tid_tx =
+	    container_of(rcu_head, struct tid_ampdu_tx, rcu_head);
+
+	kfree(tid_tx);
+}
+
 int ___ieee80211_stop_tx_ba_session(struct sta_info *sta, u16 tid,
 				    enum ieee80211_back_parties initiator,
 				    bool tx)
@@ -155,7 +163,7 @@ int ___ieee80211_stop_tx_ba_session(struct sta_info *sta, u16 tid,
 		/* not even started yet! */
 		rcu_assign_pointer(sta->ampdu_mlme.tid_tx[tid], NULL);
 		spin_unlock_bh(&sta->lock);
-		kfree_rcu(tid_tx, rcu_head);
+		call_rcu(&tid_tx->rcu_head, kfree_tid_tx);
 		return 0;
 	}
 
@@ -314,7 +322,7 @@ void ieee80211_tx_ba_session_handle_start(struct sta_info *sta, int tid)
 		spin_unlock_bh(&sta->lock);
 
 		ieee80211_wake_queue_agg(local, tid);
-		kfree_rcu(tid_tx, rcu_head);
+		call_rcu(&tid_tx->rcu_head, kfree_tid_tx);
 		return;
 	}
 
@@ -693,7 +701,7 @@ void ieee80211_stop_tx_ba_cb(struct ieee80211_vif *vif, u8 *ra, u8 tid)
 
 	ieee80211_agg_splice_finish(local, tid);
 
-	kfree_rcu(tid_tx, rcu_head);
+	call_rcu(&tid_tx->rcu_head, kfree_tid_tx);
 
  unlock_sta:
 	spin_unlock_bh(&sta->lock);
diff --git b/net/mac80211/work.c a/net/mac80211/work.c
index d2e7f0e..a94b312 100644
--- b/net/mac80211/work.c
+++ a/net/mac80211/work.c
@@ -65,9 +65,17 @@ static void run_again(struct ieee80211_local *local,
 	mod_timer(&local->work_timer, timeout);
 }
 
+static void work_free_rcu(struct rcu_head *head)
+{
+	struct ieee80211_work *wk =
+		container_of(head, struct ieee80211_work, rcu_head);
+
+	kfree(wk);
+}
+
 void free_work(struct ieee80211_work *wk)
 {
-	kfree_rcu(wk, rcu_head);
+	call_rcu(&wk->rcu_head, work_free_rcu);
 }
 
 static int ieee80211_compatible_rates(const u8 *supp_rates, int supp_rates_len,
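
For anyone who wants to see the callback plumbing the patch goes back to
without building a kernel, here is a rough userspace sketch. Everything with
a _sketch suffix is a hypothetical stand-in made up for illustration, not a
kernel API. In particular, call_rcu_sketch() runs the callback immediately,
whereas the real call_rcu() defers it until a grace period has elapsed, so
this only shows how the embedded rcu_head is mapped back to its enclosing
object before it is freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's struct rcu_head. */
struct rcu_head_sketch {
	void (*func)(struct rcu_head_sketch *head);
};

/* Same idea as the kernel's container_of(): member pointer -> enclosing object. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical object with an embedded head, playing the role of ieee80211_work. */
struct work_sketch {
	int id;
	struct rcu_head_sketch rcu_head;
};

static int freed_count;		/* lets a caller observe that the callback ran */
static int last_freed_id;	/* id of the most recently freed object */

/*
 * Stand-in for call_rcu(). ASSUMPTION: the callback is invoked right away;
 * the real call_rcu() queues it and runs it after a grace period.
 */
static void call_rcu_sketch(struct rcu_head_sketch *head,
			    void (*func)(struct rcu_head_sketch *head))
{
	assert(head != NULL && func != NULL);
	head->func = func;
	head->func(head);
}

/* Counterpart of work_free_rcu() in the patch: recover the object, free it. */
static void work_free_sketch(struct rcu_head_sketch *head)
{
	struct work_sketch *wk =
		container_of_sketch(head, struct work_sketch, rcu_head);

	last_freed_id = wk->id;
	freed_count++;
	free(wk);
}

/* Counterpart of free_work() in the patch: hand the embedded head to "RCU". */
static void free_work_sketch(struct work_sketch *wk)
{
	call_rcu_sketch(&wk->rcu_head, work_free_sketch);
}
```

To try it, malloc() a struct work_sketch, set its id, and pass it to
free_work_sketch(); freed_count and last_freed_id then show that the callback
found the right object through the embedded head.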