On Thu, Oct 14, 2010 at 2:31 PM, Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:
> On 10/14/2010 02:25 PM, Luis R. Rodriguez wrote:
>>
>> On Wed, Oct 13, 2010 at 10:48 AM, Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:
>>>
>>> On 10/13/2010 10:29 AM, Luis R. Rodriguez wrote:
>>>>
>>>> On Wed, Oct 13, 2010 at 10:12 AM, Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> On 10/12/2010 11:40 AM, Luis R. Rodriguez wrote:
>>>>>>
>>>>>> On Tue, Oct 12, 2010 at 11:35 AM, Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> On 10/11/2010 11:10 PM, Luis R. Rodriguez wrote:
>>>>>>>>
>>>>>>>> On Mon, Oct 11, 2010 at 8:27 PM, Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>>>> Another thing I was thinking about: Maybe the queue of skbs and
>>>>>>>>> dma addresses in ath9k is getting corrupted by multiple VIFs
>>>>>>>>> trying to write at once? Maybe some locking is needed in the
>>>>>>>>> xmit path?
>>>>>>>>
>>>>>>>> That was my second hunch. My first shot was to use
>>>>>>>> spin_lock_irqsave() over the uses of the rxbuf list and that
>>>>>>>> seemed to help but I still managed to get a poison eventually.
>>>>>>>> My next item to check for is of the permissibility of creating
>>>>>>>> too much pressure to the point we end up looping over the rxbuf
>>>>>>>> list and race against mac80211 free'ing a buffer. Will test that
>>>>>>>> tomorrow if nothing else comes up creeping my priority queue.
>>>>>>>
>>>>>>> This code looks weird to me. One of the paprd branches
>>>>>>> deletes the skb, the other doesn't appear to. Neither
>>>>>>> null out bf->bf_mpdu, which would appear to leave a dangling
>>>>>>> pointer in at least the dev_kfree_skb_any() branch.
>>>>>>>
>>>>>>> ath_tx_complete frees its skb in all cases, so another
>>>>>>> bf->bf_mpdu dangling pointer issue.
>>>>>>>
>>>>>>> Maybe at the least we should null out bf->bf_mpdu when
>>>>>>> skb is consumed?
>>>>>>
>>>>>> You're reading my mind, that was what I was going to test today. Still
>>>>>> doing e-mail sweep though.
>>>>>
>>>>> At least in the xmit path, it seems cards that have EDMA support do
>>>>> things a bit different. Out of curiosity, on the system(s), you
>>>>> reproduce this, are any of yours supporting EDMA? Mine appear to
>>>>> not support EDMA.
>>>>
>>>> EDMA is used on >= AR9003 families by Atheros. And no, I am not
>>>> testing with an EDMA card, I am testing with an AR9002 family card,
>>>> the AR9280 card. I am going to disregard the TX stuff as the bug is an
>>>> RX issue :) I was able to more easily reproduce by doing an skb_copy()
>>>> and free'ing the buffer right afterwards on the ath_send_to_mac80211()
>>>> thingy, So it does appear that the poison check just happens more
>>>> often when we do an skb_copy(). One reason this is easy to reproduce
>>>> with multiple STAs is mac80211 uses skb_copy() to process each
>>>> received skb for each STA.
>>>>
>>>> In my tests so far, protecting the rxbuf list with spin_lock_irqsave()
>>>> did not help, and the wmb(); didn't either, something else is going on
>>>> here. It would be nice to hack slab to keep an entire trace of the
>>>> place the buffer was last free'd at instead of just the caller that
>>>> freed it.
>>>
>>> I instrumented slub a while back and got the backtrace. It
>>> was always in the same place for my testing.
>>>
>>> Here's the slub patch if you are interested in using it yourself:
>>> https://patchwork.kernel.org/patch/236921/
>>
>> when compiling this patch I get:
>>
>> arch/x86/built-in.o: In function `store_stack':
>> /home/mcgrof/wireless-testing/arch/x86/kernel/dumpstack.c:259:
>> undefined reference to `store_trace'
>
> You are compiling on 32-bit system? I see the method in
> the patch, but probably only for 32-bit x86...

Ah no I'm on 64-bit.

  Luis