2011/11/26 Adrian Chadd <adrian@xxxxxxxxxxx>: > Hi, > > Review #2: > > One thing that I discovered whilst debugging some silly races in > FreeBSD was that the update of the txq active mask wasn't done > atomically. > > Ie, when you check this: > > +++ b/drivers/net/wireless/ath/ath5k/base.c > @@ -1689,7 +1689,7 @@ ath5k_tasklet_tx(unsigned long data) > struct ath5k_hw *ah = (void *)data; > > for (i = 0; i < AR5K_NUM_TX_QUEUES; i++) > - if (ah->txqs[i].setup && (ah->ah_txq_isr & BIT(i))) > + if (ah->txqs[i].setup && (ah->ah_txq_isr_txok_all & BIT(i))) > ath5k_tx_processq(ah, &ah->txqs[i]); > > The way that the original code did it was: > > * update the HAL TX mask bits inside the HAL getisr routine, so it > would just update the HAL idea of txqactive; > * Then, ath_hal_gettxintrtxqs() would pass back to the driver the set > of txq bits currently active, and reset those that are active (with a > mask, so you can read only a few) > * the ath_intr() routine would simply schedule a tasklet call when a > TX interrupt came in; > * .. and then the TX completion tasklet would call > ath_hal_gettxintrtxqs() for each task bit to see if it needed > servicing. I'm not aware of ath_hal_gettxintrqs etc, this was the patch I wrote on this... http://www.mail-archive.com/ath5k-devel@xxxxxxxxxxxxxxx/msg01312.html > Now, in previous generations of hardware/kernel, this may not have > raced - ath_intr() may have been a "fast" interrupt handler and thus > would occur atomically on a single CPU - so you don't have any races. > But these days, ath_intr() is just a software interrupt and can occur > in parallel with another CPU executing the completion handler. > > So since there was no locking around the TXQ bit updates and clear, it > was very possible that a very busy TX device would "miss" TX queue > notifications, as the CPU running the TX completion would > read-and-clear the TXQ active bits in the HAL, whilst another CPU was > running ath_intr() which was trying to do a read/modify/write of those > same bits. > > I _think_ felix fixed this in ath9k by hiding this stuff behind the > PCU lock. I've also fixed it in FreeBSD. I think it's worth > re-evaluating what is going on in ath5k and maybe add some similar > locking. > > Oh wait. Where are you clearing these bits? It doesn't look like > you're even clearing the ah_txq_isr bits in the most recent ath5k tree > I have here. This means you're likely not missing events; but you're > likely always scanning those queues. Making this code a bit pointless. > :) Yup we don't clean them but that's not a bug, we still only check active queues that produced interrupts, not everything. Anyway we have txq->lock already so it's not a big deal to implement this I'll post a patch on top of this patchset. > Anyway, I'd tidy this all up in a subsequent patchset. Get the > interrupt changes in, then figure out how to properly lock the > interrupt TXQ bit set path (the isr) and the clear path (once you've > tested whether the TXQ needed servicing and then cleared the bit.) > > HTH, > > > adrian > Thanks again for your comments ;-) -- GPG ID: 0xEE878588 As you read this post global entropy rises. Have Fun ;-) Nick -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html