On Sun, 2009-07-26 at 17:52 -0500, Larry Finger wrote:
> While stress testing the newest version of the open-source firmware
> for BCM43XX devices with the latest pull of wireless-testing, I ran
> into a problem of DMA TX queue overrun. Initially I thought this was
> due to the firmware change; however, I got the same error with the
> standard firmware. I have not seen this before, but it may not be a
> regression as it seems to occur only under special circumstances.

I've also seen it under extreme stress on Intel hardware, cf.
http://thread.gmane.org/gmane.linux.kernel.wireless.general/36497

> The critical code is in b43_dma_tx(), which is called by the .tx
> callback routine registered with mac80211.
>
> After the fragment is transmitted by a call to dma_tx_fragment() at
> line 1353, the routine checks to see if there are sufficient free
> slots (2) to transmit another fragment using the code below:
>
>         if ((free_slots(ring) < TX_SLOTS_PER_FRAME) ||
>             should_inject_overflow(ring)) {
>                 /* This TX ring is full. */
>                 ieee80211_stop_queue(dev->wl->hw,
>                                      skb_get_queue_mapping(skb));
>                 ring->stopped = 1;
>                 if (b43_debug(dev, B43_DBG_DMAVERBOSE)) {
>                         b43dbg(dev->wl, "Stopped TX ring %d\n",
>                                ring->index);
>                 }
>         }
>
>
> The problem shows up at line 1340 for the next fragment:
>
>         B43_WARN_ON(ring->stopped);
>
>         if (unlikely(free_slots(ring) < TX_SLOTS_PER_FRAME)) {
>                 b43warn(dev->wl, "DMA queue overflow\n");
>                 err = -ENOSPC;
>                 goto out_unlock;
>         }
>
> The system generates the warning for ring->stopped and prints the "DMA
> queue overflow" message.

Right. Exactly the same behaviour as I'm seeing on Intel hardware.

> My understanding is that mac80211 serializes the calls for each TX
> queue, and that the TX callback should not have been entered for this
> case.
>
> If I am not understanding the way that mac80211 works, please correct
> me. I would also appreciate any suggestions for further debugging.

I stared at the mac80211 code for a long time and concluded that it
was a race condition and couldn't really be fixed; see my analysis in
the iwlwifi patch. I'd love to be proved wrong though.

Are you seeing this multiple times? I don't think you have
fragmentation on, do you? At least I didn't and still saw the problem,
which seemed a bit strange, but I really couldn't see any other way
for it to happen.

johannes
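
For readers following the quoted report, the order of checks it describes can be modelled in a few lines of standalone C. This is only a toy sketch and not b43 or mac80211 code: `struct toy_ring`, `toy_dma_tx()` and the `warnings` counter are hypothetical names, the ring size is arbitrary, and `should_inject_overflow()` is left out. It makes no claim about how a transmit call can arrive after the queue was stopped, which is the open question in this thread; it only shows what the quoted checks do when one does.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define TX_SLOTS_PER_FRAME 2	/* value quoted in the report */

/* Hypothetical stand-in for the driver's DMA ring, not the b43 struct. */
struct toy_ring {
	int nr_slots;	/* total descriptor slots (arbitrary here) */
	int used_slots;	/* slots currently occupied */
	int stopped;	/* set once the queue has been stopped */
	int warnings;	/* how often the "ring stopped" check fired */
};

static int free_slots(const struct toy_ring *ring)
{
	return ring->nr_slots - ring->used_slots;
}

/* Same order of checks as described for b43_dma_tx() in the report. */
static int toy_dma_tx(struct toy_ring *ring)
{
	assert(ring != NULL);

	/* Stands in for B43_WARN_ON(ring->stopped). */
	if (ring->stopped)
		ring->warnings++;

	if (free_slots(ring) < TX_SLOTS_PER_FRAME) {
		fprintf(stderr, "DMA queue overflow\n");
		return -ENOSPC;
	}

	/* Stands in for dma_tx_fragment(): one frame takes two slots. */
	ring->used_slots += TX_SLOTS_PER_FRAME;

	/* After transmitting: stop the queue if no further frame fits.
	 * The real driver calls ieee80211_stop_queue() at this point. */
	if (free_slots(ring) < TX_SLOTS_PER_FRAME)
		ring->stopped = 1;

	return 0;
}
```

With a four-slot ring, two calls succeed and the second one marks the ring stopped. A third call issued anyway trips the stopped check and returns -ENOSPC, which matches the pair of messages reported above.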