Re: [PATCH 00/18] brcmsmac: Tx rework and expanded debug/trace support

Karl Beldan <karl.beldan@xxxxxxxxx> · Fri, 2 Nov 2012 11:49:35 +0100



Hi,

On Fri, Oct 26, 2012 at 09:23:15AM -0500, Seth Forshee wrote:
> I've been looking into the issues with brcmsmac performance reported at
> [0] and [1]. I started out looking into the tx queueing based on the "No
> where to go" messages in the logs. This code has a number of
> shortcomings:
> 
>  - The amount of bufferring is excessive. The tx queue will buffer up to
>    228 packets, and each of the tx DMA rings will queue up to 256 more.
> 
>  - There's no flow control. If the queue fills up packets begin to get
>    dropped, as evidenced by the "No where to go" messages.
> 
>  - Without flow control the tx queue probably helps avoid dropping
>    packets for short bursts due to the sheer number of packets that will
>    be buffered, but if flow control is added the only remaining benefit
>    that I can see is that it accumulates packets for aggregation. The tx
>    queue is far more complex than needed for supporting aggergation,
>    however.
> 
> As a result I worked up the following patches to add flow control remove
> the tx queue.
> 
> These patches change the tx handler to directly hand off packets to the
> DMA code. The convoluted priority->precedence->fifo mapping is converted
> to a simple one-to-one mapping of the mac80211 queues to fifos. Non-
> aggregate frames are immediately inserted into the DMA ring.
> 
> Handling of aggregate frames is not as simple, as some of the tx header
> fixups can only happen once we have all the frames for an AMPDU.  To
> support this without resyncing buffers after they've been added to the
> DMA ring I've added the concept of AMPDU sessions. An AMPDU session
> simply queues up the frames for a single AMPDU until we are ready to
> insert them into the tx ring. There is one session per DMA ring, and
> descriptors are reserved in the corresponding ring for all frames queued
> in the AMPDU session. This also has the benefit of allowing non-
> aggregate frames to be sent without affecting aggregation and without
> mapping these frames to a different fifo.
> 
> The patches also add flow control to stop incoming tx packets when the
> DMA ring is full. In practice I found that we will sometimes receive a
> single frame from mac80211 after stopping the queues, so some headroom
> is reserved when stopping the queues. I also reduced the number of tx
> descriptors per ring to 64 and fixed a bug that prevented having
> differing non-zero numbers of tx and rx descriptors for a given ring.
>
It is strange that you would get one frame after stopping the queues.
Apart from the iface up/down code which I did not look at, it seems the tx 
codepaths for queues stop/wake are all properly protected by spin_lock_bhs.
You mention a possible race in your code comments .. are you referring to 
mac80211 or the driver itself ?
 
> When workig on this I made extensive use of ftrace for debug and
> verification. I'm including patches I wrote which expand the trace
> support and introduce debug macros which can log messages both to dmesg
> and the trace buffer. iwlwifi has similar trace support which we've
> enabled in Ubuntu, making it easier to collect debug information from
> users experiencing wireless problems.
> 
> With these changes I'm no longer seeing dropped frames when the tx
> queues are full. Anecdotally I'd also say that my testing with iperf
> using TCP seems to show more consistent data rates, resulting in a
> higher average data rate (sometimes significantly so), but I don't have
> sufficient amounts of data to be sure this is the case.
> 
Stopping the queues instead of dropping the skb, the TCP tx throughput should
improve.

 
Karl
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html