Search Linux Wireless

Re: [PATCH] ath10k: Restart xmit queues below low-water mark.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I have been using Toke’s patch successfully with my Netgear R7800 (QCA9884) running OpenWRT snapshot build with ath10k-ct firmware and ath10k-ct-smallbuffers driver. I am using my R7800 as an AP only, so I am running a very stripped down build of OpenWRT.

I provided several Flent tests to Dave Taht and Toke showing some pretty significant improvements in latency. I did not capture any CPU/sirq stats on my R7800 during these tests, though. It does appear my maximum download bandwidth has decreased somewhat, but I am more than happy to take that hit for the benefit of significantly lower latency.

> On May 1, 2020, at 11:50 AM, John Deere <24601deerej@xxxxxxxxx> wrote:
> 
> 
> 
> On 5/1/20 10:16 AM, Ben Greear wrote:
>> On 04/30/2020 04:31 PM, John Deere wrote:
>>> I've just tried Toke's DQL on an Archer C7 (QCA 988X) and it errors out with:
>>> 
>>> ath10k_pci  failed to increase tx pending count: -16, dropping
>>> 
>>> This is with ath10k non-ct firmware and driver on OpenWrt (w/backports 5.4.27 & AQL). ath10k starts up but is unable to let any stations connect.
>>> 
>>> I've also tried the standalone patch by Ben and it seems to have reduced the latencies on top of AQL by another 5 ms.
>> Hello,
>> Did you notice any throughput changes or system load changes in the test that you did with my patch?
>> Thanks,
>> Ben
> 
> I have noticed that there has been a reduction in system load and memory use. Whereas previously with 11 clients on one Archer C7 acting as an AP only, my free memory available would be 51%, it now shows 55-56% - a 4% to 5% reduction. Note, these results were obtained alongside with reverting the following comit https://github.com/openwrt/openwrt/commit/1e27befe63ff4c69f110c7310316f4af75ee63e9. I believe that this same set of patches also are currently in use for the ath10k-ct smallbuffers variant on OpenWrt.
> 
> I have not had the time to conduct any meaningful throughput measurements.
> 
>>> 
>>> On 4/29/20 4:39 AM, Toke Høiland-Jørgensen wrote:
>>>> Ben Greear <greearb@xxxxxxxxxxxxxxx> writes:
>>>> 
>>>>> On 04/28/2020 12:37 PM, Toke Høiland-Jørgensen wrote:
>>>>>> greearb@xxxxxxxxxxxxxxx writes:
>>>>>> 
>>>>>>> From: Ben Greear <greearb@xxxxxxxxxxxxxxx>
>>>>>>> 
>>>>>>> While running tcp upload + download tests with ~200
>>>>>>> concurrent TCP streams, 1-2 processes, and 30 station
>>>>>>> vdevs, I noticed that the __ieee80211_stop_queue was taking
>>>>>>> around 20% of the CPU according to perf-top, which other locking
>>>>>>> taking an additional ~15%.
>>>>>>> 
>>>>>>> I believe the issue is that the ath10k driver would unlock the
>>>>>>> txqueue when a single frame could be transmitted, instead of
>>>>>>> waiting for a low water mark.
>>>>>>> 
>>>>>>> So, this patch adds a low-water mark that is 1/4 of the total
>>>>>>> tx buffers allowed.
>>>>>>> 
>>>>>>> This appears to resolve the performance problem that I saw.
>>>>>>> 
>>>>>>> Tested with recent wave-1 ath10k-ct firmware.
>>>>>>> 
>>>>>>> Signed-off-by: Ben Greear <greearb@xxxxxxxxxxxxxxx>
>>>>>>> ---
>>>>>>>   drivers/net/wireless/ath/ath10k/htt.h    | 1 +
>>>>>>>   drivers/net/wireless/ath/ath10k/htt_tx.c | 8 ++++++--
>>>>>>>   2 files changed, 7 insertions(+), 2 deletions(-)
>>>>>>> 
>>>>>>> diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
>>>>>>> index 31c4ddbf45cb..b5634781c0dc 100644
>>>>>>> --- a/drivers/net/wireless/ath/ath10k/htt.h
>>>>>>> +++ b/drivers/net/wireless/ath/ath10k/htt.h
>>>>>>> @@ -1941,6 +1941,7 @@ struct ath10k_htt {
>>>>>>> 
>>>>>>>       u8 target_version_major;
>>>>>>>       u8 target_version_minor;
>>>>>>> +    bool needs_unlock;
>>>>>>>       struct completion target_version_received;
>>>>>>>       u8 max_num_amsdu;
>>>>>>>       u8 max_num_ampdu;
>>>>>>> diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
>>>>>>> index 9b3c3b080e92..44795d9a7c0c 100644
>>>>>>> --- a/drivers/net/wireless/ath/ath10k/htt_tx.c
>>>>>>> +++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
>>>>>>> @@ -145,8 +145,10 @@ void ath10k_htt_tx_dec_pending(struct ath10k_htt *htt)
>>>>>>>       lockdep_assert_held(&htt->tx_lock);
>>>>>>> 
>>>>>>>       htt->num_pending_tx--;
>>>>>>> -    if (htt->num_pending_tx == htt->max_num_pending_tx - 1)
>>>>>>> +    if ((htt->num_pending_tx <= (htt->max_num_pending_tx / 4)) && htt->needs_unlock) {
>>>>>> 
>>>>>> Why /4? Seems a bit arbitrary?
>>>>> 
>>>>> Yes, arbitrary for sure. I figure restart filling the queue when 1/4
>>>>> full so that it is unlikely to run dry. Possibly it should restart
>>>>> sooner to keep it more full on average?
>>>> 
>>>> Theoretically, the "keep the queue at the lowest possible level that
>>>> keeps it from underflowing" is what BQL is supposed to do. The diff
>>>> below uses the dynamic adjustment bit (from dynamic_queue_limits.h) in
>>>> place of num_pending_tx. I've only compile tested it, and I'm a bit
>>>> skeptical that it will work right for this, but if anyone wants to give
>>>> it a shot, there it is.
>>>> 
>>>> BTW, while doing that, I noticed there's a similar arbitrary limit in
>>>> ath10k_mac_tx_push_pending() at max_num_pending_tx/2. So if you're going
>>>> to keep the arbitrary limit maybe use the same one? :)
>>>> 
>>>>> Before my patch, the behaviour would be to try to keep it as full as
>>>>> possible, as in restart the queues as soon as a single slot opens up
>>>>> in the tx queue.
>>>> 
>>>> Yeah, that seems somewhat misguided as well, from a latency perspective,
>>>> at least. But I guess that's what we're fixing with AQL. What does the
>>>> firmware do with the frames queued within? Do they just go on a FIFO
>>>> queue altogether, or something smarter?
>>>> 
>>>> -Toke
>>>> 
>>>> 
>>>> 
>>>> 
>>>> diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
>>>> index f26cc6989dad..72771ff38a94 100644
>>>> --- a/drivers/net/wireless/ath/ath10k/core.c
>>>> +++ b/drivers/net/wireless/ath/ath10k/core.c
>>>> @@ -2497,6 +2497,10 @@ static int ath10k_core_init_firmware_features(struct ath10k *ar)
>>>>           return -EINVAL;
>>>>       }
>>>>   +    dql_init(&ar->htt.dql, HZ);
>>>> +    ar->htt.dql.max_limit = ar->htt.max_num_pending_tx;
>>>> +    ar->htt.dql.min_limit = 8;
>>>> +
>>>>       if (ar->hw_params.num_peers)
>>>>           ar->max_num_peers = ar->hw_params.num_peers;
>>>>       else
>>>> diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
>>>> index 4a12564fc30e..19024d063896 100644
>>>> --- a/drivers/net/wireless/ath/ath10k/htt.h
>>>> +++ b/drivers/net/wireless/ath/ath10k/htt.h
>>>> @@ -13,6 +13,7 @@
>>>>   #include <linux/dmapool.h>
>>>>   #include <linux/hashtable.h>
>>>>   #include <linux/kfifo.h>
>>>> +#include <linux/dynamic_queue_limits.h>
>>>>   #include <net/mac80211.h>
>>>>     #include "htc.h"
>>>> @@ -1965,8 +1966,8 @@ struct ath10k_htt {
>>>>       /* Protects access to pending_tx, num_pending_tx */
>>>>       spinlock_t tx_lock;
>>>>       int max_num_pending_tx;
>>>> -    int num_pending_tx;
>>>>       int num_pending_mgmt_tx;
>>>> +    struct dql dql;
>>>>       struct idr pending_tx;
>>>>       wait_queue_head_t empty_tx_wq;
>>>>   diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
>>>> index e9d12ea708b6..911a79470bdf 100644
>>>> --- a/drivers/net/wireless/ath/ath10k/htt_tx.c
>>>> +++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
>>>> @@ -144,8 +144,8 @@ void ath10k_htt_tx_dec_pending(struct ath10k_htt *htt)
>>>>   {
>>>>       lockdep_assert_held(&htt->tx_lock);
>>>>   -    htt->num_pending_tx--;
>>>> -    if (htt->num_pending_tx == htt->max_num_pending_tx - 1)
>>>> +    dql_completed(&htt->dql, 1);
>>>> +    if (dql_avail(&htt->dql) > 0)
>>>>           ath10k_mac_tx_unlock(htt->ar, ATH10K_TX_PAUSE_Q_FULL);
>>>>   }
>>>>   @@ -153,11 +153,11 @@ int ath10k_htt_tx_inc_pending(struct ath10k_htt *htt)
>>>>   {
>>>>       lockdep_assert_held(&htt->tx_lock);
>>>>   -    if (htt->num_pending_tx >= htt->max_num_pending_tx)
>>>> +    if (dql_avail(&htt->dql) <= 0)
>>>>           return -EBUSY;
>>>>   -    htt->num_pending_tx++;
>>>> -    if (htt->num_pending_tx == htt->max_num_pending_tx)
>>>> +    dql_queued(&htt->dql, 1);
>>>> +    if (dql_avail(&htt->dql) <= 0)
>>>>           ath10k_mac_tx_lock(htt->ar, ATH10K_TX_PAUSE_Q_FULL);
>>>>         return 0;
>>>> diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
>>>> index 2d03b8dd3b8c..1fe251742b0a 100644
>>>> --- a/drivers/net/wireless/ath/ath10k/mac.c
>>>> +++ b/drivers/net/wireless/ath/ath10k/mac.c
>>>> @@ -3998,7 +3998,7 @@ static bool ath10k_mac_tx_can_push(struct ieee80211_hw *hw,
>>>>       if (ar->htt.tx_q_state.mode == HTT_TX_MODE_SWITCH_PUSH)
>>>>           return true;
>>>>   -    if (ar->htt.num_pending_tx < ar->htt.tx_q_state.num_push_allowed)
>>>> +    if (dql_avail(&ar->htt.dql) < ar->htt.tx_q_state.num_push_allowed)
>>>>           return true;
>>>>         if (artxq->num_fw_queued < artxq->num_push_allowed)
>>>> @@ -4159,7 +4159,7 @@ void ath10k_mac_tx_push_pending(struct ath10k *ar)
>>>>       if (ar->htt.tx_q_state.mode != HTT_TX_MODE_SWITCH_PUSH)
>>>>           return;
>>>>   -    if (ar->htt.num_pending_tx >= (ar->htt.max_num_pending_tx / 2))
>>>> +    if (dql_avail(&ar->htt.dql) < (ar->htt.dql.limit / 2))
>>>>           return;
>>>>         rcu_read_lock();
>>>> @@ -7160,7 +7160,7 @@ void ath10k_mac_wait_tx_complete(struct ath10k *ar)
>>>>               bool empty;
>>>>                 spin_lock_bh(&ar->htt.tx_lock);
>>>> -            empty = (ar->htt.num_pending_tx == 0);
>>>> +            empty = (ar->htt.dql.num_completed == ar->htt.dql.num_queued);
>>>>               spin_unlock_bh(&ar->htt.tx_lock);
>>>>                 skip = (ar->state == ATH10K_STATE_WEDGED) ||
>>>> diff --git a/drivers/net/wireless/ath/ath10k/txrx.c b/drivers/net/wireless/ath/ath10k/txrx.c
>>>> index 39abf8b12903..fe7cd53c2bf9 100644
>>>> --- a/drivers/net/wireless/ath/ath10k/txrx.c
>>>> +++ b/drivers/net/wireless/ath/ath10k/txrx.c
>>>> @@ -80,7 +80,7 @@ int ath10k_txrx_tx_unref(struct ath10k_htt *htt,
>>>>         ath10k_htt_tx_free_msdu_id(htt, tx_done->msdu_id);
>>>>       ath10k_htt_tx_dec_pending(htt);
>>>> -    if (htt->num_pending_tx == 0)
>>>> +    if (htt->dql.num_completed == htt->dql.num_queued)
>>>>           wake_up(&htt->empty_tx_wq);
>>>>       spin_unlock_bh(&htt->tx_lock);
>>>> 
>>> 





[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Wireless Personal Area Network]     [Linux Bluetooth]     [Wireless Regulations]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Hiking]     [MIPS Linux]     [ARM Linux]     [Linux RAID]

  Powered by Linux