On Wed, May 23, 2018 at 06:25:49PM +0200, Erik Stromdahl wrote:
>
>
> On 05/22/2018 11:15 PM, Niklas Cassel wrote:
>
> <snip>
>
> >
> > > Earlier we observed performance issues in calling push_pending from each
> > > tx completion. IMHO this change may introduce the same problem again.
> >
> > I prefer functional TX over performance issues,
> > but I agree that it is unfortunate that SDIO doesn't use
> > ath10k_htt_txrx_compl_task().
> > Erik, is there a reason for this?
> The reason is that the SDIO code has been derived mainly from qcacld and ath6kl
> and they don't implement napi.
>
> ath10k_htt_txrx_compl_task is currently only called from the napi poll function,
> and the sdio bus driver doesn't have such a function.

Ok, thanks for the explanation.
Perhaps we can change the SDIO code so that it uses NAPI in the future.

<snip>

> > Another solution might be to change so that we only call
> > ath10k_mac_tx_push_pending() from ath10k_txrx_tx_unref()
> > if (htt->num_pending_tx == 0). That should decrease the number
> > of calls to ath10k_mac_tx_push_pending(), while still avoiding
> > a "TX deadlock" scenario for SDIO.
> Just out of curiosity, where did the limit of 3 come from?
> If it works with a limit of 0, I think it should be used instead.

It came from mt76_txq_schedule():

		if (hwq->swq_queued >= 4 || list_empty(&hwq->swq))
			break;

		len = mt76_txq_schedule_list(dev, hwq);

Since this used a break, I simply inverted the logic, and called
ath10k_mac_tx_push_pending() rather than mt76_txq_schedule_list().

However, I've submitted a V4 now that mimics the behavior in
ath10k_htt_txrx_compl_task() instead, so now I call
ath10k_mac_tx_push_pending() regardless of num_pending_tx.

In most cases, ath10k_mac_tx_push_pending() will not dequeue any frames,
since the ar->txqs list will be empty, so this shouldn't be so bad after all.

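To make the trade-off between the variants discussed above a bit more
concrete, here is a rough userspace sketch. It is not driver code: struct
tx_state, should_push_gated(), should_push_always() and count_pushes() are
made-up names, and I am assuming that the earlier revision's check amounted
to "num_pending_tx <= 3". It only counts how many TX completions would
trigger a push attempt while a batch of pending frames drains.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's pending-frame counter. */
struct tx_state {
	int num_pending_tx;
};

/* Gated policy: push only while at most `limit` frames are still pending. */
static bool should_push_gated(const struct tx_state *st, int limit)
{
	assert(limit >= 0);
	return st->num_pending_tx <= limit;
}

/* Unconditional policy: attempt a push on every completion. */
static bool should_push_always(const struct tx_state *st)
{
	(void)st;
	return true;
}

/*
 * Drain `start` pending frames one completion at a time and count how
 * many completions would result in a push attempt under each policy.
 */
static int count_pushes(int start, int limit, bool gated)
{
	struct tx_state st = { .num_pending_tx = start };
	int pushes = 0;

	while (st.num_pending_tx > 0) {
		st.num_pending_tx--;	/* one TX completion */
		if (gated ? should_push_gated(&st, limit)
			  : should_push_always(&st))
			pushes++;
	}
	return pushes;
}
```

With 8 frames pending, a limit of 3 gives 4 push attempts, a limit of 0
gives 1, and the unconditional variant gives 8. The sketch says nothing
about the cost of each attempt, which should be low whenever the txqs list
is empty.
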
>
> Another interesting thing that I stumbled upon when looking into the
> code (while writing this email) is the *wake_up(&htt->empty_tx_wq);*
>
> For some reason I have considered it not to be applicable for HL devices.
>
> The queue is waited for in the flush op (*ath10k_flush*).
> I am unsure what it is used for, but I don't think it affects the TX
> deadlock scenario.

It seems to be called by mac80211 in certain scenarios, but like you said,
it doesn't help with this problem.

Regards,
Niklas