> thanks for testing and for your insight into this issue. I’m happy to help! > On 2020-09-26 06:41, Georgi Valkov wrote: >> Hi Felix! >> >> With your latest suggestion, it takes between 10 and 17 hours for the connection to drop, then long five minutes to reconnect. >> Notice the order of code execution in the original code of ieee80211_sta_tx_notify(): >> probe_send_count is always cleared when ack is true. But before clearing probe_send_count, we need to check if ieee80211_is_any_nullfunc() returns true and probe_send_count > 0: if yes, we must call ieee80211_queue_work(). You cleared probe_send_count ahead of time, so the condition to run ieee80211_queue_work() will never be met. > The condition in ieee80211_sta_tx_notify may not be met, but > ieee80211_queue_work is called from ieee80211_tx_status_ext in that > case. Any idea why that is not enough? Whenever that condition is met and probe_send_count is reset there, ieee80211_queue_work() is called for sure. Could that be the problem? Because in the original code we also check the result from ieee80211_is_any_nullfunc(). We can also add a trace e.g. printk() to shine some light on the order of events, hopefully without flooding the kernel log too much and without impact on the performance? You can send me patches directly without committing to master and I will test them. >> To spare your time, I did spend one week to find the cause, then another learning every detail about the code and testing various solutions, including those you proposed. While I do not have experience with mac80211’s design, I’m quite good at preserving the exact behaviour during large scale refactoring. And in my fix I tried changing as little as possible to keep the patch small, preserving both your changes and the original design behaviour. > And the reason I'm still proposing changes to it is because your patch > does not take into account no-skb or 802.3 tx status. > I'll try to make a v3 patch that leaves more of the original code intact > at the cost of a little more duplication. Good point. You are familiar with how it should work, and I only know the changes. You can add the common code into a new function to make it easier to follow. I’m curious: the original RX code used to call ieee80211_sta_tx_notify(). Do we need to clear probe_send_count or do anything else there? I suppose you removed that because it will be performed later. Either way we should not touch that before the issue is resolved. Georgi