Hi Felix, Thanks again for patiently discussing things. I clearly understand what you said about locks. Now that we know the background of the problem, do you suggest any potential solution or any potential direction that I should start investigating? Please let me know. Thanks & Regards Venkat On Wed, 21 Sept 2022 at 13:35, Felix Fietkau <nbd@xxxxxxxx> wrote: > > > On 20.09.22 21:23, Venkat Ch wrote: > > Hi Felix, > > > > Following is the background of the problem, how I traced to > > mutex_lock and why I propose rcu locks. > > > > Issue: > > On a 10Mbps upload / 50 Mbps download connection, the following issue reported. > > > > Video periodically freezes and/or appears delayed when on Zoom or Teams. > > 1. Video will freeze for 10 or 15 seconds periodically when on a call, > > but audio continues and the session doesn't drop. > > 2. The video still works but it appears delay (I move, but the video > > of my movement is noticeably delay by a second or so) > > > > Tracing to mutex_lock(sta_mutex): > > > > When I investigated, I found that the ucentral agent in openwifi > > fetches the station list periodically. Without the station list > > fetch, the video quality is just fine. I investigated the station list > > path and found this mutex_lock. I also see that earlier it was > > rcu_lock which protected the station list in this path. In this > > commit, https://github.com/torvalds/linux/commit/66572cfc30a4b764150c83ee5d842a3ce17991c9, > > rcu lock was changed to mutex lock without providing any reason. > The reason seems clear to me, even though it was not explicitly stated > in the commit message: in sta_set_sinfo it introduces a call to a driver > op that is allowed to sleep. > > > I also saw this comment just above the sta_mutex declaration. > > /* Station data */ > > /* > > * The mutex only protects the list, hash table and > > * counter, reads are done with RCU. > > */ > > struct mutex sta_mtx; > > > > So I reverted back the mutex_lock with rcu_lock and it just worked > > fine. We tested for more than 2 weeks before concluding this analysis. > > > > I think the usage of mutex_lock is impacting the tx / rx path > > somewhere and hence the issue. It is a challenge to trace the exact > > function though. > > I don't see any critical part in the tx/rx path which depends on the > sta_mtx lock. My guess is that for some reason your change is simply > accidentally papering over the real bug, which may be somewhere else > entirely, maybe even in the driver. A freeze for 10-15 seconds > definitely does not sound like simple lock contention on the mutex, > since a single station dump will be much faster than that. > > - Felix -- If you rest, you rust