On Tue, 22 Feb 2022 11:47:01 -0800 Tim Murray wrote:
> On Mon, Feb 21, 2022 at 12:55 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > It would be cool to have some numbers here.
>
> Are there any numbers beyond what Suren mentioned that would be
> useful? As one example, in a trace of a camera workload that I opened
> at random to check for drain_local_pages stalls, I saw the kworker
> that ran drain_local_pages stay at runnable for 68ms before getting
> any CPU time. I could try to query our trace corpus to find more
> examples, but they're not hard to find in individual traces already.
>
> > If the draining is too slow and dependent on the current CPU/WQ
> > contention then we should address that. The original intention was that
> > having a dedicated WQ with WQ_MEM_RECLAIM would help to isolate the
> > operation from the rest of WQ activity. Maybe we need to fine tune
> > mm_percpu_wq. If that doesn't help then we should revise the WQ model
> > and use something else. Memory reclaim shouldn't really get stuck behind
> > other unrelated work.
>
> In my experience, workqueues are easy to misuse and should be
> approached with a lot of care. For many workloads, they work fine 99%+
> of the time, but once you run into problems with scheduling delays for
> that workqueue, the only option is to stop using workqueues. If you
> have work that is system-initiated with minimal latency requirements
> (eg, some driver heartbeat every so often, devfreq governors, things
> like that), workqueues are great. If you have userspace-initiated work
> that should respect priority (eg, GPU command buffer submission in the
> critical path of UI) or latency-critical system-initiated work (eg,
> display synchronization around panel refresh), workqueues are the
> wrong choice because there is no RT capability. WQ_HIGHPRI has a minor
> impact, but it won't solve the fundamental problem if the system is
> under heavy enough load or if RT threads are involved. As Petr
> mentioned, the best solution for those cases seems to be "convert the
> workqueue to an RT kthread_worker." I've done that many times on many
> different Android devices over the years for latency-critical work,
> especially around GPU, display, and camera.

Feel free to list URLs for that latency-critical work; I would like to
understand why workqueues failed to fit those scenarios.

>
> In the drain_local_pages case, I think it is triggered by userspace
> work and should respect priority; I don't think a prio 50 RT task
> should be blocked waiting on a prio 120 (or prio 100 if WQ_HIGHPRI)
> kworker to be scheduled so it can run drain_local_pages. If that's a
> reasonable claim, then I think moving drain_local_pages away from
> workqueues is the best choice.

A prio-50 direct reclaimer implies a design failure in 99.1% of
products.
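
For context on the WQ_HIGHPRI point: if I remember mm/vmstat.c
correctly, mm_percpu_wq is currently created roughly as

	mm_percpu_wq = alloc_workqueue("mm_percpu_wq", WQ_MEM_RECLAIM, 0);

and the fine tuning discussed above would amount to something like

	mm_percpu_wq = alloc_workqueue("mm_percpu_wq",
				       WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);

That only moves the kworkers into the nice -20 (prio 100) pool; they
are still scheduled by CFS, which matches Tim's observation that it
does not help once RT threads are in the picture.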
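
And in case it helps to make "convert the workqueue to an RT
kthread_worker" concrete, here is a rough, untested sketch of what that
could look like for the pcp drain, using the existing
kthread_create_worker_on_cpu() and sched_set_fifo() helpers: one
CPU-bound kthread_worker per CPU, bumped to SCHED_FIFO. All of the
names (pcp_drain_worker, pcp_drain_work, pcp_drain_fn,
queue_pcp_drains) are made up for illustration, and CPU hotplug plus
error unwinding are ignored.

#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/percpu.h>
#include <linux/cpu.h>
#include <linux/err.h>
#include <linux/gfp.h>

/* One CPU-bound worker per CPU so the drain runs on the right CPU. */
static DEFINE_PER_CPU(struct kthread_worker *, pcp_drain_worker);
static DEFINE_PER_CPU(struct kthread_work, pcp_drain_work);

static void pcp_drain_fn(struct kthread_work *work)
{
	/* Same job as today's workqueue callback: drain this CPU's pcp lists. */
	drain_local_pages(NULL);
}

static int __init pcp_drain_workers_init(void)
{
	int cpu;

	for_each_online_cpu(cpu) {
		struct kthread_worker *worker;

		worker = kthread_create_worker_on_cpu(cpu, 0,
						      "pcp_drain/%u", cpu);
		if (IS_ERR(worker))
			return PTR_ERR(worker);

		/* SCHED_FIFO prio 50, so the drain cannot be starved by CFS load. */
		sched_set_fifo(worker->task);

		per_cpu(pcp_drain_worker, cpu) = worker;
		kthread_init_work(per_cpu_ptr(&pcp_drain_work, cpu),
				  pcp_drain_fn);
	}
	return 0;
}

/* The drain_all_pages() side would then queue and wait like this. */
static void queue_pcp_drains(void)
{
	int cpu;

	for_each_online_cpu(cpu)
		kthread_queue_work(per_cpu(pcp_drain_worker, cpu),
				   per_cpu_ptr(&pcp_drain_work, cpu));

	for_each_online_cpu(cpu)
		kthread_flush_work(per_cpu_ptr(&pcp_drain_work, cpu));
}

Whether a pile of always-on RT kthreads is a better tradeoff than the
shared mm_percpu_wq is exactly the question raised above; the sketch is
only meant to show that the conversion itself is mechanical.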