Hello, Michal.

On Mon, Jul 30, 2018 at 08:51:10PM +0200, Michal Hocko wrote:
> > Yeah, workqueue can choke on things like that and kthread indefinitely
> > busy looping doesn't do anybody any good.
>
> Yeah, I do agree. But this is much easier said than done ;) Sure
> we have that hack that does sleep rather than cond_resched in the
> page allocator. We can and will "fix" it to be unconditional in the
> should_reclaim_retry [1] but this whole thing is really subtle. It just
> take one misbehaving worker and something which is really important to
> run will get stuck.

Oh yeah, I'm not saying the current behavior is ideal or anything, but
since the behavior was put in many years ago, it has become a problem
only a couple of times, and all cases were rather easy and obvious
fixes on the wq user side. It shouldn't be difficult to add a timer
mechanism on top. We might be able to simply extend the hang
detection mechanism to kick off all pending rescuers after detecting a
wq stall. I'm wary about making it a part of normal operation
(i.e. a silent timeout). Per-cpu kworkers really shouldn't busy loop
for an extended period of time.

Thanks.

-- 
tejun