On Thu 12-03-20 15:32:38, Andrew Morton wrote:
> On Thu, 12 Mar 2020 11:07:15 -0700 (PDT) David Rientjes <rientjes@xxxxxxxxxx> wrote:
> 
> > On Thu, 12 Mar 2020, Tetsuo Handa wrote:
> > 
> > > > On Thu, 12 Mar 2020, Tetsuo Handa wrote:
> > > > > > If you have an alternate patch to try, we can test it. But since this
> > > > > > cond_resched() is needed anyway, I'm not sure it will change the result.
> > > > > 
> > > > > schedule_timeout_killable(1) is an alternate patch to try; I don't think
> > > > > that this cond_resched() is needed anyway.
> > > > 
> > > > You are suggesting schedule_timeout_killable(1) in shrink_node_memcgs()?
> > > 
> > > Andrew Morton also mentioned whether cond_resched() in shrink_node_memcgs()
> > > is enough. But like you mentioned,
> > 
> > It passes our testing because this is where the allocator is looping while
> > the victim is trying to exit if only it could be scheduled.
> 
> What happens if the allocator has SCHED_FIFO?

The same thing that happens to a SCHED_FIFO task running in a tight loop in
userspace. As long as a high-priority context depends on a resource held by
a low-priority task, we have a priority inversion problem, and the page
allocator is no real exception here. But I do not see that the allocator is
much different from any other code in the kernel. We do not add random
sleeps here and there to push high-priority FIFO or RT tasks out of the
execution context. We do cond_resched() to help !PREEMPT kernels, but
priority-related issues are really out of scope for that facility.
-- 
Michal Hocko
SUSE Labs