On Mon 29-11-21 13:23:19, Matthew Wilcox wrote:
> On Mon, Nov 29, 2021 at 09:39:16AM +0100, Michal Hocko wrote:
> > On Fri 26-11-21 16:26:23, Hao Lee wrote:
> > [...]
> > > I will try Matthew's idea to use semaphore or mutex to limit the number of BE
> > > jobs that are in the exiting path. This sounds like a feasible approach for
> > > our scenario...
> >
> > I am not really sure this is something that would be acceptable. Your
> > problem is resource partitioning. Papering that over by a lock is not
> > the right way to go. Besides that you will likely hit a hard question on
> > how many tasks to allow to run concurrently. Whatever the value, some
> > workload is very likely going to suffer. We cannot assume admin to
> > choose the right value because there is no clear answer for that. Not to
> > mention other potential problems - e.g. even more priority inversions
> > etc.
>
> I don't see how we get priority inversions. These tasks are exiting; at
> the point they take the semaphore, they should not be holding any locks.
> They're holding a resource (memory) that needs to be released, but a
> task wanting to acquire memory must already be prepared to sleep.

At least these scenarios come to mind:
- a task being blocked by other lower priority tasks slowly tearing down
  their address space - essentially a different incarnation of the same
  problem this is trying to handle
- a huge memory backed task waiting for many smaller ones to finish
- waste of resources on properly partitioned systems. Why should
  somebody block tasks when they are acting on different lruvecs and
  cpus?
-- 
Michal Hocko
SUSE Labs