On Wed, Nov 15, 2017 at 07:58:09PM +0900, Tetsuo Handa wrote:
> I think that Minchan's approach depends on how
>
>   In our production, we have observed that the job loader gets stuck for
>   10s of seconds while doing mount operation. It turns out that it was
>   stuck in register_shrinker() and some unrelated job was under memory
>   pressure and spending time in shrink_slab(). Our machines have a lot
>   of shrinkers registered and jobs under memory pressure has to traverse
>   all of those memcg-aware shrinkers and do affect unrelated jobs which
>   want to register their own shrinkers.
>
> is interpreted. If there were 100000 shrinkers and each do_shrink_slab() call
> took 1 millisecond, aborting the iteration as soon as rwsem_is_contended() would
> help a lot. But if there were 10 shrinkers and each do_shrink_slab() call took
> 10 seconds, aborting the iteration as soon as rwsem_is_contended() would help
> less. Or, there might be some specific shrinker where its do_shrink_slab() call
> takes 100 seconds. In that case, checking rwsem_is_contended() is too lazy.

In your patch, unregister() waits for shrinker->nr_active instead of
the lock, which is decreased in the same location where Minchan drops
the lock. How is that different behavior for long-running shrinkers?

Anyway, I suspect it's many shrinkers and many concurrent invocations,
so the lockbreak granularity you both chose should be fine.
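
To put numbers on the three cases quoted above, here is a rough userspace model. It is not kernel code, and the two function names are made up for illustration. It assumes every do_shrink_slab() call costs the same, and that under either patch a waiter has to outlast at most the one call that is in flight, since both break at the same point.

```c
#include <assert.h>

/*
 * Hypothetical model, all times in milliseconds.
 * Worst case when the walker keeps the lock for a whole
 * shrink_slab() pass: the waiter sits through every call.
 */
static long wait_full_pass_ms(long nr_shrinkers, long call_ms)
{
	assert(nr_shrinkers > 0 && call_ms >= 0);
	return nr_shrinkers * call_ms;
}

/*
 * Worst case when the walker stops after the current call
 * (lock dropped, or nr_active decremented, at the same point):
 * the waiter only sits through the call in flight.
 */
static long wait_lockbreak_ms(long nr_shrinkers, long call_ms)
{
	assert(nr_shrinkers > 0 && call_ms >= 0);
	(void)nr_shrinkers;	/* the bound does not depend on the count */
	return call_ms;
}
```

With 100000 shrinkers at 1 ms each, the bound drops from 100000 ms to 1 ms. With 10 shrinkers at 10 s each it drops from 100000 ms to 10000 ms. With a single 100 s shrinker nothing changes, whichever of the two mechanisms is used.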