Shakeel Butt reported that, on a production system, he has observed the
job loader getting stuck for tens of seconds while doing a mount
operation. It turned out that it was stuck in register_shrinker() while
some unrelated job was under memory pressure and spending time in
shrink_slab(). register_shrinker() needs shrinker_rwsem for write, so it
cannot proceed while shrink_slab() holds it for read. Machines have a
lot of shrinkers registered, and jobs under memory pressure have to
traverse all of those memcg-aware shrinkers, which affects unrelated
jobs that want to register their own shrinkers.

To solve the issue, this patch simply bails out of slab shrinking once
it finds that someone wants to register a shrinker in parallel. A
downside is that it could cause unfair shrinking between shrinkers.
However, that should be rare, and we can add more complicated logic if
this turns out not to be enough.

Link: http://lkml.kernel.org/r/20171115005602.GB23810@bbox
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>
Reported-and-tested-by: Shakeel Butt <shakeelb@xxxxxxxxxx>
Signed-off-by: Shakeel Butt <shakeelb@xxxxxxxxxx>
Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
---
 mm/vmscan.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6a5a72baccd5..6698001787bd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -486,6 +486,14 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
 			sc.nid = 0;
 
 		freed += do_shrink_slab(&sc, shrinker, priority);
+		/*
+		 * Bail out if someone wants to register a new shrinker, to
+		 * prevent a long stall caused by parallel ongoing shrinking.
+		 */
+		if (rwsem_is_contended(&shrinker_rwsem)) {
+			freed = freed ? : 1;
+			break;
+		}
 	}
 
 	up_read(&shrinker_rwsem);
-- 
2.7.4
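
For readers who want to see the control flow of the bail-out in
isolation, here is a minimal userspace sketch. It is not kernel code:
struct toy_shrinker, struct toy_rwsem, toy_rwsem_is_contended() and
toy_shrink_slab() are hypothetical stand-ins invented for illustration,
and contention is simulated by a counter rather than by a real waiting
writer. The sketch only shows that the loop stops walking the shrinker
list as soon as the lock is reported as contended, and that it reports
at least 1 on that early exit, mirroring `freed = freed ? : 1` in the
patch. The patch uses the GNU C form with the middle operand omitted;
the sketch spells it out portably.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a registered shrinker: one scan "frees"
 * nr_freeable objects. */
struct toy_shrinker {
	unsigned long nr_freeable;
};

/* Hypothetical stand-in for shrinker_rwsem. Instead of real waiters,
 * a writer is assumed to show up once writer_arrives_after scans
 * have completed. */
struct toy_rwsem {
	size_t writer_arrives_after;
	size_t scans_done;
};

/* Simulated counterpart of rwsem_is_contended(): true once the
 * pretend writer is waiting. */
static bool toy_rwsem_is_contended(const struct toy_rwsem *sem)
{
	return sem->scans_done >= sem->writer_arrives_after;
}

/* Walk the shrinkers in order, but stop as soon as the lock is
 * contended so the pretend writer is not kept waiting. */
static unsigned long toy_shrink_slab(struct toy_rwsem *sem,
				     const struct toy_shrinker *shrinkers,
				     size_t nr)
{
	unsigned long freed = 0;
	size_t i;

	assert(sem != NULL);

	for (i = 0; i < nr; i++) {
		freed += shrinkers[i].nr_freeable;
		sem->scans_done++;

		if (toy_rwsem_is_contended(sem)) {
			/* Portable spelling of "freed ? : 1". */
			freed = freed ? freed : 1;
			break;
		}
	}

	return freed;
}
```

With four toy shrinkers that free 5, 0, 7 and 3 objects, a run in which
no writer ever arrives visits all four and returns 15. If the writer
arrives after two scans, the loop stops there and returns 5, leaving the
remaining shrinkers unscanned. That is the unfairness the commit message
accepts as a downside.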