On 12/10/19 4:20 AM, Dave Chinner wrote:
> On Fri, Dec 06, 2019 at 09:11:25AM -0800, Shakeel Butt wrote:
>> On Thu, Dec 5, 2019 at 6:10 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>>> If a shrinker is blocking for a long time, then we need to
>>> work to fix the shrinker implementation because blocking is a much
>>> bigger problem than just register/unregister.
>>>
>>
>> Yes, we should be fixing the implementations of all shrinkers and yes
>> it is a bigger issue, but we can also fix the register/unregister
>> isolation issue in parallel. Fixing all shrinkers would be a tedious
>> and long task, and we should not block fixing the isolation issue on it.
>
> "fixing all shrinkers" is a bit of hyperbole - you've identified
> only one instance where blocking is causing you problems. Indeed,
> most shrinkers are already non-blocking and won't cause you any
> problems at all.
>
>>> IOWs, we already know that cycling a global rwsem on every
>>> individual shrinker invocation is going to cause noticeable
>>> scalability problems. Hence I don't think that this sort of "cycle
>>> the global rwsem faster to reduce [un]register latency" solution is
>>> going to fly because of the runtime performance regressions it will
>>> introduce....
>>>
>>
>> I agree with your scalability concern (though others would argue to
>> first demonstrate the issue before adding more sophisticated scalable
>> code).
>
> Look at the git history. We *know* this is a problem, so anyone
> arguing that we have to prove it can go take a long walk off a short
> plank....
>
>> Most memory reclaim code is written without performance or
>> scalability concerns; maybe we should switch our thinking.
>
> I think there are a lot of core mm and other developers that would
> disagree with you there. With respect to shrinkers, we've been
> directly concerned about the performance and scalability of the
> individual instances as well as the infrastructure for at least the
> last decade....
>
> Cheers,
>
> Dave.

Thanks a lot for your replies. I now see that the core of the problem is
the NFS hang; before that I was unsure whether it is acceptable to have
such a hang in do_shrink_slab() at all.

I have a possibly bad idea for how my patch could still work: what if we
use the unlock/refcount approach only for NFS shrinkers? It would still
impose a performance penalty for those who have many NFS mounts, but for
those with only a few the penalty would be smaller, and it would be a
small isolation improvement for NFS users. A rough sketch of what I mean
is below.
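Something along these lines (completely untested, only to illustrate the
idea; the SHRINKER_NONBLOCK_UNREG flag and the refcnt/done fields in
struct shrinker are made up here and do not exist in the current kernel):

	/* New fields in struct shrinker, set up in register_shrinker():
	 * atomic_set(&shrinker->refcnt, 1); init_completion(&shrinker->done);
	 */
	atomic_t refcnt;		/* pins the shrinker while it runs */
	struct completion done;		/* signalled when the last ref drops */

	/* In shrink_slab(), only for shrinkers that set SHRINKER_NONBLOCK_UNREG
	 * (e.g. the NFS ones). Called with shrinker_rwsem held for read, so the
	 * shrinker is still on shrinker_list and taking a reference is safe.
	 * The caller must restart its list walk afterwards, because the list
	 * may have changed while the lock was dropped.
	 */
	static unsigned long shrink_one_unlocked(struct shrinker *shrinker,
						 struct shrink_control *sc,
						 int priority)
	{
		unsigned long freed;

		atomic_inc(&shrinker->refcnt);
		up_read(&shrinker_rwsem);

		freed = do_shrink_slab(sc, shrinker, priority);

		if (atomic_dec_and_test(&shrinker->refcnt))
			complete(&shrinker->done);

		down_read(&shrinker_rwsem);
		return freed;
	}

	/* In unregister_shrinker(), after the shrinker has been removed from
	 * shrinker_list under the write lock and shrinker_rwsem was released:
	 * drop the initial reference and, if a call is still in flight, wait
	 * for it to finish instead of blocking everyone on shrinker_rwsem.
	 */
	if (!atomic_dec_and_test(&shrinker->refcnt))
		wait_for_completion(&shrinker->done);

That way only the flagged shrinkers pay for the extra atomics and the
restart of the list walk; all other shrinkers keep the current behaviour.

-- 
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.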