On Sat 30-11-19 00:45:41, Pavel Tikhomirov wrote: > We have a problem that shrinker_rwsem can be held for a long time for > read in shrink_slab, at the same time any process which is trying to > manage shrinkers hangs. > > The shrinker_rwsem is taken in shrink_slab while traversing shrinker_list. > It tries to shrink something on nfs (hard) but nfs server is dead at > these moment already and rpc will never succeed. Generally any shrinker > can take significant time to do_shrink_slab, so it's a bad idea to hold > the list lock here. Yes, this is a known problem and people have already tried to address it in the past. Have you checked previous attempts? SRCU based one http://lkml.kernel.org/r/153365347929.19074.12509495712735843805.stgit@localhost.localdomain but I believe there were others (I only had this one in my notes). Please make sure to Cc Dave Chinner when posting a next version because he had some concerns about the change of the behavior. > We have a similar problem in shrink_slab_memcg, except that we are > traversing shrinker_map+shrinker_idr there. > > The idea of the patch is to inc a refcount to the chosen shrinker so it > won't disappear and release shrinker_rwsem while we are in > do_shrink_slab, after that we will reacquire shrinker_rwsem, dec > the refcount and continue the traversal. The reference count part makes sense to me. RCU role needs a better explanation. Also do you have any reason to not use completion for the final step? Openconding essentially the same concept sounds a bit awkward to me. > We also need a wait_queue so that unregister_shrinker can wait for the > refcnt to become zero. Only after these we can safely remove the > shrinker from list and idr, and free the shrinker. [...] > crash> bt ... > PID: 18739 TASK: ... CPU: 3 COMMAND: "bash" > #0 [...] __schedule at ... > #1 [...] schedule at ... > #2 [...] rpc_wait_bit_killable at ... [sunrpc] > #3 [...] __wait_on_bit at ... > #4 [...] out_of_line_wait_on_bit at ... > #5 [...] _nfs4_proc_delegreturn at ... [nfsv4] > #6 [...] nfs4_proc_delegreturn at ... [nfsv4] > #7 [...] nfs_do_return_delegation at ... [nfsv4] > #8 [...] nfs4_evict_inode at ... [nfsv4] > #9 [...] evict at ... > #10 [...] dispose_list at ... > #11 [...] prune_icache_sb at ... > #12 [...] super_cache_scan at ... > #13 [...] do_shrink_slab at ... Are NFS people aware of this? Because this is simply not acceptable behavior. Memory reclaim cannot be block indefinitely or for a long time. There must be a way to simply give up if the underlying inode cannot be reclaimed. I still have to think about the proposed solution. It sounds a bit over complicated to me. -- Michal Hocko SUSE Labs