> > Signed-off-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> > ---
> >  fs/super.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/fs/super.c b/fs/super.c
> > index 68307c0..70fa26c 100644
> > --- a/fs/super.c
> > +++ b/fs/super.c
> > @@ -53,6 +53,7 @@ static char *sb_writers_name[SB_FREEZE_LEVELS] = {
> >   * shrinker path and that leads to deadlock on the shrinker_rwsem. Hence we
> >   * take a passive reference to the superblock to avoid this from occurring.
> >   */
> > +#define SB_CACHE_LOW 5
> >  static int prune_super(struct shrinker *shrink, struct shrink_control *sc)
> >  {
> >  	struct super_block *sb;
> > @@ -68,6 +69,13 @@ static int prune_super(struct shrinker *shrink, struct shrink_control *sc)
> >  	if (sc->nr_to_scan && !(sc->gfp_mask & __GFP_FS))
> >  		return -1;
> >
> > +	/*
> > +	 * Don't prune if we have few cached objects to reclaim to
> > +	 * avoid useless sb_lock contention
> > +	 */
> > +	if ((sb->s_nr_dentry_unused + sb->s_nr_inodes_unused) <= SB_CACHE_LOW)
> > +		return -1;
>
> Those counters no longer exist in the current mmotm tree and the
> shrinker infrastructure is somewhat different, so this patch isn't
> the right way to solve this problem.

These changes in the mmotm tree do complicate the solution to this
scalability issue.

>
> Given that superblock LRUs and shrinkers in mmotm are node aware,
> there may even be more pressure on the sblock in such a workload.  I
> think the right way to deal with this is to give the shrinker itself
> a "minimum call count" so that we can avoid even attempting to
> shrink caches that does have enough entries in them to be worthwhile
> shrinking.

By "minimum call count", do you mean tracking the number of free
entries per node in the shrinker, and invoking the shrinker only when
the number of free entries exceeds the "minimum call count"? There is
some cost in list_lru_count_node to get the free entries, as we need
to acquire the node's lru lock.
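To make the first interpretation concrete, here is a minimal userspace sketch. All names (struct sim_count_node, sim_count_node_items(), sim_should_shrink(), SIM_MAX_NODES) are hypothetical and only model a per-node LRU; they are not kernel APIs. Reading the per-node count takes the node lock, which the sketch tallies in lock_acquires to expose the cost mentioned above, and the shrinker callout would be invoked only when the count exceeds min_count.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_MAX_NODES 4

/*
 * Hypothetical stand-in for one NUMA node's LRU list (not the kernel's
 * struct list_lru_node). Only the fields needed for the sketch are kept.
 */
struct sim_count_node {
	long nr_items;      /* free (unused) cached objects on this node */
	long lock_acquires; /* times the node lock was taken, to expose the cost */
};

struct sim_count_lru {
	struct sim_count_node node[SIM_MAX_NODES];
};

/*
 * Models a list_lru_count_node()-style read: the per-node lock is taken
 * just to read nr_items.
 */
static long sim_count_node_items(struct sim_count_lru *lru, int nid)
{
	struct sim_count_node *n;
	long count;

	assert(nid >= 0 && nid < SIM_MAX_NODES);
	n = &lru->node[nid];
	n->lock_acquires++; /* stands in for taking the node's spinlock */
	count = n->nr_items;
	/* the node's spinlock would be released here */
	return count;
}

/*
 * Hypothetical "minimum call count" gate: call the shrinker callout
 * (which takes sb_lock for the superblock shrinker) only when this node
 * holds more than min_count free entries.
 */
static bool sim_should_shrink(struct sim_count_lru *lru, int nid, long min_count)
{
	return sim_count_node_items(lru, nid) > min_count;
}
```

With min_count set to 0, the gate skips only nodes that have no free entries at all, but every check still pays for one acquisition of the node lock.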
Alternatively, we could have list_add or list_del set or clear a
special per-node flag when the count goes above or below a threshold,
and invoke the shrinker based on this flag.

Or do you mean that if a shrink operation does not reap any memory,
we back off and skip a number of subsequent shrink operations until
the "minimum call count" is reached?

>
> That said, the memcg guys have been saying that even small numbers
> of items per cache can be meaningful in terms of memory reclaim
> (e.g. when there are lots of memcgs) then such a threshold might
> only be appropriate for caches that are not memcg controlled.

I've done some experiments with the cache thresholds. Even setting the
threshold at 0 (i.e. skipping the shrinker only when there are no free
entries) removes almost all of the needless contention. That should
make the memcg guys happy, since no extra free entries are held back
from reclaim.

> In
> that case, handling it in the shrinker infrastructure itself is a
> much better idea than hacking thresholds into individual shrinker
> callouts.

Currently the problem is mostly with the sb shrinker, due to the
sb_lock. If we can have a general solution, that will be even better.

Thanks.
Tim
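The flag-based alternative (a per-node flag maintained by list_add or list_del when the count crosses a threshold) might look roughly like the following userspace sketch. All names (struct sim_flag_node, sim_flag_add(), sim_flag_del(), sim_flag_worth_shrinking(), SIM_FLAG_THRESHOLD) are hypothetical. The point is only that the shrinker core can test the flag without taking the node's lru lock, while the add/del paths, which would already hold that lock, pay the small cost of maintaining it. The sketch itself is single-threaded; it does not model the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical threshold: 0 means "worth shrinking as soon as there is
 * at least one free entry", matching the threshold-0 experiment.
 */
#define SIM_FLAG_THRESHOLD 0

/* Hypothetical per-node LRU state; not a kernel structure. */
struct sim_flag_node {
	long nr_items;               /* would be protected by the node lock */
	atomic_bool above_threshold; /* read by the shrinker without the lock */
};

/* Models list_add: the caller would already hold the node lock. */
static void sim_flag_add(struct sim_flag_node *n)
{
	n->nr_items++;
	/* Flip the flag only on the transition across the threshold. */
	if (n->nr_items == SIM_FLAG_THRESHOLD + 1)
		atomic_store(&n->above_threshold, true);
}

/* Models list_del: the caller would already hold the node lock. */
static void sim_flag_del(struct sim_flag_node *n)
{
	assert(n->nr_items > 0);
	n->nr_items--;
	if (n->nr_items == SIM_FLAG_THRESHOLD)
		atomic_store(&n->above_threshold, false);
}

/* Lockless check made before invoking the shrinker for this node. */
static bool sim_flag_worth_shrinking(struct sim_flag_node *n)
{
	return atomic_load(&n->above_threshold);
}
```

Because the flag changes only when the count crosses the threshold, most list_add/list_del calls never touch it; a larger threshold would trade a few unreclaimed entries for fewer shrinker calls, which is exactly the trade-off raised for memcg-controlled caches.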