Re: [PATCH] Avoid useless inodes and dentries reclamation

Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx> · Thu, 29 Aug 2013 11:07:56 -0700

> > Signed-off-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> > ---
> >  fs/super.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/fs/super.c b/fs/super.c
> > index 68307c0..70fa26c 100644
> > --- a/fs/super.c
> > +++ b/fs/super.c
> > @@ -53,6 +53,7 @@ static char *sb_writers_name[SB_FREEZE_LEVELS] = {
> >   * shrinker path and that leads to deadlock on the shrinker_rwsem. Hence we
> >   * take a passive reference to the superblock to avoid this from occurring.
> >   */
> > +#define SB_CACHE_LOW 5
> >  static int prune_super(struct shrinker *shrink, struct shrink_control *sc)
> >  {
> >  	struct super_block *sb;
> > @@ -68,6 +69,13 @@ static int prune_super(struct shrinker *shrink, struct shrink_control *sc)
> >  	if (sc->nr_to_scan && !(sc->gfp_mask & __GFP_FS))
> >  		return -1;
> >  
> > +	/*
> > +	 * Don't prune if we have few cached objects to reclaim to
> > +	 * avoid useless sb_lock contention
> > +	 */
> > +	if ((sb->s_nr_dentry_unused + sb->s_nr_inodes_unused) <= SB_CACHE_LOW)
> > +		return -1;
> 
> Those counters no longer exist in the current mmotm tree and the
> shrinker infrastructure is somewhat different, so this patch isn't
> the right way to solve this problem.

The changes in the mmotm tree do complicate solutions for this
scalability issue.

> 
> Given that superblock LRUs and shrinkers in mmotm are node aware,
> there may even be more pressure on the sblock in such a workload.  I
> think the right way to deal with this is to give the shrinker itself
> a "minimum call count" so that we can avoid even attempting to
> shrink caches that do not have enough entries in them to be worthwhile
> shrinking.

By "minimum call count", do you mean tracking the number of free
entries per node in the shrinker, and invoking the shrinker only
when the number of free entries exceeds the "minimum call count"?
There is some cost in calling list_lru_count_node to get the number
of free entries, as we need to acquire the node's lru lock.
Alternatively, we can set a special flag/node in list_add or
list_del when the count goes above/below a threshold, and invoke
the shrinker based on this flag.
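
To make the flag alternative concrete, here is a rough userspace
sketch, not kernel code.  All names (lru_node, worth_shrinking,
LRU_THRESHOLD) are made up for illustration, and it assumes the
add/del paths already run under the node's lru lock.  The flag is
written only when the count crosses the threshold, so the shrinker
can test it without taking the lock:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define LRU_THRESHOLD 0		/* hypothetical threshold, not from the patch */

/* Hypothetical per-node LRU state; not the real list_lru layout. */
struct lru_node {
	long nr_items;			/* assumed protected by the node's lru lock */
	atomic_bool worth_shrinking;	/* read locklessly by the shrinker */
};

void lru_node_init(struct lru_node *n)
{
	n->nr_items = 0;
	atomic_init(&n->worth_shrinking, false);
}

/* Assumed to be called with the node's lru lock held. */
void lru_node_add(struct lru_node *n)
{
	n->nr_items++;
	/* Write the flag only when the count rises above the threshold. */
	if (n->nr_items == LRU_THRESHOLD + 1)
		atomic_store(&n->worth_shrinking, true);
}

/* Assumed to be called with the node's lru lock held. */
void lru_node_del(struct lru_node *n)
{
	if (n->nr_items == 0)
		return;
	n->nr_items--;
	/* Write the flag only when the count falls back to the threshold. */
	if (n->nr_items == LRU_THRESHOLD)
		atomic_store(&n->worth_shrinking, false);
}

/* Lockless check the shrinker could make before doing any real work. */
bool lru_node_should_shrink(struct lru_node *n)
{
	return atomic_load(&n->worth_shrinking);
}
```

The shrinker would call lru_node_should_shrink() first and return
early when it is false, which avoids taking the lru lock just to
learn that there is nothing to reclaim.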

Or do you mean that if we do not reap any memory in a shrink
operation, we back off for a certain number of shrink operations
until the "minimum call count" is reached?
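
A rough userspace sketch of this backoff interpretation, again not
kernel code.  The struct, the function names and the value of
SHRINK_BACKOFF_CALLS are all hypothetical; the only idea taken from
the discussion is "skip some number of calls after a shrink pass
that freed nothing":

```c
#include <stdbool.h>

#define SHRINK_BACKOFF_CALLS 4	/* hypothetical "minimum call count" */

/* Hypothetical per-shrinker backoff state. */
struct shrink_backoff {
	unsigned int skips_left;	/* calls still to be skipped */
};

/* Returns true if the shrinker should actually run on this call. */
bool shrink_backoff_should_run(struct shrink_backoff *b)
{
	if (b->skips_left > 0) {
		b->skips_left--;
		return false;
	}
	return true;
}

/*
 * Record the result of a shrink pass: an empty pass arms the backoff,
 * a pass that freed something clears it.
 */
void shrink_backoff_record(struct shrink_backoff *b, unsigned long nr_freed)
{
	b->skips_left = nr_freed ? 0 : SHRINK_BACKOFF_CALLS;
}
```

After one empty pass, the next SHRINK_BACKOFF_CALLS calls are skipped
and the call after that runs again.  This sketch ignores concurrent
callers; real code would need to decide how the counter is shared.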

> 
> That said, the memcg guys have been saying that even small numbers
> of items per cache can be meaningful in terms of memory reclaim
> (e.g. when there are lots of memcgs) then such a threshold might
> only be appropriate for caches that are not memcg controlled. 

I've done some experiments with the SB_CACHE_LOW threshold.  Even
setting the threshold at 0 (i.e. skipping the prune only when there
are no free entries) removes almost all of the needless contention.
That should make the memcg guys happy, since no extra free entries
are held back.

> In
> that case, handling it in the shrinker infrastructure itself is a
> much better idea than hacking thresholds into individual shrinker
> callouts.

Currently the problem is mostly with the sb shrinker due to the
sb_lock.  If we can have a general solution, that will be even
better.
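
As a strawman for what a general solution might look like, here is
a userspace sketch where the threshold check lives in the common
shrinker path instead of in each callout.  Everything here is
hypothetical: shrinker_sketch, min_count, shrink_one and the toy
cache are made-up names, and the count/scan callback split is only
loosely modeled on the new shrinker infrastructure:

```c
#include <stddef.h>

/* Hypothetical shrinker descriptor with a per-shrinker threshold. */
struct shrinker_sketch {
	unsigned long (*count_objects)(void *priv);
	unsigned long (*scan_objects)(void *priv, unsigned long nr_to_scan);
	unsigned long min_count;	/* skip scanning at or below this count */
	void *priv;
};

/* Common path: the threshold is checked once here, not in each callout. */
unsigned long shrink_one(struct shrinker_sketch *s, unsigned long nr_to_scan)
{
	unsigned long nr = s->count_objects(s->priv);

	if (nr <= s->min_count)
		return 0;	/* not worth scanning, no scan-side locking */
	if (nr_to_scan > nr)
		nr_to_scan = nr;
	return s->scan_objects(s->priv, nr_to_scan);
}

/* Toy cache used only to exercise the sketch. */
struct toy_cache {
	unsigned long nr_free;
	unsigned long scan_calls;
};

unsigned long toy_count(void *priv)
{
	struct toy_cache *c = priv;

	return c->nr_free;
}

unsigned long toy_scan(void *priv, unsigned long nr_to_scan)
{
	struct toy_cache *c = priv;
	unsigned long freed = nr_to_scan < c->nr_free ? nr_to_scan : c->nr_free;

	c->nr_free -= freed;
	c->scan_calls++;
	return freed;
}
```

With min_count set to 0, an empty cache never reaches toy_scan(),
which matches the threshold-0 experiment above.  A memcg-controlled
cache could keep min_count at 0 while other caches use a larger
value.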

Thanks.

Tim
