On Fri, Jul 29, 2011 at 05:44:17PM +0400, Glauber Costa wrote: > This patch lays the foundation for us to limit the dcache size. > Each super block can have only a maximum amount of dentries under its > sub-tree. Allocation fails if we we're over limit and the cache > can't be pruned to free up space for the newcomers. > > Signed-off-by: Glauber Costa <glommer@xxxxxxxxxxxxx> > CC: Dave Chinner <david@xxxxxxxxxxxxx> > --- > fs/dcache.c | 45 +++++++++++++++++++++++++++++++++++++-------- > fs/super.c | 1 + > include/linux/fs.h | 1 + > 3 files changed, 39 insertions(+), 8 deletions(-) > > diff --git a/fs/dcache.c b/fs/dcache.c > index 9cb6395..3bdb106 100644 > --- a/fs/dcache.c > +++ b/fs/dcache.c > @@ -663,7 +663,7 @@ EXPORT_SYMBOL(d_prune_aliases); > * > * This may fail if locks cannot be acquired no problem, just try again. > */ > -static void try_prune_one_dentry(struct dentry *dentry) > +static int try_prune_one_dentry(struct dentry *dentry) > __releases(dentry->d_lock) > { > struct dentry *parent; Comment explaining the return value? [...] > @@ -740,7 +744,7 @@ static void shrink_dentry_list(struct list_head *list) > * > * If flags contains DCACHE_REFERENCED reference dentries will not be pruned. > */ > -static void __shrink_dcache_sb(struct super_block *sb, int count, int flags) > +static int __shrink_dcache_sb(struct super_block *sb, int count, int flags) > { > struct dentry *dentry; > LIST_HEAD(referenced); What's the new return value do? > @@ -781,7 +785,7 @@ relock: > list_splice(&referenced, &sb->s_dentry_lru); > spin_unlock(&dcache_lru_lock); > > - shrink_dentry_list(&tmp); > + return shrink_dentry_list(&tmp); > } > > /** > @@ -1176,6 +1180,28 @@ void shrink_dcache_parent(struct dentry * parent) > } > EXPORT_SYMBOL(shrink_dcache_parent); > > +static int dcache_mem_check(struct super_block *sb) > +{ > + int nr_dentry; > + int count; > + > + if (percpu_counter_read_positive(&sb->s_nr_dentry) <= > + sb->s_nr_dentry_max) > + return 0; > + > + do { > + nr_dentry = percpu_counter_sum_positive(&sb->s_nr_dentry); > + > + if (nr_dentry < sb->s_nr_dentry_max) > + return 0; > + > + count = (nr_dentry - sb->s_nr_dentry_max) << 1; > + > + } while (__shrink_dcache_sb(sb, count, DCACHE_REFERENCED) > 0); > + > + return -ENOMEM; > +} I don't like the idea of trying to be exact on how many we shrink. percpu_counter_read_positive() will always have an error in it: +/-(N cpus * 31) by default. Statistically speaking 50% of the cases it passes will then get rejected on the expensive sum operation. And then we shrink only 2x the difference between the counter and the max value, which will leave it inside the error margin of the counter anyway. Hence once we hit the threshold, we'll continually take expensive sum operations, many for no reason. There needs to be some hysteresis here to avoid simply hammering the percpu counter on every single __d_alloc call. Perhaps per-cpu counters are not suited for this purpose? Indeed, what happens if one caller saw the read and sum over the limit, then starts shrinking but other callers don't see the read over the limit but keep allocating dentries and hence keeping the shrinking caller's sum over the limit and hence forever shrinking? That seems like a unprivileged random process DOS, yes? So at minimum, the loop should not recalculate how much to shrink - calculate the amount to shrink, the run a loop to shrink that much and only that much. If there are other concurrent callers, then they'll also do a shrink, but nothing will get stuck in a loop. Hmmm, what happens when the size of the cache and/or the max value are less than the error margin of the counter? Also, this doesn't address the problem I originally commented on - it only shrinks the dcache, not the inode or filesystem caches that hold all the memory. I mentioned the new per-sb shrinkers for a reason - you should be calling them, not __shrink_dcache_sb(). i.e. building a scan_control structure and calling sb->shrinker.shrink(&sb->shrinker, &sc); I'm seriously starting to think that this needs to share an implementation with the mm shrinker code, because it already solves most of these problems... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html