On Fri, May 24, 2013 at 11:12:50PM -0400, Keyur Govande wrote:
> On Mon, May 20, 2013 at 6:53 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > On Sun, May 19, 2013 at 11:50:55PM -0400, Keyur Govande wrote:
> >> Hello,
> >>
> >> We have a bunch of servers that create a lot of temp files, or check
> >> for the existence of non-existent files. Every such operation creates
> >> a dentry object, and soon most of the free memory is consumed by
> >> 'negative' dentry entries. This behavior was observed on both the
> >> CentOS kernel v2.6.32-358 and the Amazon Linux kernel v3.4.43-4.
> >>
> >> There are also some processes running that occasionally allocate
> >> large chunks of memory, and when this happens the kernel clears out
> >> a bunch of stale dentry caches. This clearing takes some time:
> >> kswapd kicks in, and an allocation and bzero() of 4GB that normally
> >> takes <1s takes 20s or more.
> >>
> >> Because the memory needs are intermittent but negative dentry
> >> generation is fairly continuous, vfs_cache_pressure doesn't help
> >> much.
> >>
> >> The thought I had was to have a sysctl that limits the number of
> >> dentries per super block (sb-max-dentry). Every time a new dentry is
> >> allocated in d_alloc(), check whether dentry_stat.nr_dentry exceeds
> >> (number of super blocks * sb-max-dentry). If yes, queue an
> >> asynchronous workqueue call to prune_dcache(). Also have a separate
> >> sysctl to indicate by what percentage to reduce the dentry entries
> >> when this happens.
> >
> > This request does come up every so often. There are valid reasons
> > for being able to control the exact size of the dentry and page
> > caches - I've seen a few implementations in storage appliance
> > vendor kernels where total control of memory usage yields a few
> > percent better performance on industry-specific benchmarks. Indeed,
> > years ago I thought that capping the size of the dentry cache was a
> > good idea, too.
> > However, the problem that I've seen with every single one of these
> > implementations is that the limit is carefully tuned for best
> > all-round performance in a given set of canned workloads. When the
> > limit is wrong, performance tanks, and it is just about impossible
> > to set a limit correctly for a machine with a changing workload.
> >
> > If your problem is negative dentries building up, where do you set
> > the limit? Set it low enough to keep only a small number of total
> > dentries, so that the negative dentries stay down, and you'll end up
> > with a dentry cache that isn't big enough to hold all the dentries
> > needed for efficient performance with workloads that do directory
> > traversals. It's a two-edged sword, and most people do not have
> > enough knowledge to tune such a knob correctly.
> >
> > IOWs, the automatic sizing of the dentry cache based on memory
> > pressure is the correct thing to do. Capping it, or allowing it to
> > be capped, will simply generate bug reports for strange performance
> > problems....
> >
> > That said, keeping lots of negative dentries around until memory
> > pressure kicks them out is probably the wrong thing to do. Negative
> > dentries are an optimisation for some workloads, but those workloads
> > tend to reference negative dentries with a temporal locality that
> > matches the unlink time.
> >
> > Perhaps we need to reclaim negative dentries separately, i.e. not
> > wait for memory pressure to reclaim them, but use some other kind of
> > trigger for reclamation. That doesn't cap the size of the dentry
> > cache, but it would address the problem of negative dentry
> > buildup....
> >
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@xxxxxxxxxxxxx

> Hi Dave,
>
> Thank you for responding. Sorry it took so long for me to get back;
> I've been a bit busy.
>
> I do agree that having a knob, and then setting a bad value, can tank
> performance. But not having a knob IMO is worse.
> Currently there are no options for controlling the cache, bar
> dropping the caches altogether every so often. The knob would have a
> default value of ((unsigned long)-1), so if one does not care for it,
> they would experience the same behavior as today.

And therein lies the problem with a knob. What's the point of having a
knob that nobody but a handful of people knows what it does, or even
how to recognise when they need to tweak it? It has long been Linux
kernel policy that the kernel should do the right thing by default. As
such, knobs to tweak things are a last resort.

> Also, setting a bad value for the knob would negatively impact
> file-IO performance, which on a spinning disk isn't guaranteed
> anyway. The current situation tanks memory performance, which is more
> unexpected to a normal user.

Which is precisely why a knob is the wrong solution. If it's something
a normal, unsuspecting user has problems with, then it needs to be
handled automatically by the kernel. Expecting users who don't even
know what a dentry is to know about a magic knob that fixes a problem
they don't even know they have is not an acceptable solution.

The first step to solving such a problem is to provide a reproducible,
measurable test case in a simple script that demonstrates the problem
that needs solving. If we can reproduce it at will, then half the
battle is already won....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx