On Tue, May 28, 2013 at 2:12 AM, Keyur Govande <keyurgovande@xxxxxxxxx> wrote:
> On Sun, May 26, 2013 at 7:23 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> On Fri, May 24, 2013 at 11:12:50PM -0400, Keyur Govande wrote:
>>> On Mon, May 20, 2013 at 6:53 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>>> > On Sun, May 19, 2013 at 11:50:55PM -0400, Keyur Govande wrote:
>>> >> Hello,
>>> >>
>>> >> We have a bunch of servers that create a lot of temp files, or check for the existence of non-existent files. Every such operation creates a dentry object, and soon most of the free memory is consumed by 'negative' dentry entries. This behavior was observed on both the CentOS kernel v2.6.32-358 and the Amazon Linux kernel v3.4.43-4.
>>> >>
>>> >> There are also some processes running that occasionally allocate large chunks of memory, and when this happens the kernel clears out a bunch of stale dentry caches. This clearing takes some time: kswapd kicks in, and allocations and bzero() of 4GB that normally take <1s take 20s or more.
>>> >>
>>> >> Because the memory needs are intermittent but negative dentry generation is fairly continuous, vfs_cache_pressure doesn't help much.
>>> >>
>>> >> The thought I had was to have a sysctl that limits the number of dentries per super-block (sb-max-dentry). Every time a new dentry is allocated in d_alloc(), check if dentry_stat.nr_dentry exceeds (number of super blocks * sb-max-dentry). If yes, queue up an asynchronous workqueue call to prune_dcache(). Also have a separate sysctl to indicate by what percentage to reduce the dentry entries when this happens.
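A rough sketch of the shape of that proposal, for illustration only: the names and plumbing below are made up, and only the workqueue calls (DECLARE_WORK/schedule_work) are real kernel API.

/*
 * Sketch of the shape of the proposal only -- not working kernel code.
 * sysctl_sb_max_dentry, maybe_queue_dentry_prune() and the way the counts
 * are passed in are invented for illustration.
 */
#include <linux/workqueue.h>

/* proposed knob; default (unsigned long)-1 means "no limit", i.e. today's behavior */
static unsigned long sysctl_sb_max_dentry = (unsigned long)-1;

static void dentry_prune_workfn(struct work_struct *work)
{
        /*
         * This is where the dcache would be trimmed by the percentage set
         * via the second proposed sysctl. The email suggests routing this
         * through prune_dcache(); its interface differs between the kernel
         * versions named in the thread, so no real call is shown here.
         */
}

static DECLARE_WORK(dentry_prune_work, dentry_prune_workfn);

/*
 * Hypothetical hook, called from d_alloc() after the new dentry has been
 * counted: nr_dentry would come from dentry_stat, nr_sb from the number
 * of mounted super blocks.
 */
static void maybe_queue_dentry_prune(unsigned long nr_dentry, unsigned long nr_sb)
{
        if (sysctl_sb_max_dentry != (unsigned long)-1 &&
            nr_dentry > nr_sb * sysctl_sb_max_dentry)
                schedule_work(&dentry_prune_work);
}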
>>> > This request does come up every so often. There are valid reasons for being able to control the exact size of the dentry and page caches - I've seen a few implementations in storage appliance vendor kernels where total control of memory usage yields a few percent better performance on industry-specific benchmarks. Indeed, years ago I thought that capping the size of the dentry cache was a good idea, too.
>>> >
>>> > However, the problem I've seen with every single one of these implementations is that the limit is carefully tuned for best all-round performance on a given set of canned workloads. When the limit is wrong, performance tanks, and it is just about impossible to set a limit correctly for a machine that has a changing workload.
>>> >
>>> > If your problem is negative dentries building up, where do you set the limit? Set it low enough to keep only a small number of total dentries, so that the negative dentries stay down, and you'll end up with a dentry cache that isn't big enough to hold all the dentries needed for efficient performance with workloads that do directory traversals. It's a two-edged sword, and most people do not have enough knowledge to tune such a knob correctly.
>>> >
>>> > IOWs, the automatic sizing of the dentry cache based on memory pressure is the correct thing to do. Capping it, or allowing it to be capped, will simply generate bug reports for strange performance problems....
>>> >
>>> > That said, keeping lots of negative dentries around until memory pressure kicks them out is probably the wrong thing to do. Negative dentries are an optimisation for some workloads, but those workloads tend to reference negative dentries with a temporal locality that matches the unlink time.
>>> >
>>> > Perhaps we need to separately reclaim negative dentries, i.e. not wait for memory pressure to reclaim them but use some other kind of trigger for reclamation. That doesn't cap the size of the dentry cache, but would address the problem of negative dentry buildup....
>>> >
>>> > Cheers,
>>> >
>>> > Dave.
>>> > --
>>> > Dave Chinner
>>> > david@xxxxxxxxxxxxx
>>>
>>> Hi Dave,
>>>
>>> Thank you for responding. Sorry it took so long for me to get back; I've been a bit busy.
>>>
>>> I do agree that having a knob and then setting a bad value can tank performance. But not having a knob at all is, IMO, worse. Currently there are no options for controlling the cache, bar dropping the caches altogether every so often. The knob would have a default value of ((unsigned long) -1), so anyone who does not care for it would experience the same behavior as today.
>>
>> And therein lies the problem with a knob. What's the point of having a knob when nobody but a handful of people know what it does, or even how to recognise when they need to tweak it? It's long been Linux kernel policy that the kernel should do the right thing by default. As such, knobs to tweak things are a last resort.
>>
>>> Also, setting a bad value for the knob would negatively impact file-IO performance, which on a spinning disk isn't guaranteed anyway. The current situation tanks memory performance, which is more unexpected to a normal user.
>>
>> Which is precisely why a knob is the wrong solution. If it's something a normal, unsuspecting user has problems with, then it needs to be handled automatically by the kernel. Expecting users who don't even know what a dentry is to know about a magic knob that fixes a problem they don't even know they have is not an acceptable solution.
>>
>> The first step to solving such a problem is to provide a reproducible, measurable test case in a simple script that demonstrates the problem that needs solving. If we can reproduce it at will, then half the battle is already won....
>>
>
> Here's a simple test case: https://gist.github.com/keyurdg/5660719 to create a ton of dentry cache entries, and https://gist.github.com/keyurdg/5660723 to allocate some memory.
>
> I kicked off 3 instances of the fopen program, each in a different prefix directory. After all the free memory had filled up with dentry entries, I tried allocating 4GB of memory. It took ~20s. If I stopped the dentry-generation programs and attempted to allocate 4GB again, it took only 2s (because the memory was already free). Here's a quick graph of this behavior: http://i.imgur.com/XhgX84d.png
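The gists themselves aren't reproduced in the archive. As a rough illustration of the first one (the filenames, loop structure and arguments here are guesses, not the gist's contents), a dentry-churning program only needs to keep probing names that don't exist; every failed lookup of a fresh name under an existing directory leaves a negative dentry behind.

/*
 * churn.c: illustrative only -- not the contents of the linked gist.
 * Endlessly fopen()s names that don't exist under a prefix directory.
 * The prefix directory itself must exist, otherwise the path walk fails
 * at the directory component and the final names never get cached.
 */
#include <stdio.h>

int main(int argc, char **argv)
{
        const char *prefix = argc > 1 ? argv[1] : "/tmp/churn";
        char path[4096];
        unsigned long i;
        FILE *f;

        for (i = 0; ; i++) {
                snprintf(path, sizeof(path), "%s/no-such-file-%lu", prefix, i);
                f = fopen(path, "r");   /* expected to fail with ENOENT */
                if (f)
                        fclose(f);      /* just in case the name exists */
        }
        return 0;
}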
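And a minimal stand-in for the second gist: time a large allocation that is then touched so the pages are actually faulted in. The 4GB figure and the use of bzero() come from the thread; everything else is assumed (older glibc may need -lrt for clock_gettime()).

/*
 * alloc.c: illustrative only -- not the contents of the linked gist.
 * Times a 4GB allocation plus bzero(), the operation the thread reports
 * going from <1s to ~20s while memory is full of negative dentries.
 */
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <time.h>

int main(void)
{
        size_t sz = 4UL << 30;          /* 4 GiB */
        struct timespec t0, t1;
        char *p;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        p = malloc(sz);
        if (!p) {
                perror("malloc");
                return 1;
        }
        bzero(p, sz);                   /* touch every page so it is really allocated */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        printf("allocate + bzero of 4GB took %.2f s\n",
               (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
        free(p);
        return 0;
}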
> I understand that in general the kernel should do "the right thing" without user input. But this seems to be a case where the user should be allowed input into how memory is used. After all, there are already lots of knobs in Linux that, if set wrongly, can cause user pain and bad performance. IMO this new knob needs the right kind of documentation, for example suggesting the use of slabtop and perf to confirm that the dentry cache is the issue before setting the knob.
>
> I'm also not tied to the idea of the knob being a limit on the number of dentry cache entries. A limit just seems easiest to administer; but if there are other ways of alleviating this issue, then I'd love to explore those as well.
>
>> Cheers,
>>
>> Dave.
>> --
>> Dave Chinner
>> david@xxxxxxxxxxxxx

Forgot to add: the only "knob" for this issue ATM is dropping the entire cache altogether, which is a massive overreaction to the problem. The dentry cache already keeps all its elements on an LRU, so if we did allow setting a limit, the dentries dropped to enforce it would most likely not be very significant ones (performance-wise).
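For reference, the "drop the entire cache" escape hatch being referred to is /proc/sys/vm/drop_caches, usually driven as root with "echo 2 > /proc/sys/vm/drop_caches". A minimal C equivalent (the file name dropcaches.c is just for illustration):

/*
 * dropcaches.c: the blunt instrument described above. Writing "2" asks the
 * kernel to reclaim slab objects (dentries and inodes) system-wide; "1"
 * drops the page cache and "3" drops both. Needs root.
 */
#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/proc/sys/vm/drop_caches", "w");

        if (!f) {
                perror("/proc/sys/vm/drop_caches");
                return 1;
        }
        fputs("2\n", f);
        fclose(f);
        return 0;
}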