On Mon, May 20, 2013 at 6:53 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Sun, May 19, 2013 at 11:50:55PM -0400, Keyur Govande wrote:
>> Hello,
>>
>> We have a bunch of servers that create a lot of temp files, or check
>> for the existence of non-existent files. Every such operation creates
>> a dentry object, and soon most of the free memory is consumed by
>> 'negative' dentry entries. This behavior was observed on both the
>> CentOS kernel v2.6.32-358 and the Amazon Linux kernel v3.4.43-4.
>>
>> There are also some processes running that occasionally allocate large
>> chunks of memory, and when this happens the kernel clears out a bunch
>> of stale dentry caches. This clearing takes some time: kswapd kicks
>> in, and an allocation and bzero() of 4GB that normally takes <1s takes
>> 20s or more.
>>
>> Because the memory needs are intermittent but negative dentry
>> generation is fairly continuous, vfs_cache_pressure doesn't help much.
>>
>> The thought I had was to have a sysctl that limits the number of
>> dentries per super-block (sb-max-dentry). Every time a new dentry is
>> allocated in d_alloc(), check if dentry_stat.nr_dentry exceeds (number
>> of super blocks * sb-max-dentry). If yes, queue up an asynchronous
>> workqueue call to prune_dcache(). Also have a separate sysctl to
>> indicate by what percentage to reduce the dentry entries when this
>> happens.
>
> This request does come up every so often. There are valid reasons
> for being able to control the exact size of the dentry and page
> caches - I've seen a few implementations in storage appliance
> vendor kernels where total control of memory usage yields a few
> percent better performance on industry-specific benchmarks. Indeed,
> years ago I thought that capping the size of the dentry cache was a
> good idea, too.
>
> However, the problem that I've seen with every single one of these
> implementations is that the limit is carefully tuned for best
> all-round performance in a given set of canned workloads. When the
> limit is wrong, performance tanks, and it is just about impossible to
> set a limit correctly for a machine that has a changing workload.
>
> If your problem is negative dentries building up, where do you set
> the limit? Set it low enough to keep only a small number of total
> dentries (to keep the negative dentries down), and you'll end up
> with a dentry cache that isn't big enough to hold all the dentries
> needed for efficient performance with workloads that do directory
> traversals. It's a two-edged sword, and most people do not have
> enough knowledge to tune such a knob correctly.
>
> IOWs, the automatic sizing of the dentry cache based on memory
> pressure is the correct thing to do. Capping it, or allowing it to
> be capped, will simply generate bug reports for strange performance
> problems....
>
> That said, keeping lots of negative dentries around until memory
> pressure kicks them out is probably the wrong thing to do. Negative
> dentries are an optimisation for some workloads, but workloads tend
> to reference negative dentries with a temporal locality that matches
> the unlink time.
>
> Perhaps we need to reclaim negative dentries separately, i.e. not
> wait for memory pressure to reclaim them but use some other kind of
> trigger for reclamation. That doesn't cap the size of the dentry
> cache, but it would address the problem of negative dentry buildup....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx

Hi Dave,

Thank you for responding. Sorry it took so long for me to get back;
I've been a bit busy.

I do agree that having a knob and then setting a bad value can tank
performance. But not having a knob is, IMO, worse. Currently there
are no options for controlling the cache, bar dropping the caches
altogether every so often.
The knob would have a default value of ((unsigned long)-1), so anyone
who does not set it would see exactly the same behavior as today.
Also, setting a bad value for the knob would negatively impact file-I/O
performance, which on a spinning disk isn't guaranteed anyway; the
current situation tanks memory performance, which is far more
unexpected to a normal user.

Thanks,
Keyur.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html