On Mon, May 20, 2013 at 6:53 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Sun, May 19, 2013 at 11:50:55PM -0400, Keyur Govande wrote:
>> Hello,
>>
>> We have a bunch of servers that create a lot of temp files, or check
>> for the existence of non-existent files. Every such operation creates
>> a dentry object, and soon most of the free memory is consumed by
>> 'negative' dentry entries. This behavior was observed on both the
>> CentOS kernel v2.6.32-358 and the Amazon Linux kernel v3.4.43-4.
>>
>> There are also some processes running that occasionally allocate large
>> chunks of memory, and when this happens the kernel clears out a bunch
>> of stale dentry caches. This clearing takes some time: kswapd kicks
>> in, and an allocation and bzero() of 4GB that normally takes <1s takes
>> 20s or more.
>>
>> Because the memory needs are intermittent but negative dentry
>> generation is fairly continuous, vfs_cache_pressure doesn't help much.
>>
>> The thought I had was to have a sysctl that limits the number of
>> dentries per super-block (sb-max-dentry). Every time a new dentry is
>> allocated in d_alloc(), check if dentry_stat.nr_dentry exceeds (number
>> of super blocks * sb-max-dentry). If yes, queue up an asynchronous
>> workqueue call to prune_dcache(). Also have a separate sysctl to
>> indicate by what percentage to reduce the dentry entries when this
>> happens.
>
> This request does come up every so often. There are valid reasons
> for being able to control the exact size of the dentry and page
> caches - I've seen a few implementations in storage appliance
> vendor kernels where total control of memory usage yields a few
> percent better performance on industry-specific benchmarks. Indeed,
> years ago I thought that capping the size of the dentry cache was a
> good idea, too.
>
> However, the problem that I've seen with every single one of these
> implementations is that the limit is carefully tuned for best
> all-round performance in a given set of canned workloads. When the
> limit is wrong, performance tanks, and it is just about impossible to
> set a limit correctly for a machine that has a changing workload.
>
> If your problem is negative dentries building up, where do you set
> the limit? Set it low enough to keep only a small number of total
> dentries (to keep the negative dentries down), and you'll end up
> with a dentry cache that isn't big enough to hold all the dentries
> needed for efficient performance with workloads that do directory
> traversals. It's a two-edged sword, and most people do not have
> enough knowledge to tune such a knob correctly.
>
> IOWs, the automatic sizing of the dentry cache based on memory
> pressure is the correct thing to do. Capping it, or allowing it to
> be capped, will simply generate bug reports for strange performance
> problems....
>
> That said, keeping lots of negative dentries around until memory
> pressure kicks them out is probably the wrong thing to do. Negative
> dentries are an optimisation for some workloads, but workloads tend
> to reference negative dentries with a temporal locality that matches
> the unlink time.
>
> Perhaps we need to reclaim negative dentries separately, i.e. not
> wait for memory pressure to reclaim them but use some other kind of
> trigger for reclamation. That doesn't cap the size of the dentry
> cache, but it would address the problem of negative dentry buildup....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx

Hi Dave,

Thank you for responding. Sorry it took so long for me to get back;
I've been a bit busy.

I do agree that having a knob and then setting a bad value can tank
performance. But not having a knob is, IMO, worse. Currently there
are no options for controlling the cache, bar dropping the caches
altogether every so often.
The knob would have a default value of ((unsigned long)-1), so anyone
who does not set it would see exactly the same behavior as today.
Also, setting a bad value for the knob would negatively impact file-I/O
performance, which on a spinning disk isn't guaranteed anyway; the
current situation tanks memory performance, which is far more
unexpected to a normal user.

Thanks,
Keyur.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html