On Fri, May 24, 2013 at 11:12:50PM -0400, Keyur Govande wrote:
> On Mon, May 20, 2013 at 6:53 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > On Sun, May 19, 2013 at 11:50:55PM -0400, Keyur Govande wrote:
> >> Hello,
> >>
> >> We have a bunch of servers that create a lot of temp files, or check
> >> for the existence of non-existent files. Every such operation creates
> >> a dentry object, and soon most of the free memory is consumed by
> >> 'negative' dentry entries. This behavior was observed on both the
> >> CentOS kernel v2.6.32-358 and the Amazon Linux kernel v3.4.43-4.
> >>
> >> There are also some processes running that occasionally allocate
> >> large chunks of memory, and when this happens the kernel clears out
> >> a bunch of stale dentry caches. This clearing takes some time:
> >> kswapd kicks in, and an allocation and bzero() of 4GB that normally
> >> takes <1s takes 20s or more.
> >>
> >> Because the memory needs are intermittent but negative dentry
> >> generation is fairly continuous, vfs_cache_pressure doesn't help
> >> much.
> >>
> >> The thought I had was to have a sysctl that limits the number of
> >> dentries per super block (sb-max-dentry). Every time a new dentry is
> >> allocated in d_alloc(), check whether dentry_stat.nr_dentry exceeds
> >> (number of super blocks * sb-max-dentry). If yes, queue an
> >> asynchronous workqueue call to prune_dcache(). Also have a separate
> >> sysctl to indicate by what percentage to reduce the dentry entries
> >> when this happens.
> >
> > This request does come up every so often. There are valid reasons
> > for being able to control the exact size of the dentry and page
> > caches - I've seen a few implementations in storage appliance
> > vendor kernels where total control of memory usage yields a few
> > percent better performance on industry-specific benchmarks. Indeed,
> > years ago I thought that capping the size of the dentry cache was a
> > good idea, too.
> > However, the problem that I've seen with every single one of these
> > implementations is that the limit is carefully tuned for best
> > all-round performance in a given set of canned workloads. When the
> > limit is wrong, performance tanks, and it is just about impossible
> > to set a limit correctly for a machine with a changing workload.
> >
> > If your problem is negative dentries building up, where do you set
> > the limit? Set it low enough to keep only a small number of total
> > dentries, so that the negative dentries stay down, and you'll end up
> > with a dentry cache that isn't big enough to hold all the dentries
> > needed for efficient performance with workloads that do directory
> > traversals. It's a two-edged sword, and most people do not have
> > enough knowledge to tune such a knob correctly.
> >
> > IOWs, the automatic sizing of the dentry cache based on memory
> > pressure is the correct thing to do. Capping it, or allowing it to
> > be capped, will simply generate bug reports for strange performance
> > problems....
> >
> > That said, keeping lots of negative dentries around until memory
> > pressure kicks them out is probably the wrong thing to do. Negative
> > dentries are an optimisation for some workloads, but those workloads
> > tend to reference negative dentries with a temporal locality that
> > matches the unlink time.
> >
> > Perhaps we need to reclaim negative dentries separately, i.e. not
> > wait for memory pressure to reclaim them, but use some other kind of
> > trigger for reclamation. That doesn't cap the size of the dentry
> > cache, but it would address the problem of negative dentry
> > buildup....
> >
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@xxxxxxxxxxxxx

> Hi Dave,
>
> Thank you for responding. Sorry it took so long for me to get back;
> I've been a bit busy.
>
> I do agree that having a knob, and then setting a bad value, can tank
> performance. But not having a knob IMO is worse.
> Currently there are no options for controlling the cache, bar
> dropping the caches altogether every so often. The knob would have a
> default value of ((unsigned long)-1), so if one does not care for it,
> they would experience the same behavior as today.

And therein lies the problem with a knob. What's the point of having a
knob that nobody but a handful of people knows what it does, or even
how to recognise when they need to tweak it? It has long been Linux
kernel policy that the kernel should do the right thing by default. As
such, knobs to tweak things are a last resort.

> Also, setting a bad value for the knob would negatively impact
> file-IO performance, which on a spinning disk isn't guaranteed
> anyway. The current situation tanks memory performance, which is more
> unexpected to a normal user.

Which is precisely why a knob is the wrong solution. If it's something
a normal, unsuspecting user has problems with, then it needs to be
handled automatically by the kernel. Expecting users who don't even
know what a dentry is to know about a magic knob that fixes a problem
they don't even know they have is not an acceptable solution.

The first step to solving such a problem is to provide a reproducible,
measurable test case in a simple script that demonstrates the problem
that needs solving. If we can reproduce it at will, then half the
battle is already won....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx