On Tue, May 28, 2013 at 2:12 AM, Keyur Govande <keyurgovande@xxxxxxxxx> wrote:
> On Sun, May 26, 2013 at 7:23 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> On Fri, May 24, 2013 at 11:12:50PM -0400, Keyur Govande wrote:
>>> On Mon, May 20, 2013 at 6:53 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>>> > On Sun, May 19, 2013 at 11:50:55PM -0400, Keyur Govande wrote:
>>> >> Hello,
>>> >>
>>> >> We have a bunch of servers that create a lot of temp files, or check for the existence of non-existent files. Every such operation creates a dentry object, and soon most of the free memory is consumed by 'negative' dentry entries. This behavior was observed on both the CentOS kernel v2.6.32-358 and the Amazon Linux kernel v3.4.43-4.
>>> >>
>>> >> There are also some processes running that occasionally allocate large chunks of memory, and when this happens the kernel clears out a bunch of stale dentry caches. This clearing takes some time: kswapd kicks in, and allocations and bzero() of 4GB that normally take <1s take 20s or more.
>>> >>
>>> >> Because the memory needs are intermittent but negative dentry generation is fairly continuous, vfs_cache_pressure doesn't help much.
>>> >>
>>> >> The thought I had was to have a sysctl that limits the number of dentries per super-block (sb-max-dentry). Every time a new dentry is allocated in d_alloc(), check if dentry_stat.nr_dentry exceeds (number of super blocks * sb-max-dentry). If yes, queue up an asynchronous workqueue call to prune_dcache(). Also have a separate sysctl to indicate by what percentage to reduce the dentry entries when this happens.
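A rough sketch of the shape of that proposal, for illustration only: the names and plumbing below are made up, and only the workqueue calls (DECLARE_WORK/schedule_work) are real kernel API.

/*
 * Sketch of the shape of the proposal only -- not working kernel code.
 * sysctl_sb_max_dentry, maybe_queue_dentry_prune() and the way the counts
 * are passed in are invented for illustration.
 */
#include <linux/workqueue.h>

/* proposed knob; default (unsigned long)-1 means "no limit", i.e. today's behavior */
static unsigned long sysctl_sb_max_dentry = (unsigned long)-1;

static void dentry_prune_workfn(struct work_struct *work)
{
        /*
         * This is where the dcache would be trimmed by the percentage set
         * via the second proposed sysctl. The email suggests routing this
         * through prune_dcache(); its interface differs between the kernel
         * versions named in the thread, so no real call is shown here.
         */
}

static DECLARE_WORK(dentry_prune_work, dentry_prune_workfn);

/*
 * Hypothetical hook, called from d_alloc() after the new dentry has been
 * counted: nr_dentry would come from dentry_stat, nr_sb from the number
 * of mounted super blocks.
 */
static void maybe_queue_dentry_prune(unsigned long nr_dentry, unsigned long nr_sb)
{
        if (sysctl_sb_max_dentry != (unsigned long)-1 &&
            nr_dentry > nr_sb * sysctl_sb_max_dentry)
                schedule_work(&dentry_prune_work);
}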
>>> > This request does come up every so often. There are valid reasons for being able to control the exact size of the dentry and page caches - I've seen a few implementations in storage appliance vendor kernels where total control of memory usage yields a few percent better performance on industry-specific benchmarks. Indeed, years ago I thought that capping the size of the dentry cache was a good idea, too.
>>> >
>>> > However, the problem I've seen with every single one of these implementations is that the limit is carefully tuned for best all-round performance on a given set of canned workloads. When the limit is wrong, performance tanks, and it is just about impossible to set a limit correctly for a machine that has a changing workload.
>>> >
>>> > If your problem is negative dentries building up, where do you set the limit? Set it low enough to keep only a small number of total dentries, so that the negative dentries stay down, and you'll end up with a dentry cache that isn't big enough to hold all the dentries needed for efficient performance with workloads that do directory traversals. It's a two-edged sword, and most people do not have enough knowledge to tune such a knob correctly.
>>> >
>>> > IOWs, the automatic sizing of the dentry cache based on memory pressure is the correct thing to do. Capping it, or allowing it to be capped, will simply generate bug reports for strange performance problems....
>>> >
>>> > That said, keeping lots of negative dentries around until memory pressure kicks them out is probably the wrong thing to do. Negative dentries are an optimisation for some workloads, but those workloads tend to reference negative dentries with a temporal locality that matches the unlink time.
>>> >
>>> > Perhaps we need to separately reclaim negative dentries, i.e. not wait for memory pressure to reclaim them but use some other kind of trigger for reclamation. That doesn't cap the size of the dentry cache, but would address the problem of negative dentry buildup....
>>> >
>>> > Cheers,
>>> >
>>> > Dave.
>>> > --
>>> > Dave Chinner
>>> > david@xxxxxxxxxxxxx
>>>
>>> Hi Dave,
>>>
>>> Thank you for responding. Sorry it took so long for me to get back; I've been a bit busy.
>>>
>>> I do agree that having a knob and then setting a bad value can tank performance. But not having a knob at all is, IMO, worse. Currently there are no options for controlling the cache, bar dropping the caches altogether every so often. The knob would have a default value of ((unsigned long) -1), so anyone who does not care for it would experience the same behavior as today.
>>
>> And therein lies the problem with a knob. What's the point of having a knob when nobody but a handful of people know what it does, or even how to recognise when they need to tweak it? It's long been Linux kernel policy that the kernel should do the right thing by default. As such, knobs to tweak things are a last resort.
>>
>>> Also, setting a bad value for the knob would negatively impact file-IO performance, which on a spinning disk isn't guaranteed anyway. The current situation tanks memory performance, which is more unexpected to a normal user.
>>
>> Which is precisely why a knob is the wrong solution. If it's something a normal, unsuspecting user has problems with, then it needs to be handled automatically by the kernel. Expecting users who don't even know what a dentry is to know about a magic knob that fixes a problem they don't even know they have is not an acceptable solution.
>>
>> The first step to solving such a problem is to provide a reproducible, measurable test case in a simple script that demonstrates the problem that needs solving. If we can reproduce it at will, then half the battle is already won....
>>
>
> Here's a simple test case: https://gist.github.com/keyurdg/5660719 to create a ton of dentry cache entries, and https://gist.github.com/keyurdg/5660723 to allocate some memory.
>
> I kicked off 3 instances of the fopen program, each in a different prefix directory. After all the free memory had filled up with dentry entries, I tried allocating 4GB of memory. It took ~20s. If I stopped the dentry-generation programs and attempted to allocate 4GB again, it took only 2s (because the memory was already free). Here's a quick graph of this behavior: http://i.imgur.com/XhgX84d.png
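The gists themselves aren't reproduced in the archive. As a rough illustration of the first one (the filenames, loop structure and arguments here are guesses, not the gist's contents), a dentry-churning program only needs to keep probing names that don't exist; every failed lookup of a fresh name under an existing directory leaves a negative dentry behind.

/*
 * churn.c: illustrative only -- not the contents of the linked gist.
 * Endlessly fopen()s names that don't exist under a prefix directory.
 * The prefix directory itself must exist, otherwise the path walk fails
 * at the directory component and the final names never get cached.
 */
#include <stdio.h>

int main(int argc, char **argv)
{
        const char *prefix = argc > 1 ? argv[1] : "/tmp/churn";
        char path[4096];
        unsigned long i;
        FILE *f;

        for (i = 0; ; i++) {
                snprintf(path, sizeof(path), "%s/no-such-file-%lu", prefix, i);
                f = fopen(path, "r");   /* expected to fail with ENOENT */
                if (f)
                        fclose(f);      /* just in case the name exists */
        }
        return 0;
}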
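And a minimal stand-in for the second gist: time a large allocation that is then touched so the pages are actually faulted in. The 4GB figure and the use of bzero() come from the thread; everything else is assumed (older glibc may need -lrt for clock_gettime()).

/*
 * alloc.c: illustrative only -- not the contents of the linked gist.
 * Times a 4GB allocation plus bzero(), the operation the thread reports
 * going from <1s to ~20s while memory is full of negative dentries.
 */
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <time.h>

int main(void)
{
        size_t sz = 4UL << 30;          /* 4 GiB */
        struct timespec t0, t1;
        char *p;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        p = malloc(sz);
        if (!p) {
                perror("malloc");
                return 1;
        }
        bzero(p, sz);                   /* touch every page so it is really allocated */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        printf("allocate + bzero of 4GB took %.2f s\n",
               (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
        free(p);
        return 0;
}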
> I understand that in general the kernel should do "the right thing" without user input. But this seems to be a case where the user should be allowed input into how memory is used. After all, there are already lots of knobs in Linux that, if set wrongly, can cause user pain and bad performance. IMO this new knob needs the right kind of documentation, for example suggesting the use of slabtop and perf to confirm that the dentry cache is the issue before setting the knob.
>
> I'm also not tied to the idea of the knob being a limit on the number of dentry cache entries. A limit just seems easiest to administer; but if there are other ways of alleviating this issue, then I'd love to explore those as well.
>
>> Cheers,
>>
>> Dave.
>> --
>> Dave Chinner
>> david@xxxxxxxxxxxxx

Forgot to add: the only "knob" for this issue ATM is dropping the entire cache altogether, which is a massive overreaction to the problem. The dentry cache already keeps all its elements on an LRU, so if we did allow setting a limit, the dentries dropped to enforce it would most likely not be very significant ones (performance-wise).
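For reference, the "drop the entire cache" escape hatch being referred to is /proc/sys/vm/drop_caches, usually driven as root with "echo 2 > /proc/sys/vm/drop_caches". A minimal C equivalent (the file name dropcaches.c is just for illustration):

/*
 * dropcaches.c: the blunt instrument described above. Writing "2" asks the
 * kernel to reclaim slab objects (dentries and inodes) system-wide; "1"
 * drops the page cache and "3" drops both. Needs root.
 */
#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/proc/sys/vm/drop_caches", "w");

        if (!f) {
                perror("/proc/sys/vm/drop_caches");
                return 1;
        }
        fputs("2\n", f);
        fclose(f);
        return 0;
}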