On Feb 26, 2020, at 9:29 AM, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Wed, Feb 26, 2020 at 11:13:53AM -0500, Waiman Long wrote:
>> A new sysctl parameter "dentry-dir-max" is introduced which accepts a
>> value of 0 (default) for no limit or a positive integer 256 and up. Small
>> dentry-dir-max numbers are forbidden to avoid excessive dentry count
>> checking which can impact system performance.
>
> This is always the wrong approach. A sysctl is just a way of blaming
> the sysadmin for us not being very good at programming.
>
> I agree that we need a way to limit the number of negative dentries.
> But that limit needs to be dynamic and depend on how the system is being
> used, not on how some overworked sysadmin has configured it.
>
> So we need an initial estimate for the number of negative dentries that
> we need for good performance. Maybe it's 1000. It doesn't really matter;
> it's going to change dynamically.
>
> Then we need a metric to let us know whether it needs to be increased.
> Perhaps that's "number of new negative dentries created in the last
> second". And we need to decide how much to increase it; maybe it's by
> 50% or maybe by 10%. Perhaps somewhere between 10-100% depending on
> how high the recent rate of negative dentry creation has been.
>
> We also need a metric to let us know whether it needs to be decreased.
> I'm reluctant to say that memory pressure should be that metric because
> very large systems can let the number of dentries grow in an unbounded
> way. Perhaps that metric is "number of hits in the negative dentry
> cache in the last ten seconds". Again, we'll need to decide how much
> to shrink the target number by.

OK, so now instead of a single tunable parameter we need three, because
these numbers are totally made up and nobody knows the right values.
:-) Defaulting the limit to "disabled/no limit" also has the problem that
99.99% of users won't even know this tunable exists, let alone how to
set it correctly, so they will continue to see these problems, and the
code may as well not exist (i.e. pure overhead), while Waiman has a
better idea today of what would be reasonable defaults.

I definitely agree that a single fixed value will be wrong for every
system except the original developer's. Making the maximum default to
some reasonable fraction of the system size, rather than a fixed value,
is probably best to start. Something like this as a starting point:

	/* Allow a reasonable minimum number of negative entries,
	 * but proportionately more if the directory/dcache is large.
	 */
	dir_negative_max = max(num_dir_entries / 16, 1024);
	total_negative_max = max(totalram_pages / 32, total_dentries / 8);

(Waiman should decide actual values based on where the problem was hit
previously), and include tunables to change the limits for testing.

Ideally there would also be a dir ioctl that allows fetching the current
positive/negative entry count on a directory (e.g. /usr/bin, /usr/lib64,
/usr/share/man/man*) to see what these values are. Otherwise there is
no way to determine whether the limits used are any good or not.

Dynamic limits are hard to get right, and incorrect state machines can
lead to wild swings in behaviour due to unexpected feedback. It isn't
clear to me that adjusting the limit based on the current rate of
negative dentry creation even makes sense. If there are a lot of
negative entries being created, that is when you'd want to _stop_
allowing more to be added.

We don't have any limit today, so imposing some large-but-still-reasonable
upper limit on negative entries will catch the runaway negative dcache
case that was the original need of this functionality without adding a
lot of complexity that we may not need at all.
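To make the proportional defaults above concrete, here is a rough userspace sketch of the same arithmetic. It is not kernel code: the helper max_ul() and the function names are hypothetical, the inputs are passed in as plain numbers rather than read from the dcache, and the 1/16, 1/32, 1/8 and 1024 constants are only the placeholder values suggested above, not tested ones.

```c
/* Hypothetical helper: the larger of two unsigned long values. */
unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * Per-directory cap on negative dentries (illustrative only):
 * 1/16 of the directory's entry count, but never fewer than 1024.
 */
unsigned long dir_negative_max(unsigned long num_dir_entries)
{
	return max_ul(num_dir_entries / 16, 1024);
}

/*
 * System-wide cap on negative dentries (illustrative only):
 * the larger of 1/32 of the RAM page count and 1/8 of the
 * current total dentry count.
 */
unsigned long total_negative_max(unsigned long totalram_pages,
				 unsigned long total_dentries)
{
	return max_ul(totalram_pages / 32, total_dentries / 8);
}
```

With these placeholder constants, a 100-entry directory gets the 1024 floor, and a machine with 16 GiB of RAM in 4 KiB pages (4194304 pages) and one million dentries gets a system-wide cap of 131072, since the RAM term is the larger of the two there.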

> If the number of negative dentries is at or above the target, then
> creating a new negative dentry means evicting an existing negative dentry.
> If the number of negative dentries is lower than the target, then we
> can just create a new one.
>
> Of course, memory pressure (and shrinking the target number) should
> cause negative dentries to be evicted from the old end of the LRU list.
> But memory pressure shouldn't cause us to change the target number;
> the target number is what we think we need to keep the system running
> smoothly.

Cheers, Andreas