James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> writes: > On Tue, 2022-03-22 at 14:08 -0700, Stephen Brennan wrote: [snip] >> If we're looking at issues like [1], then the amount needs to be on a >> per-directory basis, and maybe roughly based on CPU speed. For other >> priorities or failure modes, then the policy would need to be >> completely different. Ideally a solution could work for almost all >> scenarios, but failing that, maybe it is worth allowing policy to be >> set by administrators via sysctl or even a BPF? > > Looking at [1], you're really trying to contain the parent's child list > from exploding with negative dentries. Looking through the patch, it > still strikes me that dentry_kill/retain_dentry is still a better > place, because if a negative dentry comes back there, it's unlikely to > become positive (well, fstat followed by create would be the counter > example, but it would partly be the app's fault for not doing > open(O_CREAT)). I actually like the idea of doing the pruning during d_alloc(). Basically, if you're creating dentries, you should also be working on the cache management for them. > If we have the signal for reuse of negative dentry from the cache, > which would be a fast lookup, we know a newly created negative dentry > already had a slow lookup, so we can do more processing without > necessarily slowing down the workload too much. In particular, we > could just iterate over the parent's children of this negative dentry > and start pruning if there are too many (too many being a relative > term, but I think something like 2x-10x the max positive entries > wouldn't be such a bad heuristic). I agree that, on a per-directory basis, 2-10x feels right, though maybe there needs to be some leeway for empty directories? Per-directory pruning also makes sense from a concurrency standpoint: the LRU locks could become a source of contention. > Assuming we don't allow the > parent's list to contain too many negative dentries, we might not need > the sweep negative logic because the list wouldn't be allowed to grow > overly large. Seconded, I have no desire to actually try to get that sweep negative logic merged if we can do a better job handling the dentries in the first place. > I think a second heuristic would be prune at least two > negative dentries from the end of the sb lru list if they've never been > used for a lookup and were created more than a specified time ago > (problem is what, but I bet a minute wouldn't be unreasonable). > > Obviously, while I think it would work for some of the negative dentry > induced issues, the above is very heuristic in tuning and won't help > with any of the other object issues in filesystems. But on the other > hand, negative dentries are special in that if they're never used to > cache an -ENOENT and they never go positive, they're just a waste of > space. I took a preliminary stab at some of these ideas in this series: https://lore.kernel.org/linux-fsdevel/20220331190827.48241-1-stephen.s.brennan@xxxxxxxxxx/ It doesn't handle pruning from the sb LRU list, nor does it have a good way to decide which dentry to kill. But it's pretty stable and simple, and I value that part :) . It still needs some work but I'd welcome feedback from folks interested in this discussion. Stephen > > James