Hey Dave,

On Mon, 30 May 2011, Dave Chinner wrote:
> On Mon, May 30, 2011 at 12:06:04PM +1000, Dave Chinner wrote:
> > Folks,
> >
> > I just booted up a 3.0-rc1 kernel, and mounted an XFS filesystem
> > with 50M files in it. Running:
> >
> > $ for i in /mnt/scratch/*; do sudo /usr/bin/time rm -rf $i 2>&1 & done
> >
> > runs an 8-way parallel unlink on the files. Normally this runs at
> > around 80k unlinks/s, and it runs with about 500k-1m dentries and
> > inodes cached in the steady state.
> >
> > The steady state behaviour with 3.0-rc1 is that there are around 10m
> > cached dentries - all negative dentries - consuming about 1.6GB of
> > RAM (of 4GB total). Previous steady state was, IIRC, around 200MB of
> > dentries. My initial suspicions are that the dentry unhashing
> > change may be the cause of this...
>
> So a bisect lands on:
>
> $ git bisect good
> 79bf7c732b5ff75b96022ed9d29181afd3d2509c is the first bad commit
> commit 79bf7c732b5ff75b96022ed9d29181afd3d2509c
> Author: Sage Weil <sage@xxxxxxxxxxxx>
> Date:   Tue May 24 13:06:06 2011 -0700
>
>     vfs: push dentry_unhash on rmdir into file systems
>
>     Only a few file systems need this. Start by pushing it down into each
>     fs rmdir method (except gfs2 and xfs) so it can be dealt with on a per-fs
>     basis.
>
>     This does not change behavior for any in-tree file systems.
>
>     Acked-by: Christoph Hellwig <hch@xxxxxx>
>     Signed-off-by: Sage Weil <sage@xxxxxxxxxxxx>
>     Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
>
> :040000 040000 c45d58718d33f7ca1da87f99fa538f65eaa3fe2c ec71cbecc59e8b142a7bfcabd469fa67486bef30 M	fs
>
> Ok, so the question has to be asked - why wasn't dentry_unhash()
> pushed down into XFS?

Christoph asked me to leave it out to avoid the push-down + remove noise.
I missed it in v1, added it in v2, then took it out again.
Ultimately that isn't the real problem, though:

> Further, now that dentry_unhash() has been removed from most
> filesystems, what is replacing the shrink_dcache_parent() call that
> was cleaning up the "we can never reference again" child dentries of
> the unlinked directories? It appears that they are now being left in
> memory on the dentry LRU. It also appears that they have
> D_REFERENCED bit set, so they do not get immediately reclaimed by
> the shrinker.

Ah, yeah, that makes sense. I missed the shrink_dcache_parent side effect.
I suspect we just need something like the below? (Very lightly tested!)

Thanks-
sage


From c1fac19b662b02ab4aea98ee2a8d0098bc985bc8 Mon Sep 17 00:00:00 2001
From: Sage Weil <sage@xxxxxxxxxxxx>
Date: Sun, 29 May 2011 20:35:44 -0700
Subject: [PATCH 1/3] vfs: shrink_dcache_parent before rmdir, dir rename

The dentry_unhash push-down series missed that shrink_dcache_parent needs to
be called prior to rmdir or dir rename to clear DCACHE_REFERENCED and allow
efficient dentry reclaim.

Reported-by: Dave Chinner <david@xxxxxxxxxxxxx>
Signed-off-by: Sage Weil <sage@xxxxxxxxxxxx>
---
 fs/namei.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1ab641f..e2e4e8d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2579,6 +2579,7 @@ int vfs_rmdir(struct inode *dir, struct dentry *dentry)
 	if (error)
 		goto out;
 
+	shrink_dcache_parent(dentry);
 	error = dir->i_op->rmdir(dir, dentry);
 	if (error)
 		goto out;
@@ -2993,6 +2994,8 @@ static int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
 	if (d_mountpoint(old_dentry) || d_mountpoint(new_dentry))
 		goto out;
 
+	if (target)
+		shrink_dcache_parent(new_dentry);
 	error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
 	if (error)
 		goto out;
-- 
1.7.1