I'm currently working on removing the "dentry" vote from ocfs2. This is a network message broadcasted to all mounted nodes on every unlink() or rename(). Upon recieving the message, those nodes d_delete() the dentry in question. What I'm doing is replacing the broadcast mechanism with a cluster lock which covers a set of dentries. Every node gets a read only lock, until an unlink during which the unlinking node, will request an exclusive lock, forcing the other nodes who care about that dentry to d_delete() it. The effect is that we retain a very lightweight ->d_revalidate(), and at the same time get to make large improvements to the average case performance of the ocfs2 unlink and rename operations. I have uncovered a very small race with rename and d_move() however. The d_move() for a rename is after the ->rename() callback, outside the parent directory cluster locks which normally protect the name creation/removal mechanism. The worry is that after ocfs2_rename(), but before the d_move() a different node can discover the new name (created during the rename) and unlink it, forcing a d_delete(). d_move() it seems, unconditionally rehashes the (renamed) dentry and so if it's done after a d_delete() we could get some inconsitent state amongst the nodes. My solution is to simply allow ocfs2 to do the d_move() inside of ocfs2_rename() by introducing a FS_RENAME_DOES_D_MOVE flag. OCFS2 won't actually change how or why d_move() is called during a rename, it just uses this flag to change where the call is made. For any interested parties, a preliminary ocfs2 patch is at http://oss.oracle.com/~mfasheh/vote_removal/remove_dentry_vote_wip.patch The interesting stuff is mostly in fs/ocfs2/dcache.[ch] The ocfs2 patch isn't posted via e-mail because it's still a work in progress. That said, it actually works quite well, and the only things I have left to do are unrelated to rename - the patch needs to be split up into smaller ones, and a pair of (minor) dlm related fixups are needed. --Mark From: Mark Fasheh <mark.fasheh@xxxxxxxxxx> [PATCH] Allow file systems to manually d_move() inside of ->rename() Some file systems want to be able to manually d_move() the dentries involved in a rename. Introduce a flag which instructs vfs_rename_dir() and vfs_rename_other() that it has already been handled internally. OCFS2 uses this to protect that part of the entire operation with a cluster lock. Signed-off-by: Mark Fasheh <mark.fasheh@xxxxxxxxxx> diff --git a/fs/namei.c b/fs/namei.c index c784e8b..e5a8478 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2353,7 +2353,8 @@ static int vfs_rename_dir(struct inode * dput(new_dentry); } if (!error) - d_move(old_dentry,new_dentry); + if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE)) + d_move(old_dentry,new_dentry); return error; } @@ -2377,7 +2378,7 @@ static int vfs_rename_other(struct inode error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry); if (!error) { /* The following d_move() should become unconditional */ - if (!(old_dir->i_sb->s_type->fs_flags & FS_ODD_RENAME)) + if (!(old_dir->i_sb->s_type->fs_flags & (FS_ODD_RENAME|FS_RENAME_DOES_D_MOVE))) d_move(old_dentry, new_dentry); } if (target) diff --git a/include/linux/fs.h b/include/linux/fs.h index e04a5cf..8e9a7ca 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -87,6 +87,7 @@ #define SEL_EX 4 /* public flags for file_system_type */ #define FS_REQUIRES_DEV 1 #define FS_BINARY_MOUNTDATA 2 +#define FS_RENAME_DOES_D_MOVE 4 #define FS_REVAL_DOT 16384 /* Check the paths ".", ".." for staleness */ #define FS_ODD_RENAME 32768 /* Temporary stuff; will go away as soon * as nfs_rename() will be cleaned up - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html