When doing stress test for module insertion and removal, the following phenomenon was found: $ lsmod Module Size Used by libkmod: kmod_module_get_holders: could not open \ '/sys/module/nbd/holders': No such file or directory nbd -2 -2 $ cat /proc/modules nbd 110592 0 - Live 0xffffffffc0298000 $ ls -1 /sys/module |grep nbd ls: cannot access 'nbd': No such file or directory nbd It seems the kernfs node of module has been activated and is returned to ls command through kernfs_fop_readdir(), but the sysfs dentry is negative. Further investigation found that there is race between kernfs dir creation and dentry lookup as shown below: CPU 0 CPU 1 kernfs_add_one down_write(&kernfs_rwsem) // insert nbd into rbtree // update the parent's revision kernfs_link_sibling() up_write(&kernfs_rwsem) kernfs_iop_lookup down_read(&kernfs_rwsem) // find nbd in rbtree, but it is deactivated kn = kernfs_find_ns() // return false kernfs_active() // a negative is created d_splice_alias(NULL, dentry) up_read(&kernfs_rwsem) // activate after negative dentry is created kernfs_activate() // return 0 because parent's // revision is stable now kernfs_dop_revalidate() The race will create a negative dentry for a kernfs node which is newly-added and activated. To fix it, there are two cases to be handled: (1) kernfs root without KERNFS_ROOT_CREATE_DEACTIVATED kernfs_rwsem can be always hold during kernfs_link_sibling() and kernfs_activate() in kernfs_add_one(), so kernfs_iop_lookup() will find an active kernfs node. (2) kernfs root with KERNFS_ROOT_CREATE_DEACTIVATED kernfs_activate() is called separatedly, and we can invalidate the dentry subtree with kn as root by increasing the revision of its parent. But we can invalidate in a finer granularity by only invalidating the negative dentry of the newly-activated kn node. So factor out a helper kernfs_activate_locked() to activate kernfs subtree lockless and invalidate the negative dentries if requested. Creation under kernfs root with CREATED_DEACTIVATED doesn't need invalidation because kernfs_rwsem is always hold, and kernfs root w/o CREATED_DEACTIVATED needs to invalidate the maybe-created negative dentries. kernfs_inc_rev() in kernfs_link_sibling() is kept because kernfs_rename_ns() needs it to invalidate the negative dentry of the target kernfs which is newly created by rename. Fixes: c7e7c04274b1 ("kernfs: use VFS negative dentry caching") Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx> --- fs/kernfs/dir.c | 52 +++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 42 insertions(+), 10 deletions(-) diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c index ba581429bf7b..2f1ab8bad575 100644 --- a/fs/kernfs/dir.c +++ b/fs/kernfs/dir.c @@ -17,6 +17,8 @@ #include "kernfs-internal.h" +static void kernfs_activate_locked(struct kernfs_node *kn, bool invalidate); + DECLARE_RWSEM(kernfs_rwsem); static DEFINE_SPINLOCK(kernfs_rename_lock); /* kn->parent and ->name */ static char kernfs_pr_cont_buf[PATH_MAX]; /* protected by rename_lock */ @@ -753,8 +755,6 @@ int kernfs_add_one(struct kernfs_node *kn) ps_iattr->ia_mtime = ps_iattr->ia_ctime; } - up_write(&kernfs_rwsem); - /* * Activate the new node unless CREATE_DEACTIVATED is requested. * If not activated here, the kernfs user is responsible for @@ -763,8 +763,7 @@ int kernfs_add_one(struct kernfs_node *kn) * trigger deactivation. */ if (!(kernfs_root(kn)->flags & KERNFS_ROOT_CREATE_DEACTIVATED)) - kernfs_activate(kn); - return 0; + kernfs_activate_locked(kn, false); out_unlock: up_write(&kernfs_rwsem); @@ -942,8 +941,11 @@ struct kernfs_root *kernfs_create_root(struct kernfs_syscall_ops *scops, root->kn = kn; init_waitqueue_head(&root->deactivate_waitq); - if (!(root->flags & KERNFS_ROOT_CREATE_DEACTIVATED)) - kernfs_activate(kn); + if (!(root->flags & KERNFS_ROOT_CREATE_DEACTIVATED)) { + down_write(&kernfs_rwsem); + kernfs_activate_locked(kn, false); + up_write(&kernfs_rwsem); + } return root; } @@ -1262,8 +1264,11 @@ static struct kernfs_node *kernfs_next_descendant_post(struct kernfs_node *pos, } /** - * kernfs_activate - activate a node which started deactivated + * kernfs_activate_locked - activate a node which started deactivated * @kn: kernfs_node whose subtree is to be activated + * @invalidate: whether or not to increase the revision of parent node + * for each newly-activated child node. The increase will + * invalidate negative dentries created under the parent node. * * If the root has KERNFS_ROOT_CREATE_DEACTIVATED set, a newly created node * needs to be explicitly activated. A node which hasn't been activated @@ -1271,15 +1276,15 @@ static struct kernfs_node *kernfs_next_descendant_post(struct kernfs_node *pos, * removal. This is useful to construct atomic init sequences where * creation of multiple nodes should either succeed or fail atomically. * + * The caller must have acquired kernfs_rwsem. + * * The caller is responsible for ensuring that this function is not called * after kernfs_remove*() is invoked on @kn. */ -void kernfs_activate(struct kernfs_node *kn) +static void kernfs_activate_locked(struct kernfs_node *kn, bool invalidate) { struct kernfs_node *pos; - down_write(&kernfs_rwsem); - pos = NULL; while ((pos = kernfs_next_descendant_post(pos, kn))) { if (pos->flags & KERNFS_ACTIVATED) @@ -1290,8 +1295,35 @@ void kernfs_activate(struct kernfs_node *kn) atomic_sub(KN_DEACTIVATED_BIAS, &pos->active); pos->flags |= KERNFS_ACTIVATED; + + /* + * Invalidate the negative dentry created after pos is + * inserted into sibling rbtree but before it is + * activated. + */ + if (invalidate && pos->parent) + kernfs_inc_rev(pos->parent); } +} +/** + * kernfs_activate - activate a node which started deactivated + * @kn: kernfs_node whose subtree is to be activated + * + * Currently it is only used by kernfs root which has + * FS_ROOT_CREATE_DEACTIVATED set. Because the addition and the activation + * of children nodes are not atomic (not always hold kernfs_rwsem), + * negative dentry may be created for one child node after its addition + * but before its activation, so passing invalidate as true to + * @kernfs_activate_locked() to invalidate these negative dentries. + * + * The caller is responsible for ensuring that this function is not called + * after kernfs_remove*() is invoked on @kn. + */ +void kernfs_activate(struct kernfs_node *kn) +{ + down_write(&kernfs_rwsem); + kernfs_activate_locked(kn, true); up_write(&kernfs_rwsem); } -- 2.29.2