Sometimes generic/011 causes kafs to follow up an FS.RemoveDir RPC call by spending around a second sending a slew of FS.FetchStatus RPC calls to the directory just deleted that then abort with VNOVNODE, indicating deletion of the target directory. This seems to stem from userspace attempting to stat the directory or something in it: afs_select_fileserver+0x46d/0xaa2 afs_wait_for_operation+0x12/0x17e afs_fetch_status+0x56/0x75 afs_validate+0xfb/0x240 afs_permission+0xef/0x1b0 inode_permission+0x90/0x139 link_path_walk.part.0.constprop.0+0x6f/0x2f0 path_lookupat+0x4c/0xfa filename_lookup+0x63/0xd7 vfs_statx+0x62/0x13f vfs_fstatat+0x72/0x8a The issue appears to be that afs_dir_remove_subdir() marks the callback promise as being cancelled by setting the expiry time to AFS_NO_CB_PROMISE - which then confuses afs_validate() which sends the FetchStatus to try and get a new one before it checks for the AFS_VNODE_DELETED flag which indicates that we know the directory got deleted. Fix this by moving the AFS_VNODE_DELETED check up above the expiration check, and even above the locking. Signed-off-by: David Howells <dhowells@xxxxxxxxxx> cc: Marc Dionne <marc.dionne@xxxxxxxxxxxx> cc: linux-afs@xxxxxxxxxxxxxxxxxxx --- fs/afs/validation.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/fs/afs/validation.c b/fs/afs/validation.c index 46b37f2cce7d..850f5287107d 100644 --- a/fs/afs/validation.c +++ b/fs/afs/validation.c @@ -391,6 +391,11 @@ int afs_validate(struct afs_vnode *vnode, struct key *key) if (afs_check_validity(vnode)) return 0; + if (test_bit(AFS_VNODE_DELETED, &vnode->flags)) { + _debug("file already deleted"); + return -ESTALE; + } + ret = down_write_killable(&vnode->validate_lock); if (ret < 0) goto error; @@ -448,12 +453,6 @@ int afs_validate(struct afs_vnode *vnode, struct key *key) vnode->cb_ro_snapshot = cb_ro_snapshot; vnode->cb_scrub = cb_scrub; - if (test_bit(AFS_VNODE_DELETED, &vnode->flags)) { - _debug("file already deleted"); - ret = -ESTALE; - goto error_unlock; - } - /* if the vnode's data version number changed then its contents are * different */ zap |= test_and_clear_bit(AFS_VNODE_ZAP_DATA, &vnode->flags);