Re: [RFC][PATCH] rmdir(),rename(): do shrink_dcache_parent() only on success

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On May 27, 2018, at 4:20 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> 
> Once upon a time ->rmdir() instances used to check if victim inode
> had more than one (in-core) reference and failed with -EBUSY if it
> had.  The reason was race avoidance - emptiness check is worthless
> if somebody could just go and create new objects in the victim
> directory afterwards.
> 
> With introduction of dcache the checks had been replaced with
> checking the refcount of dentry.  However, since a cached negative
> lookup leaves a negative child dentry, such check had lead to false
> positives - with empty foo/ doing stat foo/bar before rmdir foo
> ended up with -EBUSY unless the negative dentry of foo/bar happened
> to be evicted by the time of rmdir(2).  That had been fixed by
> doing shrink_dcache_parent() just before the refcount check.
> 
> At the same time, ext2_rmdir() has grown a private solution that
> eliminated those -EBUSY - it did something (setting ->i_size to 0)
> which made any subsequent ext2_add_entry() fail.
> 
> Unfortunately, even with shrink_dcache_parent() the check had been
> racy - after all, the victim itself could be found by dcache lookup
> just after we'd checked its refcount.  That got fixed by a new
> helper (dentry_unhash()) that did shrink_dcache_parent() and unhashed
> the sucker if its refcount ended up equal to 1.  That got called before
> ->rmdir(), turning the checks in ->rmdir() instances into "if not
> unhashed fail with -EBUSY".  Which reduced the boilerplate nicely, but
> had an unpleasant side effect - now shrink_dcache_parent() had been
> done before the emptiness checks, leading to easily triggerable calls
> of shrink_dcache_parent() on arbitrary large subtrees, quite possibly
> nested into each other.
> 
> Several years later the ext2-private trick had been generalized -
> (in-core) inodes of dead directories are flagged and calls of
> lookup, readdir and all directory-modifying methods were prevented
> in so marked directories.  Remaining boilerplate in ->rmdir() instances
> became redundant and some instances got rid of it.
> 
> In 2011 the call of dentry_unhash() got shifted into ->rmdir() instances
> and then killed off in all of them.  That has lead to another problem,
> though - in case of successful rmdir we *want* any (negative) child
> dentries dropped and the victim itself made negative.  There's no point
> keeping cached negative lookups in foo when we can get the negative
> lookup of foo itself cached.  So shrink_dcache_parent() call had been
> restored; unfortunately, it went into the place where dentry_unhash()
> used to be, i.e. before the ->rmdir() call.  Note that we don't unhash
> anymore, so any "is it busy" checks would be racy; fortunately, all of
> them are gone.
> 
> We should've done that call right *after* successful ->rmdir().  That
> reduces contention caused by tree-walking in shrink_dcache_parent()
> and, especially, contention caused by evictions in two nested subtrees
> going on in parallel.  The same goes for directory-overwriting rename() -
> the story there had been parallel to that of rmdir().
> 
> Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> ---
> diff --git a/fs/namei.c b/fs/namei.c
> index 186bd2464fd5..269b64a1121a 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3847,11 +3847,11 @@ int vfs_rmdir(struct inode *dir, struct dentry *dentry)
> 	if (error)
> 		goto out;
> 
> -	shrink_dcache_parent(dentry);
> 	error = dir->i_op->rmdir(dir, dentry);
> 	if (error)
> 		goto out;
> 
> +	shrink_dcache_parent(dentry);
> 	dentry->d_inode->i_flags |= S_DEAD;
> 	dont_mount(dentry);
> 	detach_mounts(dentry);
> @@ -4434,8 +4434,6 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
> 		    old_dir->i_nlink >= max_links)
> 			goto out;
> 	}
> -	if (is_dir && !(flags & RENAME_EXCHANGE) && target)
> -		shrink_dcache_parent(new_dentry);
> 	if (!is_dir) {
> 		error = try_break_deleg(source, delegated_inode);
> 		if (error)
> @@ -4452,8 +4450,10 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
> 		goto out;
> 
> 	if (!(flags & RENAME_EXCHANGE) && target) {
> -		if (is_dir)
> +		if (is_dir) {
> +			shrink_dcache_parent(new_dentry);
> 			target->i_flags |= S_DEAD;

Would it be better to set S_DEAD on the removed directory before
shrink_dcache_parent() is called (here and in vfs_rmdir()), or is
there no way for a new dentry to be added to the parent after the
shrink is done?

Cheers, Andreas





Attachment: signature.asc
Description: Message signed with OpenPGP


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux