Re: races in ll_splice_alias()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mar 8, 2016, at 7:34 PM, Al Viro wrote:

> On Tue, Mar 08, 2016 at 11:18:09PM +0000, Drokin, Oleg wrote:
>> Rename on server cannot get us to see the same directory in two places, or what's
>> the scenario?
>> In Lustre:
>> thread1: lookup a directory on the server.
>>         get a lock back
>>         instantiate a dentry (guarded by the lock)
>>         make a lock revocable (ll_lookup_finish_locks() in ll_lookup_it())
>> thread2: rename the directory moving it somewhere else
>>         server attempts to revoke the lock from thread1
>>         node that runs thread1 drops the lock and marks all dentries for that
>>              inode invalid
>>         server completes rename and releases the lock it holds
>> thread3: lookup the directory under new name on the server
>>         this can only return from server when the rename has completed and the
>>         dentry from thread1 marked invalid.
> 
> thread2 might run on another client; might or might not make any difference,
> but in any case server going nuts shouldn't corrupt data structures on client…

It does not matter indeed if thread2 is run on any particular client due
to the lustre locking.

>> Ok, let me try my hand at that. Hopefully whatever complications are there would
>> show themselves right away too.
>> 
>>> would always either inserted inode reference into a new dentry or dropped it.
>>> I'm still trying to trace what does iput() in case of error in your current
>>> code; I understand the one in do_statahead_enter(), but what does it in
>>> ll_lookup_it_finish()?
>> 
>> You mean when ll_d_init() fails in ll_splice_alias()?
>> Hm… It appears that we are indeed leaking the inode in that case, thanks.
>> I'll try to address that too.
>> Hm, in fact this was almost noticed, but I guess nobody retested it after
>> fixing the initial crash we had with 7486bc06ab2c46d6957f0211d09bc549aaf9cc87
> 
> If that's the case, I'd try this (on top of the patch from upthread):

I have not tried this exact one yet (will try in a moment).
But it appears we do need ll_d_init() call if we got a hit
in d_exact_alias().
A particular test case fails due to lack of the d_fsdata being null.

So far I hit it in case of unix sockets with a trace like this:

[  291.019261] Lustre: DEBUG MARKER: == sanity test 54a: unix domain socket test ========================================================== 19:31:19 (1457483479)
[  291.127120] LustreError: 4984:0:(llite_internal.h:1462:d_lustre_revalidate()) ASSERTION( ((struct ll_dentry_data *)((dentry)->d_fsdata)) ) failed: 
[  291.127900] LustreError: 4984:0:(llite_internal.h:1462:d_lustre_revalidate()) LBUG
[  291.128626] CPU: 5 PID: 4984 Comm: socketclient Tainted: G         C      4.5.0-rc6-vm-nfs+ #74
[  291.129340] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  291.129343]  0000000000000000 ffff8800d7a379a0 ffffffff81413801 ffffffffa0464200
[  291.129343]  ffff8800d3b05f50 ffff8800d7a379c0 ffffffffa016ca36 0000000000000000
[  291.129344]  ffff8800d3b05ed0 ffff8800d7a37a60 ffffffffa0438680 ffff8800c22d3000
[  291.129345] Call Trace:
[  291.129349]  [<ffffffff81413801>] dump_stack+0x63/0x82
[  291.129356]  [<ffffffffa016ca36>] lbug_with_loc+0x46/0xb0 [libcfs]
[  291.129368]  [<ffffffffa0438680>] ll_lookup_it_finish+0x610/0x970 [lustre]
[  291.129374]  [<ffffffffa042dace>] ? ll_finish_md_op_data+0xe/0x10 [lustre]
[  291.129380]  [<ffffffffa0438c6b>] ll_lookup_it+0x28b/0x680 [lustre]
[  291.129386]  [<ffffffffa04375a0>] ? ll_rmdir+0x390/0x390 [lustre]
[  291.129392]  [<ffffffffa0439a23>] ll_lookup_nd+0x73/0x120 [lustre]
[  291.129394]  [<ffffffff8122796d>] lookup_real+0x1d/0x60
[  291.129395]  [<ffffffff81227e23>] __lookup_hash+0x33/0x40
[  291.129396]  [<ffffffff8122a71c>] walk_component+0x1bc/0x510
[  291.129397]  [<ffffffff8122ad50>] ? link_path_walk+0x2e0/0x500
[  291.129398]  [<ffffffff8122bd7a>] ? path_init+0x3ca/0x5a0
[  291.129400]  [<ffffffff8122c02d>] path_lookupat+0x5d/0x110
[  291.129401]  [<ffffffff8122e07e>] filename_lookup+0x9e/0x150
[  291.129403]  [<ffffffff811f9591>] ? __kmalloc_node_track_caller+0x31/0x40
[  291.129404]  [<ffffffff811f9591>] ? __kmalloc_node_track_caller+0x31/0x40
[  291.129405]  [<ffffffff8122db24>] ? getname_kernel+0x34/0x120
[  291.129406]  [<ffffffff8122db7d>] ? getname_kernel+0x8d/0x120
[  291.129407]  [<ffffffff8122e15b>] kern_path+0x2b/0x30
[  291.129408]  [<ffffffff81741928>] unix_find_other+0x38/0x1f0
[  291.129409]  [<ffffffff81741bd8>] unix_stream_connect+0xf8/0x480
[  291.129411]  [<ffffffff8166b817>] SYSC_connect+0xb7/0xf0
[  291.129412]  [<ffffffff8166c3ee>] SyS_connect+0xe/0x10
[  291.129413]  [<ffffffff817d1b36>] entry_SYSCALL_64_fastpath+0x16/0x7a

Sadly it seems crash tool does not know how to work with modern kernels
yet again so it is time to update and see what was going on in more details.

The test itself is simple:
test_54a() {
        $SOCKETSERVER $DIR/socket ||
                error "$SOCKETSERVER $DIR/socket failed: $?"
        $SOCKETCLIENT $DIR/socket ||
                error "$SOCKETCLIENT $DIR/socket failed: $?"
        $MUNLINK $DIR/socket || error "$MUNLINK $DIR/socket failed: $?"
}
run_test 54a "unix domain socket test =========================="

with the socket server and socket client doing what you would think.

The patch I had at hand was:
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -358,14 +358,8 @@ struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de)
 	int rc;
 
 	if (inode) {
-		new = ll_find_alias(inode, de);
+		new = d_exact_alias(de, inode);
 		if (new) {
-			rc = ll_d_init(new);
-			if (rc < 0) {
-				dput(new);
-				return ERR_PTR(rc);
-			}
-			d_move(new, de);
 			iput(inode);
 			CDEBUG(D_DENTRY,
 			       "Reuse dentry %p inode %p refc %d flags %#x\n",
@@ -374,9 +368,9 @@ struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de)
 		}
 	}
 	rc = ll_d_init(de);
+	d_add(de, inode);
 	if (rc < 0)
 		return ERR_PTR(rc);
-	d_add(de, inode);
 	CDEBUG(D_DENTRY, "Add dentry %p inode %p refc %d flags %#x\n",
 	       de, d_inode(de), d_count(de), de->d_flags);

Overall the rate of hits in d_exact_alias was very low, need to check if pre-patch it was any better.

Also any particular reason to prefer d_splice_alias over d_add? Did not we already determine there
were no other aliases.
> 
> diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
> index da5f443..bcc9841 100644
> --- a/drivers/staging/lustre/lustre/llite/namei.c
> +++ b/drivers/staging/lustre/lustre/llite/namei.c
> @@ -320,81 +320,37 @@ void ll_i2gids(__u32 *suppgids, struct inode *i1, struct inode *i2)
> }
> 
> /*
> - * try to reuse three types of dentry:
> - * 1. unhashed alias, this one is unhashed by d_invalidate (but it may be valid
> - *    by concurrent .revalidate).
> - * 2. INVALID alias (common case for no valid ldlm lock held, but this flag may
> - *    be cleared by others calling d_lustre_revalidate).
> - * 3. DISCONNECTED alias.
> - */
> -static struct dentry *ll_find_alias(struct inode *inode, struct dentry *dentry)
> -{
> -	struct dentry *alias, *discon_alias, *invalid_alias;
> -
> -	if (hlist_empty(&inode->i_dentry))
> -		return NULL;
> -
> -	discon_alias = invalid_alias = NULL;
> -
> -	ll_lock_dcache(inode);
> -	hlist_for_each_entry(alias, &inode->i_dentry, d_u.d_alias) {
> -		LASSERT(alias != dentry);
> -
> -		spin_lock(&alias->d_lock);
> -		if (alias->d_flags & DCACHE_DISCONNECTED)
> -			/* LASSERT(last_discon == NULL); LU-405, bz 20055 */
> -			discon_alias = alias;
> -		else if (alias->d_parent == dentry->d_parent	     &&
> -			 alias->d_name.hash == dentry->d_name.hash       &&
> -			 alias->d_name.len == dentry->d_name.len	 &&
> -			 memcmp(alias->d_name.name, dentry->d_name.name,
> -				dentry->d_name.len) == 0)
> -			invalid_alias = alias;
> -		spin_unlock(&alias->d_lock);
> -
> -		if (invalid_alias)
> -			break;
> -	}
> -	alias = invalid_alias ?: discon_alias ?: NULL;
> -	if (alias) {
> -		spin_lock(&alias->d_lock);
> -		dget_dlock(alias);
> -		spin_unlock(&alias->d_lock);
> -	}
> -	ll_unlock_dcache(inode);
> -
> -	return alias;
> -}
> -
> -/*
>  * Similar to d_splice_alias(), but lustre treats invalid alias
>  * similar to DCACHE_DISCONNECTED, and tries to use it anyway.
>  */
> struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de)
> {
> -	struct dentry *new;
> +	struct dentry *alias = NULL;
> 	int rc;
> 
> 	if (inode) {
> -		new = ll_find_alias(inode, de);
> -		if (new) {
> -			rc = ll_d_init(new);
> -			if (rc < 0) {
> -				dput(new);
> -				return ERR_PTR(rc);
> -			}
> -			d_move(new, de);
> +		alias = d_exact_alias(de, inode);
> +		if (alias)
> +			iput(inode);
> +	}
> +
> +	if (!alias) {
> +		rc = ll_d_init(de);
> +		if (rc < 0) {
> 			iput(inode);
> -			CDEBUG(D_DENTRY,
> -			       "Reuse dentry %p inode %p refc %d flags %#x\n",
> -			      new, d_inode(new), d_count(new), new->d_flags);
> -			return new;
> +			return ERR_PTR(rc);
> 		}
> +		alias = d_splice_alias(inode, de);
> +		if (IS_ERR(alias))
> +			return alias;
> +	}
> +
> +	if (alias) {
> +		CDEBUG(D_DENTRY,
> +		       "Reuse dentry %p inode %p refc %d flags %#x\n",
> +		      alias, d_inode(alias), d_count(alias), alias->d_flags);
> +		return alias;
> 	}
> -	rc = ll_d_init(de);
> -	if (rc < 0)
> -		return ERR_PTR(rc);
> -	d_add(de, inode);
> 	CDEBUG(D_DENTRY, "Add dentry %p inode %p refc %d flags %#x\n",
> 	       de, d_inode(de), d_count(de), de->d_flags);
> 	return de;
> diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
> index 88ffd8e..6c64de0 100644
> --- a/drivers/staging/lustre/lustre/llite/statahead.c
> +++ b/drivers/staging/lustre/lustre/llite/statahead.c
> @@ -1576,6 +1576,7 @@ int do_statahead_enter(struct inode *dir, struct dentry **dentryp,
> 					alias = ll_splice_alias(inode,
> 								   *dentryp);
> 					if (IS_ERR(alias)) {
> +						entry->se_inode = NULL;
> 						ll_sai_unplug(sai, entry);
> 						return PTR_ERR(alias);
> 					}

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux