Re: [PATCH RFC 00/12] Allow concurrent directory updates.

"NeilBrown" <neilb@xxxxxxx> · Thu, 16 Jun 2022 10:55:46 +1000

On Wed, 15 Jun 2022, Daire Byrne wrote:
...
> With the patch, the aggregate increases to 15 creates/s for 10 clients
> which again matches the results of a single patched client. Not quite
> a x10 increase but a healthy improvement nonetheless.

Great!

> 
> However, it is at this point that I started to experience some
> stability issues with the re-export server that are not present with
> the vanilla unpatched v5.19-rc2 kernel. In particular the knfsd
> threads start to lock up with stack traces like this:
> 
> [ 1234.460696] INFO: task nfsd:5514 blocked for more than 123 seconds.
> [ 1234.461481]       Tainted: G        W   E     5.19.0-1.dneg.x86_64 #1
> [ 1234.462289] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1234.463227] task:nfsd            state:D stack:    0 pid: 5514 ppid:     2 flags:0x00004000
> [ 1234.464212] Call Trace:
> [ 1234.464677]  <TASK>
> [ 1234.465104]  __schedule+0x2a9/0x8a0
> [ 1234.465663]  schedule+0x55/0xc0
> [ 1234.466183]  ? nfs_lookup_revalidate_dentry+0x3a0/0x3a0 [nfs]
> [ 1234.466995]  __nfs_lookup_revalidate+0xdf/0x120 [nfs]

I can see the cause of this: I forgot a wakeup.  This patch should fix
it, though I hope to find a better solution.

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 54c2c7adcd56..072130d000c4 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2483,17 +2483,16 @@ int nfs_unlink(struct inode *dir, struct dentry *dentry)
 	if (!(dentry->d_flags & DCACHE_PAR_UPDATE)) {
 		/* Must have exclusive lock on parent */
 		did_set_par_update = true;
+		lock_acquire_exclusive(&dentry->d_update_map, 0,
+				       0, NULL, _THIS_IP_);
 		dentry->d_flags |= DCACHE_PAR_UPDATE;
 	}
 
 	spin_unlock(&dentry->d_lock);
 	error = nfs_safe_remove(dentry);
 	nfs_dentry_remove_handle_error(dir, dentry, error);
-	if (did_set_par_update) {
-		spin_lock(&dentry->d_lock);
-		dentry->d_flags &= ~DCACHE_PAR_UPDATE;
-		spin_unlock(&dentry->d_lock);
-	}
+	if (did_set_par_update)
+		d_unlock_update(dentry);
 out:
 	trace_nfs_unlink_exit(dir, dentry, error);
 	return error;
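
For anyone following along, the hang is a lost wakeup. The trace shows nfsd
asleep under __nfs_lookup_revalidate(), presumably waiting for
DCACHE_PAR_UPDATE to clear, and the old nfs_unlink() code cleared that flag
without waking anyone. What follows is a minimal user-space analogy using
POSIX threads, offered as a sketch only: fake_dentry, unlock_update(),
wait_for_update() and run_scenario() are hypothetical names, not kernel APIs,
and the assumption that d_unlock_update() performs the wakeup is inferred
from the patch rather than shown in it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical user-space stand-in for a dentry; NOT kernel code.
 * `lock` plays d_lock and `par_update` plays DCACHE_PAR_UPDATE. */
struct fake_dentry {
    pthread_mutex_t lock;
    pthread_cond_t wq;  /* sleepers wait here for par_update to clear */
    bool par_update;
    bool arrived;       /* set once a sleeper enters wait_for_update() */
};

/* Clear the flag. wake == false mirrors the open-coded clear the patch
 * removes; wake == true also wakes sleepers, which is presumably what
 * d_unlock_update() adds. */
static void unlock_update(struct fake_dentry *d, bool wake)
{
    pthread_mutex_lock(&d->lock);
    d->par_update = false;
    if (wake)
        pthread_cond_broadcast(&d->wq);
    pthread_mutex_unlock(&d->lock);
}

/* Sleep until par_update clears, as a revalidating lookup would. Returns
 * false if timeout_ms passes with no wakeup; an untimed sleeper would
 * still be blocked at that point (the hung-task case). */
static bool wait_for_update(struct fake_dentry *d, long timeout_ms)
{
    struct timespec deadline;
    bool woken = true;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&d->lock);
    d->arrived = true;
    while (d->par_update) {
        if (pthread_cond_timedwait(&d->wq, &d->lock, &deadline) == ETIMEDOUT) {
            woken = false;  /* deadline passed without a broadcast */
            break;
        }
    }
    pthread_mutex_unlock(&d->lock);
    return woken;
}

struct waiter_arg { struct fake_dentry *d; bool woken; };

static void *waiter(void *p)
{
    struct waiter_arg *a = p;
    a->woken = wait_for_update(a->d, 300);
    return NULL;
}

/* Set the flag, park one sleeper on it, then clear the flag with or
 * without a wakeup. Returns whether the sleeper was actually woken. */
static bool run_scenario(bool wake)
{
    struct fake_dentry d = { .par_update = true, .arrived = false };
    struct waiter_arg a = { .d = &d, .woken = false };
    bool arrived = false;
    pthread_t t;

    pthread_mutex_init(&d.lock, NULL);
    pthread_cond_init(&d.wq, NULL);
    int rc = pthread_create(&t, NULL, waiter, &a);
    assert(rc == 0);
    (void)rc;

    /* Seeing `arrived` under the lock means the sleeper has already released
     * the lock inside pthread_cond_timedwait(), so a broadcast cannot be lost. */
    while (!arrived) {
        pthread_mutex_lock(&d.lock);
        arrived = d.arrived;
        pthread_mutex_unlock(&d.lock);
        if (!arrived)
            sched_yield();
    }
    unlock_update(&d, wake);
    pthread_join(t, NULL);
    pthread_cond_destroy(&d.wq);
    pthread_mutex_destroy(&d.lock);
    return a.woken;
}
```

With the wakeup, run_scenario(true) should return true as soon as the
sleeper re-checks the flag. Without it, run_scenario(false) only returns
(false) because the sketch's sleeper gives up after 300 ms; the nfsd thread
in the trace has no such timeout, which is why it sits in D state until the
hung-task detector reports it.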

> 
> So all in all, the performance improvements in the knfsd re-export
> case are looking great and we have real world use cases that this helps
> with (batch processing workloads with latencies >10ms). If we can
> figure out the hanging knfsd threads, then I can test it more heavily.

Hopefully the above patch will allow the heavier testing to continue.
In any case, thanks a lot for the testing so far,

NeilBrown