Re: nfsd oops on Linus' current tree.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2013-01-03 at 17:26 -0500, Tejun Heo wrote:
+AD4- Hello, Trond.
+AD4- 
+AD4- On Thu, Jan 03, 2013 at 10:12:32PM +-0000, Myklebust, Trond wrote:
+AD4- +AD4- +AD4- The analysis is likely completely wrong, so please don't go off doing
+AD4- +AD4- +AD4- something unnecessary.  Please take look at what's causing the
+AD4- +AD4- +AD4- deadlocks again.
+AD4- +AD4- 
+AD4- +AD4- The analysis is a no-brainer:
+AD4- +AD4- We see a deadlock due to one work item waiting for completion of another
+AD4- +AD4- work item that is queued on the same CPU. There is no other dependency
+AD4- +AD4- between the two work items.
+AD4- 
+AD4- What do you mean +ACI-waiting for completion of+ACI-?  Is one flushing the
+AD4- other?  Or is one waiting for the other to take certain action?  How
+AD4- are the two work items related?  Are they queued on the same
+AD4- workqueue?  Can you create a minimal repro case of the observed
+AD4- deadlock?

The two work items are running on different work queues. One is running
on the nfsiod work queue, and is waiting for the other to complete on
the rpciod work queue. Basically, the nfsiod work item is trying to shut
down an RPC session, and it is waiting for each outstanding RPC call to
finish running a clean-up routine.

We can't call flush+AF8-work(), since we don't have a way to pin the
work+AF8-struct for any long period of time, so we queue all the RPC calls
up, then sleep on a global wait queue for 1 second or until the last RPC
call wakes us up (see rpc+AF8-shutdown+AF8-client()).

In the deadlock scenario, it looks as if one (or more) of the RPC calls
are getting queued on the same CPU (but on the rpciod workqueue) as the
shutdown process (running on nfsiod).


+AD4- Ooh, BTW, there was a bug where workqueue code created a false
+AD4- dependency between two work items.  Workqueue currently considers two
+AD4- work items to be the same if they're on the same address and won't
+AD4- execute them concurrently - ie. it makes a work item which is queued
+AD4- again while being executed wait for the previous execution to
+AD4- complete.
+AD4- 
+AD4- If a work function frees the work item, and then waits for an event
+AD4- which should be performed by another work item and +ACo-that+ACo- work item
+AD4- recycles the freed work item, it can create a false dependency loop.
+AD4- There really is no reliable way to detect this short of verifying
+AD4- every memory free.  A patch is queued to make such occurrences less
+AD4- likely (work functions should also match for two work items considered
+AD4- the same), but if you're seeing this, the best thing to do is freeing
+AD4- the work item at the end of the work function.

That's interesting... I wonder if we may have been hitting that issue.

>From what I can see, we do actually free the write RPC task (and hence
the work+AF8-struct) before we call the asynchronous unlink completion...

Dros, can you see if reverting commit
324d003b0cd82151adbaecefef57b73f7959a469 +- commit 
168e4b39d1afb79a7e3ea6c3bb246b4c82c6bdb9 and then applying the attached
patch also fixes the hang on a pristine 3.7.x kernel?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust+AEA-netapp.com
www.netapp.com
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index b6bdb18..400f7ec 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -91,12 +91,13 @@ void nfs_readdata_release(struct nfs_read_data *rdata)
 	put_nfs_open_context(rdata->args.context);
 	if (rdata->pages.pagevec != rdata->pages.page_array)
 		kfree(rdata->pages.pagevec);
-	if (rdata != &read_header->rpc_data)
-		kfree(rdata);
-	else
+	if (rdata == &read_header->rpc_data) {
 		rdata->header = NULL;
+		rdata = NULL;
+	}
 	if (atomic_dec_and_test(&hdr->refcnt))
 		hdr->completion_ops->completion(hdr);
+	kfree(rdata);
 }
 EXPORT_SYMBOL_GPL(nfs_readdata_release);
 
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index b673be3..45d9250 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -126,12 +126,13 @@ void nfs_writedata_release(struct nfs_write_data *wdata)
 	put_nfs_open_context(wdata->args.context);
 	if (wdata->pages.pagevec != wdata->pages.page_array)
 		kfree(wdata->pages.pagevec);
-	if (wdata != &write_header->rpc_data)
-		kfree(wdata);
-	else
+	if (wdata == &write_header->rpc_data) {
 		wdata->header = NULL;
+		wdata = NULL;
+	}
 	if (atomic_dec_and_test(&hdr->refcnt))
 		hdr->completion_ops->completion(hdr);
+	kfree(wdata);
 }
 EXPORT_SYMBOL_GPL(nfs_writedata_release);
 

[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux