On 06/28/2010 09:50 PM, Benny Halevy wrote: > On Jun. 28, 2010, 20:33 +0300, Boaz Harrosh <bharrosh@xxxxxxxxxxx> wrote: >> >> If a callback is retried at nfsd4_cb_recall_done() do to >> some error. The returned rpc reply would then crash here: >> >> @@ -514,6 +514,7 @@ decode_cb_sequence(struct xdr_stream *xdr, struct nfsd4_cb_sequence *res, >> u32 dummy; >> __be32 *p; >> >> + BUG_ON(!res); >> if (res->cbs_minorversion == 0) >> return 0; >> >> [BUG_ON added for demonstration] >> >> This is because the nfsd4_cb_done_sequence() has NULLed out >> the task->tk_msg.rpc_resp pointer. >> >> This problem was introduced by a 4.1 protocol addition patch: >> [0421b5c5] nfsd41: Backchannel: Implement cb_recall over NFSv4.1 >> >> Which was overlooking the possibility of an RPC callback retries. >> >> Signed-off-by: Boaz Harrosh <bharrosh@xxxxxxxxxxx> >> --- >> fs/nfsd/nfs4callback.c | 3 --- >> 1 files changed, 0 insertions(+), 3 deletions(-) >> >> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c >> index f3b5015..dace7e2 100644 >> --- a/fs/nfsd/nfs4callback.c >> +++ b/fs/nfsd/nfs4callback.c >> @@ -869,9 +869,6 @@ static void nfsd4_cb_done_sequence(struct rpc_task *task, >> rpc_wake_up_next(&clp->cl_cb_waitq); >> dprintk("%s: freed slot, new seqid=%d\n", __func__, >> clp->cl_cb_seq_nr); >> - >> - /* We're done looking into the sequence information */ >> - task->tk_msg.rpc_resp = NULL; >> } >> } >> > > It looks like we have a more fundamental problem that > nfsd41_cb_setup_sequence is not called on the retry path > meaning that not only the message isn't reinitialized properly > but the single slot is not allocated as it should. That's true clp->cl_cb_slot_busy is not set on the retry path and there for there might be a bug with a second in-coming cb_layout/cb_recall. This is not the case with my test because exofs limits one cb_layout per inode, and my test only uses one file. But with multiple files test it would. > Boaz, I think you saw multiple callbacks going out concurrently, right? > No, that was a stupid exofs bug. I showed you. I was sending two cb_layout each time, Just for fun ;-) (Patch coming soon) > rpc_restart_call_prepare() should be called instead of rpc_restart_call() > Yes, that should work better, and nfsd4_cb_recall_done should be fixed as well. Am testing cb_layout_done() path. (Will send patches soon) Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html