On Sun, Dec 06, 2015 at 08:09:54AM -0500, Jeff Layton wrote:
> On Sat, 5 Dec 2015 07:24:09 -0500
> Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:
>
> > On Sat, 5 Dec 2015 13:02:22 +0100
> > Christoph Hellwig <hch@xxxxxx> wrote:
> >
> > > On Fri, Dec 04, 2015 at 03:51:10PM -0500, Jeff Layton wrote:
> > > > > There is no reason not to do it, except for the significant effort
> > > > > to implement it as well as a synthetic test case to actually reproduce
> > > > > the behavior we want to handle.
> > > >
> > > > Could you end up livelocking here? Suppose you issue the callback and
> > > > the client returns success. He then returns the layout and gets a new
> > > > one just before the delay timer pops. We then end up recalling _that_
> > > > layout...rinse, repeat...
> > >
> > > If we start allowing layoutgets before the whole range has been
> > > returned there is a great chance for livelocks, yes. But I don't think
> > > we should allow layoutgets to proceed before that.
> >
> > Maybe I didn't describe it well enough. I think you can still end up
> > looping even if you don't allow LAYOUTGETs before the entire range is
> > returned.
> >
> > If we treat NFS4_OK and NFS4ERR_DELAY equivalently, then we're
> > expecting the client to eventually return NFS4ERR_NOMATCHING_LAYOUT (or
> > a different error) to break the cycle of retransmissions. But, HZ/100
> > is enough time for the client to return a layout and request a new one.
> > We may never see that error -- only a continual cycle of
> > CB_LAYOUTRECALL/LAYOUTRETURN/LAYOUTGET.
> >
> > I think we need a more reliable way to break that cycle so we don't end
> > up looping like that. We should either cancel any active callbacks
> > before reallowing LAYOUTGETs, or move the timeout handling outside of
> > the RPC state machine (like Bruce was suggesting).
> >
>
> Either way...in the near term we should probably take the patch that I
> originally proposed, just to ensure that no one hits the bugs that
> Kinglong hit.
> That does still leave some gaps in the seqid handling,
> but those are preferable to the warning and deadlock.
>
> Bruce, does that sound reasonable?

Yes, I think I'll just apply the below (your patch with a couple extra
sentences in the changelog), and pass that along for 4.4 soon.

--b.

commit be20aa00c671
Author: Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
Date:   Sun Nov 29 08:46:14 2015 -0500

    nfsd: don't hold ls_mutex across a layout recall

    We do need to serialize layout stateid morphing operations, but we
    currently hold the ls_mutex across a layout recall which is pretty
    ugly. It's also unnecessary -- once we've bumped the seqid and
    copied it, we don't need to serialize the rest of the CB_LAYOUTRECALL
    vs. anything else. Just drop the mutex once the copy is done.

    This was causing a "workqueue leaked lock or atomic" warning and an
    occasional deadlock.

    There's more work to be done here but this fixes the immediate
    regression.

    Fixes: cc8a55320b5f "nfsd: serialize layout stateid morphing operations"
    Cc: stable@xxxxxxxxxxxxxxx
    Reported-by: Kinglong Mee <kinglongmee@xxxxxxxxx>
    Signed-off-by: Jeff Layton <jeff.layton@xxxxxxxxxxxxxxx>
    Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxx>

diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
index 9ffef06b30d5..c9d6c715c0fb 100644
--- a/fs/nfsd/nfs4layouts.c
+++ b/fs/nfsd/nfs4layouts.c
@@ -616,6 +616,7 @@ nfsd4_cb_layout_prepare(struct nfsd4_callback *cb)
 
 	mutex_lock(&ls->ls_mutex);
 	nfs4_inc_and_copy_stateid(&ls->ls_recall_sid, &ls->ls_stid);
+	mutex_unlock(&ls->ls_mutex);
 }
 
 static int
@@ -659,7 +660,6 @@ nfsd4_cb_layout_release(struct nfsd4_callback *cb)
 
 	trace_layout_recall_release(&ls->ls_stid.sc_stateid);
 
-	mutex_unlock(&ls->ls_mutex);
 	nfsd4_return_all_layouts(ls, &reaplist);
 	nfsd4_free_layouts(&reaplist);
 	nfs4_put_stid(&ls->ls_stid);
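
For anyone who wants to poke at the locking pattern outside the kernel,
here is a rough userspace sketch of what the changelog describes: take
the mutex, bump the seqid and copy it, then drop the mutex before the
long-running recall. It assumes pthreads, and every name in it
(struct stateid, struct layout_state, layout_state_init, recall_prepare)
is a hypothetical stand-in, not the nfsd code or its types.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a layout stateid; not the kernel's stateid_t. */
struct stateid {
	uint32_t seqid;
	uint32_t other;
};

/* Hypothetical stand-in for the per-layout state the patch touches. */
struct layout_state {
	pthread_mutex_t mutex;   /* plays the role of ls_mutex */
	struct stateid current;  /* plays the role of the live stateid */
	struct stateid recall;   /* plays the role of ls_recall_sid */
};

/* Set up the mutex and the starting stateid. */
static void layout_state_init(struct layout_state *ls,
			      uint32_t seqid, uint32_t other)
{
	int rc = pthread_mutex_init(&ls->mutex, NULL);

	assert(rc == 0);
	(void)rc;
	ls->current.seqid = seqid;
	ls->current.other = other;
	ls->recall.seqid = 0;
	ls->recall.other = 0;
}

/*
 * Bump the seqid and snapshot it for the recall. The mutex is held only
 * for the increment and the copy, and is released before returning, so
 * whatever sends the recall and whatever cleans up afterwards never has
 * to unlock a mutex that another context locked.
 */
static void recall_prepare(struct layout_state *ls)
{
	pthread_mutex_lock(&ls->mutex);
	ls->current.seqid++;
	ls->recall = ls->current;
	pthread_mutex_unlock(&ls->mutex);
}
```

After recall_prepare() returns, a pthread_mutex_trylock() on the same
mutex succeeds, which is the property the patch restores: nothing is
left locked while the callback is in flight.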