On Sat, Dec 05, 2015 at 07:24:09AM -0500, Jeff Layton wrote:
> If we treat NFS4_OK and NFS4ERR_DELAY equivalently, then we're
> expecting the client to eventually return NFS4ERR_NOMATCHING_LAYOUT (or
> a different error) to break the cycle of retransmissions. But, HZ/100
> is enough time for the client to return a layout and request a new one.
> We may never see that error -- only a continual cycle of
> CB_LAYOUTRECALL/LAYOUTRETURN/LAYOUTGET.
>
> I think we need a more reliable way to break that cycle so we don't end
> up looping like that. We should either cancel any active callbacks
> before reallowing LAYOUTGETs, or move the timeout handling outside of
> the RPC state machine (like Bruce was suggesting).

We block all new LAYOUTGETs as long as fi_lo_recalls is non-zero, and
we only decrement it from nfsd4_cb_layout_release. The way I understand
the RPC state machine, that means we block new LAYOUTGETs until we have
successfully finished the recall.