On Sat, 5 Dec 2015 13:02:22 +0100
Christoph Hellwig <hch@xxxxxx> wrote:

> On Fri, Dec 04, 2015 at 03:51:10PM -0500, Jeff Layton wrote:
> > > There is no reason not to do it, except for the significant effort
> > > to implement it as well as a synthetic test case to actually reproduce
> > > the behavior we want to handle.
> >
> > Could you end up livelocking here? Suppose you issue the callback and
> > the client returns success. He then returns the layout and gets a new
> > one just before the delay timer pops. We then end up recalling _that_
> > layout...rinse, repeat...
>
> If we start allowing layoutgets before the whole range has been
> returned there is a great chance for livelocks, yes. But I don't think
> we should allow layoutgets to proceed before that.

Maybe I didn't describe it well enough. I think you can still end up
looping even if you don't allow LAYOUTGETs before the entire range is
returned.

If we treat NFS4_OK and NFS4ERR_DELAY equivalently, then we're
expecting the client to eventually return NFS4ERR_NOMATCHING_LAYOUT (or
a different error) to break the cycle of retransmissions. But, HZ/100
is enough time for the client to return a layout and request a new one.
We may never see that error -- only a continual cycle of
CB_LAYOUTRECALL/LAYOUTRETURN/LAYOUTGET.

I think we need a more reliable way to break that cycle so we don't end
up looping like that. We should either cancel any active callbacks
before reallowing LAYOUTGETs, or move the timeout handling outside of
the RPC state machine (like Bruce was suggesting).

-- 
Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
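
A toy model of the CB_LAYOUTRECALL/LAYOUTRETURN/LAYOUTGET cycle described
above, in case it helps make the concern concrete. This is not nfsd code:
simulate(), cancel_on_return, client_regets and max_rounds are made-up names
for illustration. It also assumes the worst case, where the client always
wins the race against the retransmit timer, and it only sketches the first
of the two proposed fixes (cancelling the pending callback before
LAYOUTGETs are allowed again).

```python
def simulate(cancel_on_return, client_regets=True, max_rounds=10):
    """Toy model of the recall loop (hypothetical, not nfsd code).

    Returns (recalls_sent, settled), where settled is True once the
    server no longer intends to retransmit CB_LAYOUTRECALL.
    """
    recall_armed = True        # server still plans to (re)send the recall
    client_has_layout = True
    recalls_sent = 0

    while recall_armed and recalls_sent < max_rounds:
        recalls_sent += 1      # server sends CB_LAYOUTRECALL

        if not client_has_layout:
            # Client answers NFS4ERR_NOMATCHING_LAYOUT: the cycle ends.
            recall_armed = False
            break

        # Client answers NFS4_OK. If that is treated like NFS4ERR_DELAY,
        # the server leaves the retransmit timer armed.

        # Client sends LAYOUTRETURN for the whole range.
        client_has_layout = False
        if cancel_on_return:
            # Sketch of the fix: cancel the pending callback before
            # LAYOUTGETs are allowed again.
            recall_armed = False

        # LAYOUTGET is allowed again. Assumption: the client gets a new
        # layout before the retransmit timer pops.
        if client_regets:
            client_has_layout = True

    return recalls_sent, not recall_armed
```

In this model, simulate(cancel_on_return=False) never settles and only stops
at the max_rounds cap, while simulate(cancel_on_return=True) settles after a
single recall. With client_regets=False and no cancellation, the second
recall gets NFS4ERR_NOMATCHING_LAYOUT and the loop ends, which is the case
the current approach relies on.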