On Oct 29, 2011, at 3:52 PM, David Flynn wrote:

> * Myklebust, Trond (Trond.Myklebust@xxxxxxxxxx) wrote:
>>> -----Original Message-----
>>> From: Chuck Lever [mailto:chuck.lever@xxxxxxxxxx]
>>> On Oct 29, 2011, at 2:47 PM, J. Bruce Fields wrote:
>>>> Yes, and it's not something I care that strongly about, really; my
>>>> only observation is that this sort of failure (an implementation
>>>> bug on one side or the other resulting in a loop) seems to have
>>>> been common (based on no hard data, just my vague memories of list
>>>> threads), and the results fairly obnoxious (possibly even for
>>>> unrelated hosts on the network).
>>>>
>>>> So if there's some simple way to fail more gracefully, it might be
>>>> helpful.
>>>
>>> For what it's worth, I agree that client implementations should
>>> attempt to behave more gracefully in the face of server problems,
>>> whether those are the result of bugs or of other issues specific to
>>> that server.  Problems like this make NFSv4 as a protocol look bad.
>>
>> I can't see what a client can do in this situation except possibly
>> give up after a while and throw a SERVER_BROKEN error (which means
>> data loss).  That still won't make NFSv4 look good...
>
> Indeed, it is quite the dilemma.
>
> I agree that giving up and guaranteeing unattended data loss is bad
> (data loss at the behest of an operator is OK; after all, they can
> always fence a broken machine).

David, what would help immensely is if you could find a reliable way of
reproducing this.  So far we have been unable to find a reproducer.

> Looking at some of the logs again, even going back to the very
> original case, it appears to be about 600us between retries
> (RTT=400us).  Is there any way to make that less aggressive, e.g. 1s?
> That would reduce the impact by three orders of magnitude.  What would
> be the downside?  How often do you expect to get a BAD_STATEID error?
>
> ..david

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
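
[Editor's note: for readers following the retry-interval question above, here is a minimal sketch of what "less aggressive" could look like: a capped exponential backoff between retries instead of retrying as fast as the RTT allows. This is not the actual Linux NFS client recovery path; the function names, constants, and values below are illustrative assumptions only.]

#include <stdio.h>

/* Hypothetical parameters; names and values are illustrative only. */
#define RETRY_DELAY_MIN_MS	100	/* first retry after 100 ms */
#define RETRY_DELAY_MAX_MS	15000	/* never wait longer than 15 s */

/* Return the next retry delay: double the previous one, up to a cap. */
static unsigned long next_retry_delay_ms(unsigned long prev_ms)
{
	if (prev_ms == 0)
		return RETRY_DELAY_MIN_MS;
	if (prev_ms >= RETRY_DELAY_MAX_MS / 2)
		return RETRY_DELAY_MAX_MS;
	return prev_ms * 2;
}

int main(void)
{
	unsigned long delay = 0;
	int i;

	/* Prints 100, 200, 400, ... then stays capped at 15000 ms. */
	for (i = 0; i < 10; i++) {
		delay = next_retry_delay_ms(delay);
		printf("retry %d after %lu ms\n", i + 1, delay);
	}
	return 0;
}

[Even the first step of such a backoff would already be far less aggressive than the ~600us retry interval observed in the logs, while a cap keeps recovery from stalling indefinitely.]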