Re: RPC retransmission of write requests containing bogus data

On Fri, 2008-10-17 at 09:32 -0400, Talpey, Thomas wrote:
> At 07:01 AM 10/17/2008, Ian Campbell wrote:
> >(please CC me, I am not currently subscribed to linux-nfs)
> >...
> >Presumably in the case of a decent NFS server the XID request cache
> >would prevent the bogus data actually reaching the disk but on a
> >non-decent server I suspect it might actually lead to corruption (AIUI
> >the request cache is not a hard requirement of the NFS protocol?).
> >Perhaps even a decent server might have timed out the entry in the cache
> >after such a delay?
> 
> Unfortunately no - because 1) your retransmissions are not, in fact,
> duplicates since the data has changed and 2) no NFSv3 reply cache
> works perfectly, especially under heavy load. The NFSv4.1 session
> addresses this, but that's not at issue here.
> 
> This is a really nasty race. The whole thing starts with the dropped
> TCP segment evidenced at #2 of your trace. Then, the retransmission
> appears to have been scheduled prior to the write reply making it back
> to the client through the TCP storm, so the retransmit is actually pending
> on the wire while the NFS write operation is completed.
> 
> The fix here is to break the connection before retrying, a long-standing
> pet peeve of mine that NFSv3 historically does not do. Setting the
> clnt->cl_discrtry bit in the RPC client struct is all that's required. The
> NFSv4 client does this by default, btw.
> 
> Tom.

It's not a perfect fix, which is why we haven't done that for NFSv3.

When you break the connection, there is a chance that the reply to a
non-idempotent request gets lost and that the server doesn't
recognise the retransmission, due to the above-mentioned imperfections
in the replay cache. In that case, the client may get a downright
_wrong_ reply (for instance, it may see an EEXIST reply to a mkdir
request that actually succeeded).
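
A toy simulation may make the failure mode concrete. This is only a
sketch, not kernel or server code: ToyServer, its bounded XID-keyed
cache, and mkdir_with_lost_reply are hypothetical names invented for
illustration, and the eviction policy (oldest entry first) is an
assumption. It shows a mkdir whose reply is lost, followed by a
retransmission with the same XID after the cache entry has been evicted.

```python
from collections import OrderedDict


class ToyServer:
    """Hypothetical NFS-like server with a bounded, XID-keyed reply cache."""

    def __init__(self, cache_slots):
        self.dirs = set()
        self.cache = OrderedDict()  # xid -> cached reply
        self.cache_slots = cache_slots

    def _remember(self, xid, reply):
        self.cache[xid] = reply
        while len(self.cache) > self.cache_slots:
            self.cache.popitem(last=False)  # evict the oldest entry

    def mkdir(self, xid, path):
        if xid in self.cache:
            # Recognised retransmission: replay the original reply.
            return self.cache[xid]
        # Unrecognised request: execute it as if it were new.
        reply = "EEXIST" if path in self.dirs else "OK"
        self.dirs.add(path)
        self._remember(xid, reply)
        return reply


def mkdir_with_lost_reply(server, other_requests):
    """Return (reply the server sent first, reply the client finally sees)."""
    # The server executes the mkdir, but the reply is lost when the
    # connection is broken, so the client never sees `first`.
    first = server.mkdir(xid=1, path="/export/a")
    # Unrelated traffic from other clients fills the reply cache.
    for i in range(other_requests):
        server.mkdir(xid=100 + i, path="/export/other%d" % i)
    # The client retransmits the same request with the same XID.
    retry = server.mkdir(xid=1, path="/export/a")
    return first, retry


print(mkdir_with_lost_reply(ToyServer(cache_slots=4), other_requests=0))
print(mkdir_with_lost_reply(ToyServer(cache_slots=4), other_requests=4))
```

In the first call the cache entry survives and the retransmission is
answered from the cache. In the second call the entry has been evicted,
so the server re-executes the mkdir and the client sees EEXIST for a
request that in fact succeeded.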

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com