On Mar 5, 2014, at 3:15 PM, Brian Hawley <bhawley@xxxxxxxxxxx> wrote:

> In my experience, you won't get the I/O errors reported back to the
> read/write/close operations. I don't know for certain, but I suspect this
> may be due to caching and chunking to turn I/O into requests matching the
> rsize/wsize settings; and possibly the fact that the peer disconnection
> isn't noticed unless the NFS server resets (i.e. cable disconnection isn't
> sufficient).
>
> The inability to get the I/O errors back to the application has been a
> major pain for us.
>
> On a lark we did find that repeated umount -f's do get I/O errors back to
> the application, but that isn't our preferred way.
>
> -----Original Message-----
> From: Andrew Martin <amartin@xxxxxxxxxxx>
> Sender: linux-nfs-owner@xxxxxxxxxxxxxxx
> Date: Wed, 5 Mar 2014 11:45:24
> To: <linux-nfs@xxxxxxxxxxxxxxx>
> Subject: Optimal NFS mount options to safely allow interrupts and timeouts
> on newer kernels
>
> Hello,
>
> Is it safe to use the "soft" mount option with proto=tcp on newer kernels
> (e.g. 3.2 and newer)? Currently using the "defaults" nfs mount options on
> Ubuntu 12.04 results in processes blocking forever in uninterruptible sleep
> if they attempt to access a mountpoint while the NFS server is offline. I
> would prefer that NFS simply return an error to the clients after retrying
> a few times, however I also cannot have data loss. From the man page, I
> think these options will give that effect?
> soft,proto=tcp,timeo=10,retrans=3
>
> From my understanding, this will cause NFS to retry the connection 3 times
> (once per second), and then if all 3 are unsuccessful return an error to
> the application. Is this correct? Is there a risk of data loss or
> corruption by using "soft" in this way? Or is there a better way to
> approach this?

There is always a silent data corruption risk with “soft.” Using TCP and a long retransmit timeout mitigates the risk, but it is still there.
A one-second timeout for TCP is very short, and will almost certainly result in trouble, especially if the server or network is slow.

You should be able to ^C any waiting NFS process. Blocking forever is usually the sign of a bug.

In general, NFS is not especially tolerant of server unavailability. You may want to consider some other distributed file system protocol that is more fault-tolerant, or find ways to ensure your NFS servers are always accessible.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
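
For reference, the option string proposed above can be turned into rough numbers. This is only a sketch: it assumes the simplified model described in the nfs(5) man page, where timeo is given in tenths of a second and, over TCP, the wait grows linearly by timeo after each retransmission up to a 600 second cap. The helper names parse_mount_options and backoff_waits are made up for illustration, and real client behavior may differ between kernel versions.

```python
MAX_TIMEOUT_S = 600.0  # cap on the per-request wait, per the nfs(5) description


def parse_mount_options(opts):
    """Split 'soft,proto=tcp,timeo=10' into {'soft': True, 'proto': 'tcp', ...}."""
    parsed = {}
    for item in opts.split(","):
        key, sep, value = item.partition("=")
        # Flags such as 'soft' carry no value; record them as True.
        parsed[key] = value if sep else True
    return parsed


def backoff_waits(timeo_ds, retrans):
    """Seconds waited before each of `retrans` retransmissions.

    Simplified linear-backoff model (an assumption, not kernel code):
    the n-th wait is n * timeo, capped at MAX_TIMEOUT_S.
    """
    base_s = timeo_ds / 10.0  # timeo is in deciseconds
    return [min(base_s * n, MAX_TIMEOUT_S) for n in range(1, retrans + 1)]


opts = parse_mount_options("soft,proto=tcp,timeo=10,retrans=3")
waits = backoff_waits(int(opts["timeo"]), int(opts["retrans"]))
print(waits, sum(waits))  # [1.0, 2.0, 3.0] 6.0
```

Under this model the proposed options give the server only a few seconds in total before a soft mount returns an error, whereas the TCP default of timeo=600 starts at 60 seconds per attempt.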