Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

Ric Wheeler <ricwheeler@xxxxxxxxx> · Thu, 06 Mar 2014 11:37:55 +0200

On 03/05/2014 10:15 PM, Brian Hawley wrote:
In my experience, you won't get the I/O errors reported back to the read/write/close operations. I don't know for certain, but I suspect this may be due to caching and to I/O being chunked to match the rsize/wsize settings, and possibly to the fact that the peer disconnection isn't noticed unless the NFS server resets (i.e. a cable disconnection isn't sufficient).

The inability to get the i/o errors back to the application has been a major pain for us.

On a lark we did find that repeated umount -f's do get I/O errors back to the application, but that isn't our preferred way.

The key to getting IO errors promptly is to make sure you call fsync()/fdatasync()
(and so on) at the points in your application that you want to be able to
recover from if things crash, get disconnected, etc.

Those calls push the data out of the page cache, and report any write error,
while your application is still around, which is critical for any potential
need to do recovery.
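
As a rough sketch of what such a recovery point might look like in C (the
helper names write_all() and checkpoint_write() are made up for illustration,
not taken from any library, and a real application would probably keep the
file open between checkpoints rather than reopen it each time):

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error with errno set. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: write one record and force it out of the page
 * cache before returning, so that an IO error shows up here, while the
 * caller can still react to it. Returns 0 on success, -1 on error. */
static int checkpoint_write(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() normally only copies into the page cache; fsync() is the
     * call that waits for the data to reach the server or disk. */
    if (write_all(fd, buf, len) < 0 || fsync(fd) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    /* close() can also return a deferred write error, so check it. */
    return close(fd);
}
```

A caller would treat a -1 return from checkpoint_write() as "this checkpoint
did not make it to stable storage" and retry or fall back to the previous
one. How a given NFS mount (hard/soft, timeo, retrans) turns a dead server
into an error at fsync() time is not something this sketch tries to show.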

Note that this is not just an issue with NFS: any file system (including local
file systems) normally completes the write request when the IO hits the page
cache.  When that page eventually gets sent down to the permanent storage device
(NFS server, local disk, etc), your process is potentially no longer around and
certainly not waiting for IO errors in the original write call :)

What makes this even trickier is that calls like fsync() that persist data have
a substantial performance impact, so you don't want to overuse them.  (For
example, try writing a 1GB file with an fsync() before close and compare that
to writing a 1GB file opened in O_DIRECT|O_SYNC mode, which is the worst case :))

Ric
