On 03/05/2014 10:15 PM, Brian Hawley wrote:
In my experience, you won't get the I/O errors reported back to the read/write/close operations. I don't know for certain, but I suspect this may be due to caching and chunking to turn I/O into requests matching the rsize/wsize settings, and possibly the fact that the peer disconnection isn't noticed unless the NFS server resets (i.e. a cable disconnection isn't sufficient). The inability to get the I/O errors back to the application has been a major pain for us. On a lark we did find that repeated umount -f's do get I/O errors back to the application, but that isn't our preferred way.
The key to getting I/O errors promptly is to use fsync()/fdatasync() (and so on) at the points in your application from which you want to be able to recover if things crash, get disconnected, etc.
Those calls push the data out of the page cache while your application is still around, which is critical for any potential need to do recovery.
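As a rough illustration of the point above, here is a minimal sketch in C. The helper name write_durably() is hypothetical (it is not from any library), and the error handling is deliberately simple. The idea is only that the error check which matters happens at fsync() and close(), not at write():

```c
#define _POSIX_C_SOURCE 200809L  /* for fsync() under strict compiler modes */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes from buf to path and do not return
 * success until the data has been pushed out of the page cache.
 * Returns 0 on success, -1 on failure with errno set. */
int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before writing, retry */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;                     /* handle short writes */
        len -= (size_t)n;
    }

    /* A successful write() normally only means the data reached the page
     * cache. fsync() forces it out to the storage device or NFS server and
     * is where a writeback error gets reported to this process. */
    if (fsync(fd) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    /* close() can also report an error, so check it as well. */
    if (close(fd) < 0)
        return -1;

    return 0;
}
```

If write_durably() returns -1, the application is still running and can retry, write somewhere else, or tell the user, which is the whole point of syncing at a recovery point.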
Note that this is not just an issue with NFS. Any file system (including local file systems) normally completes the write request when the I/O hits the page cache. By the time that page eventually gets sent down to the permanent storage device (NFS server, local disk, etc.), your process is potentially no longer around, and it is certainly not waiting for I/O errors in the original write call :)
What makes this even trickier is that calls like fsync() that persist data have a substantial performance impact, so you don't want to over-use them. (For example, try writing a 1GB file with an fsync() before close, and compare that to writing a 1GB file opened in O_DIRECT|O_SYNC mode for the worst case :))
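If you want to try that comparison, something along these lines should do as a starting point. This is only a sketch: timed_write() is a made-up helper, the 64 KiB chunk size is arbitrary, and it only takes O_SYNC as the extra flag. I left O_DIRECT out on purpose because it needs suitably aligned buffers and is not supported by every file system, so adding it is an exercise for the reader.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() and fsync() */
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define CHUNK (64 * 1024)        /* arbitrary write size for this sketch */

/* Hypothetical helper: write `total` bytes to `path` in CHUNK-sized writes.
 * extra_flags is OR-ed into the open() flags (e.g. 0 or O_SYNC).
 * If fsync_at_end is nonzero, fsync() once before close().
 * Returns elapsed wall-clock seconds, or -1.0 on any error
 * (including an interrupted write, to keep the sketch short). */
double timed_write(const char *path, size_t total, int extra_flags,
                   int fsync_at_end)
{
    static char chunk[CHUNK];
    struct timespec t0, t1;

    memset(chunk, 'x', sizeof chunk);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra_flags, 0644);
    if (fd < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);

    size_t left = total;
    while (left > 0) {
        size_t want = left < CHUNK ? left : CHUNK;
        ssize_t n = write(fd, chunk, want);
        if (n < 0) {
            close(fd);
            return -1.0;
        }
        left -= (size_t)n;
    }

    if (fsync_at_end && fsync(fd) < 0) {
        close(fd);
        return -1.0;
    }
    if (close(fd) < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Comparing timed_write(path, (size_t)1 << 30, 0, 1) against timed_write(path, (size_t)1 << 30, O_SYNC, 0) on the same mount gives the two ends of the range: one sync at the end versus a sync on every write. The actual numbers depend entirely on your storage and mount options.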
Ric