On Tue, 2018-09-11 at 14:02 -0400, Chris Siebenmann wrote:
> > > We've found a readily reproducible situation where the current
> > > NFS client code will provide zero bytes instead of actual data at
> > > the end of the file (sort of) to user programs. This can result
> > > in program failure, or permanent file corruption if the program
> > > reading the file writes the bad data back to the file; otherwise,
> > > the corruption goes away when the client's cached data is pushed
> > > out of memory (or explicitly dropped by dropping the pagecache
> > > through /proc/sys/vm/drop_caches).
> > [...]
> > Please see http://nfs.sourceforge.net/#faq_a8
>
> I don't think this is a close to open consistency issue, or if it is
> I would argue that it is a clear bug on the Linux NFS client. I have
> a number of reasons for saying this:
>
> - the client clearly sees the new attributes; it knows that the file
>   has been extended from the previous state that it knew of. My demo
>   program specifically waits until user-level fstat() returns a
>   different result, which I believe means that the client kernel has
>   seen a different GETATTR result and so should have purged its cache
>   (based on what the FAQ says).
>
>   (Unless the FAQ means that the kernel absolutely refuses to
>   guarantee anything about file consistency unless you close and then
>   reopen the file, even if it *knows* that the file has changed on the
>   server, which isn't clear from how the FAQ is currently written.)
>
> - the client is fetching some new data from the fileserver (data
>   after the partial 4 KB page at the old end of the file).
>
> - the client isn't writing to the file in my demonstration program;
>   it's only opening it in read-write mode and then reading it. Also,
>   this doesn't happen if the client does exactly the same set of
>   operations but has the file open read-only (with it staying open
>   throughout).
>
> - this didn't happen in older kernels.
>
> In addition, although I didn't mention it in my original email, this
> happens on an NFS filesystem mounted 'noac'.
>
> Pragmatically, Alpine used to work with NFS mounted filesystems where
> email was appended to them from other machines and it no longer does,
> and the only difference is the kernel version involved on the client.
> This breakage is actively dangerous.

Sure, but unless you are locking the file, or you are explicitly using
O_DIRECT to do uncached I/O, then you are in violation of the
close-to-open consistency model, and the client is going to behave as
you describe above.

NFS uses a distributed filesystem model, not a clustered one.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
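
For readers who want to see what "locking the file" could look like from
user space, here is a minimal sketch in Python. It is not the demo
program from this thread. The helper name read_tail_locked and the 64 KB
chunk size are made up for illustration. It assumes that taking a POSIX
byte-range lock makes the Linux NFS client revalidate its cached data
for the file before the read, which is how the NFS documentation
describes lock handling; it has not been checked against the kernel
versions discussed here.

```python
import fcntl
import os


def read_tail_locked(path: str, offset: int) -> bytes:
    """Read from `offset` to end of file while holding a shared POSIX lock.

    Hypothetical helper: on an NFS mount, acquiring the lock is assumed to
    make the client revalidate its cache before the reads below.
    """
    # Open read-write, as the program in the report does.
    fd = os.open(path, os.O_RDWR)
    try:
        # Whole-file shared lock; blocks until it is granted.
        fcntl.lockf(fd, fcntl.LOCK_SH)
        try:
            os.lseek(fd, offset, os.SEEK_SET)
            chunks = []
            while True:
                chunk = os.read(fd, 65536)
                if not chunk:  # an empty read means end of file
                    break
                chunks.append(chunk)
            return b"".join(chunks)
        finally:
            # Release the lock before closing the descriptor.
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

A reader that polls for appended mail would remember the previous end
offset and call read_tail_locked(path, old_size) once fstat() reports a
larger size. Writers on other machines would need to take an exclusive
lock on the same file for this to give any mutual exclusion.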