On Tue, 2018-09-11 at 16:40 -0400, Chuck Lever wrote:
> > On Sep 11, 2018, at 4:00 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> >
> > On Tue, 2018-09-11 at 14:02 -0400, Chris Siebenmann wrote:
> > > > > We've found a readily reproducible situation where the current
> > > > > NFS client code will provide zero bytes instead of actual data
> > > > > at the end of the file (sort of) to user programs. This can
> > > > > result in program failure, or permanent file corruption if the
> > > > > program reading the file writes the bad data back to the file;
> > > > > otherwise, the corruption goes away when the client's cached
> > > > > data is pushed out of memory (or explicitly dropped by dropping
> > > > > the pagecache through /proc/sys/vm/drop_caches).
> > >
> > > [...]
> > > > Please see http://nfs.sourceforge.net/#faq_a8
> > >
> > > I don't think this is a close to open consistency issue, or if it
> > > is I would argue that it is a clear bug on the Linux NFS client. I
> > > have a number of reasons for saying this:
> > >
> > > - the client clearly sees the new attributes; it knows that the
> > >   file has been extended from the previous state that it knew of.
> > >   My demo program specifically waits until user-level fstat()
> > >   returns a different result, which I believe means that the
> > >   client kernel has seen a different GETATTR result and so should
> > >   have purged its cache (based on what the FAQ says).
> > >
> > >   (Unless the FAQ means that the kernel absolutely refuses to
> > >   guarantee anything about file consistency unless you close and
> > >   then reopen the file, even if it *knows* that the file has
> > >   changed on the server, which isn't clear from how the FAQ is
> > >   currently written.)
> > >
> > > - the client is fetching some new data from the fileserver (data
> > >   after the partial 4 KB page at the old end of the file).
> > >
> > > - the client isn't writing to the file in my demonstration
> > >   program; it's only opening it in read-write mode and then
> > >   reading it. Also, this doesn't happen if the client does exactly
> > >   the same set of operations but has the file open read-only (with
> > >   it staying open throughout).
> > >
> > > - this didn't happen in older kernels.
> > >
> > > In addition, although I didn't mention it in my original email,
> > > this happens on an NFS filesystem mounted 'noac'.
> > >
> > > Pragmatically, Alpine used to work with NFS mounted filesystems
> > > where email was appended to them from other machines and it no
> > > longer does, and the only difference is the kernel version
> > > involved on the client. This breakage is actively dangerous.
> >
> > Sure, but unless you are locking the file, or you are explicitly
> > using O_DIRECT to do uncached I/O, then you are in violation of the
> > close-to-open consistency model, and the client is going to behave
> > as you describe above. NFS uses a distributed filesystem model, not
> > a clustered one.
>
> I would expect Alpine to work if "vers=3,noac" is in use.

noac has nothing at all to do with data cache consistency.

-- 
Trond Myklebust
CTO, Hammerspace Inc
4300 El Camino Real, Suite 105
Los Altos, CA 94022
www.hammer.space
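
For readers following the thread, the locking route mentioned above (as opposed to O_DIRECT) can be sketched roughly as follows. This is a minimal illustration and not code from anyone in the discussion: `read_new_data` is a hypothetical helper name, and the sketch assumes a Linux NFS client that revalidates its cached file data when a POSIX lock is acquired, which should be checked against nfs(5) for the kernel and mount options in use.

```python
import fcntl
import os


def read_new_data(path, offset):
    """Return the bytes from `offset` to end of file, read under a POSIX lock.

    Hypothetical helper for illustration. The file is opened read-write to
    mirror the demo program described in the thread; nothing is written.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # Take a shared whole-file lock before looking at size or data.
        # Assumption: on NFS, acquiring the lock makes the client
        # revalidate cached data for this file.
        fcntl.lockf(fd, fcntl.LOCK_SH)
        try:
            size = os.fstat(fd).st_size
            chunks = []
            pos = offset
            while pos < size:
                # pread may return fewer bytes than requested, so loop.
                chunk = os.pread(fd, size - pos, pos)
                if not chunk:
                    break
                chunks.append(chunk)
                pos += len(chunk)
            return b"".join(chunks)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

A reader that remembers the old end-of-file offset would call `read_new_data(path, old_size)` after fstat() reports a change, instead of reading through a long-lived unlocked descriptor. Whether this avoids the zero-byte behaviour reported here is not something the sketch demonstrates; it only shows the access pattern.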