Re: A NFS client partial file corruption problem in recent/current kernels

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Tue, 11 Sep 2018 20:56:39 +0000

On Tue, 2018-09-11 at 16:40 -0400, Chuck Lever wrote:
> > On Sep 11, 2018, at 4:00 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> > 
> > On Tue, 2018-09-11 at 14:02 -0400, Chris Siebenmann wrote:
> > > > > We've found a readily reproducible situation where the current
> > > > > NFS client code will provide zero bytes instead of actual data at
> > > > > the end of the file (sort of) to user programs. This can result
> > > > > in program failure, or permanent file corruption if the program
> > > > > reading the file writes the bad data back to the file; otherwise,
> > > > > the corruption goes away when the client's cached data is pushed out
> > > > > of memory (or explicitly dropped by dropping the pagecache through
> > > > > /proc/sys/vm/drop_caches).
> > > 
> > > [...]
> > > > Please see http://nfs.sourceforge.net/#faq_a8
> > > 
> > > I don't think this is a close to open consistency issue, or if it is
> > > I would argue that it is a clear bug on the Linux NFS client. I have
> > > a number of reasons for saying this:
> > > 
> > > - the client clearly sees the new attributes; it knows that the file
> > >  has been extended from the previous state that it knew of. My demo
> > >  program specifically waits until user-level fstat() returns a different
> > >  result, which I believe means that the client kernel has seen a different
> > >  GETATTR result and so should have purged its cache (based on what the
> > >  FAQ says).
> > > 
> > >  (Unless the FAQ means that the kernel absolutely refuses to guarantee
> > >  anything about file consistency unless you close and then reopen the
> > >  file, even if it *knows* that the file has changed on the server,
> > >  which isn't clear from how the FAQ is currently written.)
> > > 
> > > - the client is fetching some new data from the fileserver (data after
> > >  the partial 4 KB page at the old end of the file).
> > > 
> > > - the client isn't writing to the file in my demonstration program; it's
> > >  only opening it in read-write mode and then reading it. Also, this
> > >  doesn't happen if the client does exactly the same set of operations
> > >  but has the file open read-only (with it staying open throughout).
> > > 
> > > - this didn't happen in older kernels.
> > > 
> > > In addition, although I didn't mention it in my original email, this
> > > happens on an NFS filesystem mounted 'noac'.
> > > 
> > > Pragmatically, Alpine used to work with NFS mounted filesystems where
> > > email was appended to them from other machines and it no longer does,
> > > and the only difference is the kernel version involved on the client.
> > > This breakage is actively dangerous.
> > 
> > Sure, but unless you are locking the file, or you are explicitly using
> > O_DIRECT to do uncached I/O, then you are in violation of the
> > close-to-open consistency model, and the client is going to behave as you
> > describe above. NFS uses a distributed filesystem model, not a
> > clustered one.
> 
> I would expect Alpine to work if "vers=3,noac" is in use.
> 

noac has nothing at all to do with data cache consistency.
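
For illustration only, here is a minimal sketch of the locking approach
mentioned in the quoted text above: take a lock on the file before
reading data that another machine may have appended. It assumes a Linux
client, where (per nfs(5), as far as I understand it) acquiring a file
lock is a point at which the NFS client revalidates its cached data.
The function name read_appended and its offset parameter are
hypothetical, not taken from Alpine or from the demo program in this
thread.

```python
import fcntl
import os


def read_appended(fd, offset):
    """Return the bytes from `offset` to end of file, read under a lock.

    Hypothetical helper: `fd` must be open for reading (read-only or
    read-write), and `offset` is the file size the caller last saw.
    """
    # Shared POSIX lock over the whole file. On an NFS mount this is
    # assumed to make the client revalidate its cached pages first.
    fcntl.lockf(fd, fcntl.LOCK_SH)
    try:
        size = os.fstat(fd).st_size  # size as seen while the lock is held
        chunks = []
        pos = offset
        while pos < size:
            chunk = os.pread(fd, size - pos, pos)
            if not chunk:
                break  # file shrank underneath us; stop at what we got
            chunks.append(chunk)
            pos += len(chunk)
        return b"".join(chunks)
    finally:
        # Always drop the lock, even if the read raised.
        fcntl.lockf(fd, fcntl.LOCK_UN)
```

A caller would remember the last size it saw and pass it as offset on
the next call. Whether this avoids the zero-filled tail described in
this thread has not been tested here; it only shows the shape of a
reader that locks the file rather than relying on attribute changes.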

-- 
Trond Myklebust
CTO, Hammerspace Inc
4300 El Camino Real, Suite 105
Los Altos, CA 94022
www.hammer.space