Re: A NFS client partial file corruption problem in recent/current kernels

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Wed, 12 Sep 2018 02:19:34 +0000



On Tue, 2018-09-11 at 19:45 -0400, Chris Siebenmann wrote:
> > >  Our issue also happens when the writes are done on the fileserver,
> > > though, and they occur even if you allow plenty of time for the
> > > writes to settle. I can run my test program in a mode where it
> > > explicitly waits for me to tell it to continue, do the appending
> > > to the file on the fileserver, 'sync' on the fileserver, wait five
> > > minutes, and the NFS client will still see those zero bytes when it
> > > tries to read the new data.
> > 
> > That's happening because we're not optimising for the broken case, and
> > instead we assume that we can cache data for as long as the file is
> > open and unlocked as indeed the close-to-open cache consistency model
> > has always stated that we can do.
> 
>  If I'm understanding all of this right, is what the kernel does more
> or less like this: when a NFS client program closes a writeable file
> (descriptor), the kernel flushes any pending writes, does a GETATTR
> afterward, and declares all current cached pages fully valid 'as of'
> that GETATTR result. When the file is reopened (in any mode), the kernel
> GETATTRs the file again; if the GETATTR hasn't changed, the cached pages
> and their contents remain valid.
> 
>  As a result, if you write to the file from another machine (including
> the fileserver) before the writeable file is closed, on close the
> client uses the updated GETATTR from the server but its current cached
> pages. These cached pages may be out of date, but if so it is because
> one violated close-to-open; you must always close any writeable file
> descriptors on machine A before writing to the file on machine B (or
> obtain and then release locks?).
> 
>  If a client kernel has cached pages this way, is there any simple
> sequence of system calls on the client that will cause it to discard
> these cached pages? Or do you need the file's GETATTR to change again,
> implicitly from another machine? (I assume that changing the file's
> attributes from the client with the cached pages doesn't cause it to
> invalidate them, and certainly eg a 'touch' doesn't do it from the
> client where it does do it from another machine.)

There are 2 ways to manipulate the page cache directly on the client:
   1. You can clear out the entire page cache as the 'root' user, with the
      /proc/sys/vm/drop_caches interface (see 'man 5 proc').
   2. Alternatively, you can use posix_fadvise() with the
      POSIX_FADV_DONTNEED flag to clear out only the pages that you think
      are bad. Make sure to first fsync() so that the pages don't get
      pinned in memory by virtue of being dirty (see 'man 2 fadvise64').
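
As a rough illustration of both options, here is a minimal Python sketch.
The function names (drop_cached_pages, drop_entire_page_cache) are
hypothetical and only for illustration; the underlying calls are the
standard os.fsync(), os.posix_fadvise() and the /proc/sys/vm/drop_caches
file. Note that POSIX_FADV_DONTNEED is advisory, so treat this as a sketch
under those assumptions rather than a guaranteed recipe.

```python
import os


def drop_cached_pages(path, offset=0, length=0):
    """Ask the kernel to discard cached pages for part of a file.

    Hypothetical helper. length=0 means "from offset to end of file".
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Flush first so that no page in the range is dirty; dirty pages
        # stay pinned in memory and would not be dropped.
        os.fsync(fd)
        # Advisory hint: the kernel may now discard the clean pages.
        os.posix_fadvise(fd, offset, length, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)


def drop_entire_page_cache():
    """Drop the whole page cache, for every file. Must be run as root.

    Hypothetical helper. Writing "1" frees the page cache only
    (see 'man 5 proc').
    """
    os.sync()  # write back dirty data so more pages are droppable
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("1\n")
```

For example, drop_cached_pages("/mnt/nfs/testfile") (a made-up path) would
target only that file, which is far less disruptive than dropping the
cache for the whole machine.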

You also have the option of using NFS itself to implicitly change the
cache:
   1. As you said above, you can also change the file on the server while
      it is closed on your client, and then reopen it.
   2. You can perform an O_DIRECT write from the client itself.

Both of those operations will also imply a cache invalidation.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx