Re: Correctly understanding Linux's close-to-open consistency

On Sun, 2018-09-16 at 07:01 -0400, Jeff Layton wrote:
> On Sat, 2018-09-15 at 15:11 -0400, Chris Siebenmann wrote:
> > > On Wed, 2018-09-12 at 21:24 -0400, Chris Siebenmann wrote:
> > > >  Is it correct to say that when writing data to NFS files, the
> > > > only sequence of operations that Linux NFS clients officially
> > > > support is the following:
> > > > 
> > > > - all processes on all client machines close() the file
> > > > - one machine (a client or the fileserver) open()s the file,
> > > >   writes to it, and close()s again
> > > > - processes on client machines can now open() the file again
> > > >   for reading
> > > 
> > > No.
> > > 
> > > One can always call fsync() to force data to be flushed, avoiding
> > > the close of the write fd in this situation. That's really a more
> > > portable solution anyway. A local filesystem may not flush data to
> > > disk on close (for instance), so calling fsync will ensure you rely
> > > less on filesystem implementation details.
> > > 
> > > The separate open by the reader just helps ensure that the file's
> > > attributes are revalidated (so you can tell whether cached data you
> > > hold is still valid).
> > 
> >  This bit about the separate open doesn't seem to be the case
> > currently, and people here have asserted that it's not true in
> > general. Specifically, under some conditions *not involving you
> > writing*, if you do not close() the file before another machine
> > writes to it and then open() it afterward, the kernel may retain
> > cached data that it is in a position to know (for sure) is invalid
> > because it didn't exist in the previous version of the file (as it
> > was past the end of file position).
> > 
> >  Since failing to close() before another machine open()s puts you
> > outside this outline of close-to-open, this kernel behavior is not a
> > bug as such (or so it's been explained to me here).  If you go
> > outside c-t-o, the kernel is free to do whatever it finds most
> > convenient, and what it found most convenient was to not bother
> > invalidating some cached page data even though it saw a GETATTR
> > change.
> > 
> 
> That would be a bug. If we have reason to believe the file has
> changed, then we must invalidate the cache on the file prior to
> allowing a read to proceed.

The point here is that when the file is open for writing (or for
read+write), and your applications are not using locking, then we have
no reason to believe the file is being changed on the server, and we
deliberately optimise for the case where the cache consistency rules
are being observed.
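
(For illustration, a minimal sketch of the close-to-open sequence being
assumed here; the path is a placeholder and the error handling is
abbreviated. The writer flushes with fsync() and close()s before any
reader open()s the file, and it is the reader's fresh open() that
triggers attribute revalidation.)

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Writer side: flush and close before any reader re-opens the file. */
int writer(void)
{
        const char msg[] = "new contents\n";
        int fd = open("/mnt/nfs/shared.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);

        if (fd < 0 || write(fd, msg, strlen(msg)) < 0)
                return -1;
        if (fsync(fd) < 0)              /* force the data to the server */
                return -1;
        return close(fd);               /* writer closes first ... */
}

/* Reader side: runs only after the writer's close() has returned. */
int reader(void)
{
        char buf[4096];
        ssize_t n;
        int fd = open("/mnt/nfs/shared.dat", O_RDONLY);  /* ... then the
                                                            reader opens */
        if (fd < 0)
                return -1;
        while ((n = read(fd, buf, sizeof(buf))) > 0)
                fwrite(buf, 1, (size_t)n, stdout);
        close(fd);
        return (n < 0) ? -1 : 0;
}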

If the file is open for reading only, then we may detect changes on the
server. However, we certainly cannot guarantee that the data is
consistent, due to the potential for write reordering as discussed
earlier in this thread, and due to the fact that attribute revalidation
is not atomic with reads.

Again, these are the cases where you are _not_ using locking to
mediate. If you are using locking, then I agree that changes need to be
seen by the client.
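
(Likewise, a rough sketch of the locking-mediated case, using whole-file
POSIX locks via fcntl(); the path is again a placeholder and error
handling is abbreviated. The Linux NFS client revalidates its cached
data for the file when a lock is acquired and flushes dirty data when
the lock is released, which is what makes the writer's changes visible
to the reader here.)

#include <fcntl.h>
#include <unistd.h>

/* Take or release a whole-file POSIX lock; F_SETLKW blocks until granted. */
int set_lock(int fd, short type)
{
        struct flock fl = {
                .l_type   = type,       /* F_RDLCK, F_WRLCK or F_UNLCK */
                .l_whence = SEEK_SET,
                .l_start  = 0,
                .l_len    = 0,          /* 0 == the whole file */
        };
        return fcntl(fd, F_SETLKW, &fl);
}

int locked_write(const char *buf, size_t len)
{
        int fd = open("/mnt/nfs/shared.dat", O_WRONLY);

        if (fd < 0 || set_lock(fd, F_WRLCK) < 0)
                return -1;
        if (write(fd, buf, len) < 0)
                return -1;
        set_lock(fd, F_UNLCK);          /* unlock flushes dirty data */
        return close(fd);
}

int locked_read(char *buf, size_t len)
{
        int fd = open("/mnt/nfs/shared.dat", O_RDONLY);
        ssize_t n;

        if (fd < 0 || set_lock(fd, F_RDLCK) < 0)
                return -1;
        n = read(fd, buf, len);         /* cache was revalidated on lock */
        set_lock(fd, F_UNLCK);
        close(fd);
        return (int)n;
}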
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx





