On Sun, 2018-09-16 at 20:18 -0400, Chris Siebenmann wrote:
> > > > Since failing to close() before another machine open()s puts you
> > > > outside this outline of close-to-open, this kernel behavior is
> > > > not a bug as such (or so it's been explained to me here). If you
> > > > go outside c-t-o, the kernel is free to do whatever it finds most
> > > > convenient, and what it found most convenient was to not bother
> > > > invalidating some cached page data even though it saw a GETATTR
> > > > change.
> > > 
> > > That would be a bug. If we have reason to believe the file has
> > > changed, then we must invalidate the cache on the file prior to
> > > allowing a read to proceed.
> > 
> > The point here is that when the file is open for writing (or for
> > read+write), and your applications are not using locking, then we
> > have no reason to believe the file is being changed on the server,
> > and we deliberately optimise for the case where the cache
> > consistency rules are being observed.
> 
> In this case the user level can be completely sure that the client
> kernel has issued a GETATTR and received a different answer from the
> NFS server, because the fstat() results it sees have changed from the
> values it has seen before (and remembered). This may not count as the
> NFS client kernel code '[having] reason to believe' that the file has
> changed on the server from its perspective, but if so it's not because
> the information is not available and a GETATTR would have to be
> explicitly issued to find it out. The client code has made the GETATTR
> and received different results, which it has passed to user level; it
> has just not used those results to do things to its cached data.
> 
> Today, if you do a flock(), the NFS client code in the kernel will do
> things that invalidate the cached data, despite the GETATTR result
> from the fileserver not changing. From my outside perspective, as
> someone writing code or dealing with programs that must work over NFS,
> this is a little bit magical, and as a result I would like to
> understand if it is guaranteed that the magic works or if this is not
> officially supported magic, merely 'it happens to work' magic in the
> way that having the file open read-write without the flock() used to
> work in kernel 4.4.x but doesn't now (and this is simply considered to
> be the kernel using CTO more strongly, not a bug).
> 
> (Looking at a tcpdump trace, the flock() call appears to cause the
> kernel to issue another GETATTR to the fileserver. The results are the
> same as the GETATTR results that were passed to the client program.)

This is also documented in the NFS FAQ to which I pointed you earlier.

> > Again, these are the cases where you are _not_ using locking to
> > mediate. If you are using locking, then I agree that changes need to
> > be seen by the client.
> 
> The original code (Alpine) *is* using locking in the broad sense,
> but it is not flock() locking; instead it is locking (in this case)
> through .lock files. The current kernel behavior and what I've been
> told about it implies that it is not sufficient for your application
> to perfectly coordinate locking, writes, fsync(), and fstat()
> visibility of the resulting changes through its own mechanism; you
> must do your locking through the officially approved kernel channels
> (and it is not clear what they are) or see potentially incorrect
> results.
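
For reference, a minimal sketch of the flock()-before-read pattern being
discussed above, on the assumption that taking the lock is what triggers
the client-side revalidation; the path, buffer size and error handling
are illustrative only and not taken from the original mails:

/* Sketch: take a kernel-visible lock so the NFS client revalidates
 * its cached data before we read.  Path and sizes are examples. */
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	ssize_t n;
	int fd = open("/nfs/mail/inbox", O_RDWR);	/* example path */

	if (fd < 0)
		return 1;

	/* flock() is the kernel-visible locking channel; as observed in
	 * the tcpdump trace above, taking it also issues a GETATTR and
	 * causes cached data to be invalidated. */
	if (flock(fd, LOCK_EX) < 0) {
		close(fd);
		return 1;
	}

	while ((n = read(fd, buf, sizeof(buf))) > 0)
		/* process newly visible data here */;

	flock(fd, LOCK_UN);
	close(fd);
	return 0;
}
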
> Consider a system where reads and writes to a shared file are
> coordinated by a central process that everyone communicates with
> through TCP connections. The central process pauses readers before it
> allows a writer to start, the writer always fsync()s before it
> releases its write permissions, and then no reader is permitted to
> proceed until the entire cluster sees the same updated fstat() result.
> This is perfectly coordinated but currently could see incorrect read()
> results, and I've been told that this is allowed under Linux's CTO
> rules because all of the processes hold the file open read-write
> through this entire process (and no one flock()s).

Why would such a system need to use buffered I/O instead of uncached
I/O (i.e. O_DIRECT)? What would be the point of optimising the buffered
I/O client for this use case rather than the close-to-open cache
consistent case?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
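
For reference, a minimal sketch of the uncached-I/O alternative raised
above, assuming O_DIRECT reads of a hypothetical shared file; the path
and the 4096-byte alignment/transfer size are assumptions, and real
code must respect O_DIRECT's alignment requirements:

/* Sketch: open with O_DIRECT so reads bypass the client page cache
 * rather than relying on buffered-I/O cache consistency. */
#define _GNU_SOURCE		/* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	void *buf;
	ssize_t n;
	int fd = open("/nfs/shared/datafile", O_RDONLY | O_DIRECT);

	if (fd < 0)
		return 1;

	/* O_DIRECT buffers (and offsets/lengths) must be suitably
	 * aligned; 4096 is used here as an illustrative choice. */
	if (posix_memalign(&buf, 4096, 4096) != 0) {
		close(fd);
		return 1;
	}

	/* Each read is serviced without the client page cache, so it
	 * reflects the server's current data rather than cached pages. */
	while ((n = read(fd, buf, 4096)) > 0)
		/* process data here */;

	free(buf);
	close(fd);
	return 0;
}
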