On Thu, Jun 18, 2020 at 02:29:42PM +0000, Trond Myklebust wrote:
> On Thu, 2020-06-18 at 09:54 +0000, inoguchi.yuki@xxxxxxxxxxx wrote:
> > > What does the client do to its cache when it writes to a locked
> > > range?
> > >
> > > The RFC:
> > >
> > >   https://tools.ietf.org/html/rfc7530#section-10.3.2
> > >
> > > seems to imply that you should get something like local-filesystem
> > > semantics if you write-lock any range that you write to and
> > > read-lock any range that you read from.
> > >
> > > But I see a report that when applications write to non-overlapping
> > > ranges (while taking locks over those ranges), they don't see each
> > > other's updates.
> > >
> > > I think for simultaneous non-overlapping writes to work that way,
> > > the client would need to invalidate its cache on unlock (except for
> > > the locked range).  But I can't tell what the client's designed to
> > > do.
> >
> > Simultaneous non-overlapping WRITEs are not taken into consideration
> > in RFC 7530.  I personally think it is not necessary to deal with
> > this case by modifying the kernel, because the application on the
> > client can be implemented to avoid it.
> >
> > Serialization of the simultaneous operations may be one of the ways:
> > just before the write operation, each client locks and reads the
> > overlapping range of data instead of obtaining a lock on only its
> > own non-overlapping range.  They can then pick up updates from other
> > clients in this case.
> >
> > Yuki Inoguchi
> >
> > > --b.
>
> See the function 'fs/nfs/file.c:do_setlk()'. We flush dirty file data
> both before and after taking the byte range lock. After taking the
> lock, we force a revalidation of the data before returning control to
> the application (unless there is a delegation that allows us to cache
> more aggressively).
>
> In addition, if you look at fs/nfs/file.c:do_unlk() you'll note that we
> force a flush of all dirty file data before releasing the lock.
>
> Finally, note that we turn off assumptions of close-to-open caching
> semantics when we detect that the application is using locking, and we
> turn off optimisations such as assuming we can extend writes to page
> boundaries when the page is marked as being up to date.
>
> IOW: if all the clients are running Linux, then the thread that took
> the lock should see 100% up to date data in the locked range. I believe
> most (if not all) non-Linux clients use similar semantics when
> taking/releasing byte range locks, so they too should be fine.

I probably don't understand the algorithm (in particular, how it
revalidates caches after a write).  How does it avoid a race like this?

Start with a file whose data is all 0's and change attribute x:

    client 0                            client 1
    --------                            --------
    take write lock on byte 0
                                        take write lock on byte 1
    write 1 to offset 0
      change attribute now x+1
                                        write 1 to offset 1
                                          change attribute now x+2
    getattr returns x+2
                                        getattr returns x+2
    unlock
                                        unlock

    take read lock on byte 1

At this point a getattr will return change attribute x+2, the same as
was returned after client 0's write.  Does that mean client 0 assumes
the file data is unchanged since its last write?

--b.
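
For illustration, here is a minimal user-space sketch of the
application-side pattern Yuki describes above (lock and read the whole
shared range before writing, rather than only your own byte).  This is
not code from the thread or from the kernel: the mount path, file name,
offsets and two-byte range are made up, and error handling is
abbreviated.  It simply relies on the flush-and-revalidate behaviour
around do_setlk()/do_unlk() that Trond describes.

#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Block until a POSIX byte-range lock of the given type is granted. */
static int lock_range(int fd, short type, off_t start, off_t len)
{
	struct flock fl = {
		.l_type   = type,	/* F_WRLCK, F_RDLCK or F_UNLCK */
		.l_whence = SEEK_SET,
		.l_start  = start,
		.l_len    = len,
	};
	return fcntl(fd, F_SETLKW, &fl);
}

int main(void)
{
	/* Hypothetical NFS-mounted file shared by both writers. */
	int fd = open("/mnt/nfs/shared-file", O_RDWR);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/*
	 * Lock the whole range either writer touches (bytes 0-1), not
	 * just this client's own byte.  Taking the lock causes the NFS
	 * client to flush and revalidate, so the read below sees the
	 * other client's latest update.
	 */
	if (lock_range(fd, F_WRLCK, 0, 2) < 0) {
		perror("fcntl(F_SETLKW)");
		return 1;
	}

	char buf[2] = { 0, 0 };
	if (pread(fd, buf, sizeof(buf), 0) < 0)
		perror("pread");

	buf[0] = 1;				/* this client's update */
	if (pwrite(fd, &buf[0], 1, 0) < 0)	/* write only our own offset */
		perror("pwrite");

	/* Dropping the lock flushes the dirty data back to the server. */
	lock_range(fd, F_UNLCK, 0, 2);
	close(fd);
	return 0;
}

If each client instead locks only its own byte, as in the race above,
Trond's description only guarantees up-to-date data inside that locked
byte, which is what the change-attribute question is getting at.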