On Thu, Jun 18, 2020 at 02:29:42PM +0000, Trond Myklebust wrote:
> On Thu, 2020-06-18 at 09:54 +0000, inoguchi.yuki@xxxxxxxxxxx wrote:
> > > What does the client do to its cache when it writes to a locked
> > > range?
> > >
> > > The RFC:
> > >
> > >   https://tools.ietf.org/html/rfc7530#section-10.3.2
> > >
> > > seems to imply that you should get something like local-filesystem
> > > semantics if you write-lock any range that you write to and
> > > read-lock any range that you read from.
> > >
> > > But I see a report that when applications write to non-overlapping
> > > ranges (while taking locks over those ranges), they don't see each
> > > other's updates.
> > >
> > > I think for simultaneous non-overlapping writes to work that way,
> > > the client would need to invalidate its cache on unlock (except for
> > > the locked range).  But I can't tell what the client's designed to
> > > do.
> >
> > Simultaneous non-overlapping WRITEs are not taken into consideration
> > in RFC 7530.  I personally think it is not necessary to deal with
> > this case by modifying the kernel, because the application on the
> > client can be implemented to avoid it.
> >
> > Serialization of the simultaneous operations may be one of the ways:
> > just before the write operation, each client locks and reads the
> > overlapping range of data instead of obtaining a lock on only its
> > own non-overlapping range.  They can then pick up updates from other
> > clients in this case.
> >
> > Yuki Inoguchi
> >
> > > --b.
>
> See the function 'fs/nfs/file.c:do_setlk()'. We flush dirty file data
> both before and after taking the byte range lock. After taking the
> lock, we force a revalidation of the data before returning control to
> the application (unless there is a delegation that allows us to cache
> more aggressively).
>
> In addition, if you look at fs/nfs/file.c:do_unlk() you'll note that we
> force a flush of all dirty file data before releasing the lock.
>
> Finally, note that we turn off assumptions of close-to-open caching
> semantics when we detect that the application is using locking, and we
> turn off optimisations such as assuming we can extend writes to page
> boundaries when the page is marked as being up to date.
>
> IOW: if all the clients are running Linux, then the thread that took
> the lock should see 100% up to date data in the locked range. I believe
> most (if not all) non-Linux clients use similar semantics when
> taking/releasing byte range locks, so they too should be fine.

I probably don't understand the algorithm (in particular, how it
revalidates caches after a write).  How does it avoid a race like this?

Start with a file whose data is all 0's and change attribute x:

    client 0                            client 1
    --------                            --------
    take write lock on byte 0
                                        take write lock on byte 1
    write 1 to offset 0
      change attribute now x+1
                                        write 1 to offset 1
                                          change attribute now x+2
    getattr returns x+2
                                        getattr returns x+2
    unlock
                                        unlock

    take read lock on byte 1

At this point a getattr will return change attribute x+2, the same as
was returned after client 0's write.  Does that mean client 0 assumes
the file data is unchanged since its last write?

--b.
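
For illustration, here is a minimal user-space sketch of the
application-side pattern Yuki describes above (lock and read the whole
shared range before writing, rather than only your own byte).  This is
not code from the thread or from the kernel: the mount path, file name,
offsets and two-byte range are made up, and error handling is
abbreviated.  It simply relies on the flush-and-revalidate behaviour
around do_setlk()/do_unlk() that Trond describes.

#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Block until a POSIX byte-range lock of the given type is granted. */
static int lock_range(int fd, short type, off_t start, off_t len)
{
	struct flock fl = {
		.l_type   = type,	/* F_WRLCK, F_RDLCK or F_UNLCK */
		.l_whence = SEEK_SET,
		.l_start  = start,
		.l_len    = len,
	};
	return fcntl(fd, F_SETLKW, &fl);
}

int main(void)
{
	/* Hypothetical NFS-mounted file shared by both writers. */
	int fd = open("/mnt/nfs/shared-file", O_RDWR);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/*
	 * Lock the whole range either writer touches (bytes 0-1), not
	 * just this client's own byte.  Taking the lock causes the NFS
	 * client to flush and revalidate, so the read below sees the
	 * other client's latest update.
	 */
	if (lock_range(fd, F_WRLCK, 0, 2) < 0) {
		perror("fcntl(F_SETLKW)");
		return 1;
	}

	char buf[2] = { 0, 0 };
	if (pread(fd, buf, sizeof(buf), 0) < 0)
		perror("pread");

	buf[0] = 1;				/* this client's update */
	if (pwrite(fd, &buf[0], 1, 0) < 0)	/* write only our own offset */
		perror("pwrite");

	/* Dropping the lock flushes the dirty data back to the server. */
	lock_range(fd, F_UNLCK, 0, 2);
	close(fd);
	return 0;
}

If each client instead locks only its own byte, as in the race above,
Trond's description only guarantees up-to-date data inside that locked
byte, which is what the change-attribute question is getting at.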