On Feb 25, 2015, at 8:27 PM, Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> wrote: > On Wed, 2015-02-25 at 19:43 -0500, Trond Myklebust wrote: >> On Wed, 2015-02-25 at 19:37 -0500, Trond Myklebust wrote: >>> On Wed, 2015-02-25 at 17:32 -0500, Chuck Lever wrote: >>>> FWIW it’s easy to reproduce a similar race with fsx, and I encounter >>>> it frequently while running xfstests on fast NFS servers. >>>> >>>> fsx invokes ftruncate following a set of asynchronous reads >>>> (generated possibly due to readahead). The reads are started first, >>>> then the SETATTR, but they complete out of order. >>>> >>>> The SETATTR changes the test file’s size, and the completion >>>> updates the file size in the client’s inode. Then the read requests >>>> complete on the client and set the file’s size back to its old value. >>>> >>>> All it takes is one late read completion, and the cached file size >>>> is corrupted. fsx detects the file size mismatch and terminates the >>>> test. The file size is corrected by a subsequent GETATTR (say, an >>>> “ls -l” to check it after fsx has terminated). >>>> >>>> While SETATTR blocks concurrent writes, there’s no serialization >>>> on either the client or server to help guarantee the ordering of >>>> SETATTR with read operations. >>>> >>>> I’ve found a successful workaround by forcing the client to ignore >>>> post-op attrs in read replies. A stronger solution might simply set >>>> the “file attributes need update” flag in the inode if any file >>>> attribute mutation is noticed during a read completion. >>> >>> That's different. We definitely should aim to fix this kind of issue >>> since you are talking about a single client being the only thing >>> accessing the file on the server. >>> How about the following patch? >> >> Let me retry without the typos. The following will actually >> compile... :-/ > > Third version: this contains a minor optimisation so that we don't force > the re-read of the last page in the file. Also a little more detail in > the changelog. Thanks, I’ll give this a shot, hopefully today. > 8<-------------------------------------------------------------------- > From d2c822d64a68542665e4887b4d7bccd6e8e3a741 Mon Sep 17 00:00:00 2001 > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > Date: Wed, 25 Feb 2015 19:26:28 -0500 > Subject: [PATCH] NFS: Quiesce reads before updating the file size after > truncating > > Chuck Lever reports seeing readaheads racing with truncate operations > and causing the file size to be reverted. Fix is to quiesce reads > after truncating the file on the server, but before updating the > size on the client. > > Note: > > 1) We believe that waiting for those reads that lie outside the new > file boundaries is sufficient, since this means any attempt to read > the data in those areas will trigger a new RPC call, which will be > ordered w.r.t. the size change on the server. > > 2) We do not move the call to truncate_pagecache(), since we want > to ensure that any partial page zeroing is ordered w.r.t. the > update of inode->i_size. > > Reported-by: Chuck Lever <chuck.lever@xxxxxxxxxx> > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > --- > fs/nfs/inode.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c > index 83107be3dd01..daf3b1e183fe 100644 > --- a/fs/nfs/inode.c > +++ b/fs/nfs/inode.c > @@ -565,6 +565,11 @@ static int nfs_vmtruncate(struct inode * inode, loff_t offset) > if (err) > goto out; > > + /* Quiesce reads that lie outside the new file size */ > + invalidate_inode_pages2_range(inode->i_mapping, > + (offset + PAGE_CACHE_SHIFT - 1) >> PAGE_CACHE_SHIFT, > + -1); > + > spin_lock(&inode->i_lock); > i_size_write(inode, offset); > /* Optimisation */ > -- > 2.1.0 > > > -- Chuck Lever chuck[dot]lever[at]oracle[dot]com -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html