On Wed, Aug 5, 2015 at 8:15 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote: > > On Aug 5, 2015, at 6:27 PM, Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> wrote: > >> On Wed, Aug 5, 2015 at 2:44 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote: >>> >>> On Aug 5, 2015, at 2:34 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote: >>> >>>> Commit 14546c337588 ("NFS: Don't do a full flush to disk on close() >>>> if we hold a delegation") added an optimization. When an NFSv4 write >>>> delegation is present, close(2) does not wait while a file's dirty >>>> data is flushed to the NFS server. >>>> >>>> However, if the application workload immediately re-opens that file, >>>> nfs_may_open() can perform an ACCESS and GETATTR which runs >>>> concurrently with the flushing WRITE. If the flushing WRITE and >>>> GETATTR complete out of order on the server, the file size cached on >>>> the client will go backwards, possibly resulting in new writes going >>>> to the wrong file offset. >>>> >>>> Add a write barrier before the access check to ensure the server's >>>> idea of the file's size is properly up to date. >>>> >>>> The downside of this approach is that each fresh open(2) of a dirty >>>> file results in an extra flush. It seems to me that _any_ open(2) >>>> done while there is dirty data waiting on the client could result in >>>> a file size roll back. However, I see bad behavior only when the >>>> client holds a write delegation. >>>> >>>> Fixes: 14546c337588 ("NFS: Don't do a full flush to disk on . . .") >>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> >>>> --- >>>> fs/nfs/dir.c | 9 +++++++++ >>>> 1 file changed, 9 insertions(+) >>>> >>>> I'm not certain this is a good long term fix. Some other possible >>>> solutions include: >>>> >>>> - Not performing the access check if the client holds a delegation >>>> - Not performing a GETATTR as part of the ACCESS check >>>> - Simply marking the file attributes stale instead of using the >>>> returned file size >>>> - Reverting commit 14546c337588 >>> >>> OK. If the client holds a write delegation, then it shouldn't care >>> about the server's file size at all until it has flushed all dirty >>> data and returned the delegation. So flushing here is probably wrong. >>> >>> But the incoming file size in the GETATTR is definitely screwing up >>> the cached file size. >>> >> >> In which kernels are you seeing the race? For recent kernels (v4.0+) >> the write code should be calling nfs_fattr_set_barrier() in order to >> prevent the result from the ACCESS from overwriting the new file size. > > I'm testing on 4.2-rc4. > OK. Does turning off the check for nfs_ctime_need_update() in nfs_inode_attrs_need_update() fix the problem? -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html