Re: [PATCH] NFS: Add OTW write barrier before may-open test

Chuck Lever <chuck.lever@xxxxxxxxxx> · Wed, 5 Aug 2015 20:15:32 -0400

On Aug 5, 2015, at 6:27 PM, Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> wrote:

> On Wed, Aug 5, 2015 at 2:44 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>> 
>> On Aug 5, 2015, at 2:34 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>> 
>>> Commit 14546c337588 ("NFS: Don't do a full flush to disk on close()
>>> if we hold a delegation") added an optimization. When an NFSv4 write
>>> delegation is present, close(2) does not wait while a file's dirty
>>> data is flushed to the NFS server.
>>> 
>>> However, if the application workload immediately re-opens that file,
>>> nfs_may_open() can perform an ACCESS and GETATTR which runs
>>> concurrently with the flushing WRITE. If the flushing WRITE and
>>> GETATTR complete out of order on the server, the file size cached on
>>> the client will go backwards, possibly resulting in new writes going
>>> to the wrong file offset.
>>> 
>>> Add a write barrier before the access check to ensure the server's
>>> idea of the file's size is properly up to date.
>>> 
>>> The downside of this approach is that each fresh open(2) of a dirty
>>> file results in an extra flush. It seems to me that _any_ open(2)
>>> done while there is dirty data waiting on the client could result in
>>> a file size roll back. However, I see bad behavior only when the
>>> client holds a write delegation.
>>> 
>>> Fixes: 14546c337588 ("NFS: Don't do a full flush to disk on . . .")
>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>> ---
>>> fs/nfs/dir.c |    9 +++++++++
>>> 1 file changed, 9 insertions(+)
>>> 
>>> I'm not certain this is a good long term fix. Some other possible
>>> solutions include:
>>> 
>>> - Not performing the access check if the client holds a delegation
>>> - Not performing a GETATTR as part of the ACCESS check
>>> - Simply marking the file attributes stale instead of using the
>>>  returned file size
>>> - Reverting commit 14546c337588
>> 
>> OK. If the client holds a write delegation, then it shouldn't care
>> about the server's file size at all until it has flushed all dirty
>> data and returned the delegation. So flushing here is probably wrong.
>> 
>> But the incoming file size in the GETATTR is definitely screwing up
>> the cached file size.
>> 
> 
> In which kernels are you seeing the race? For recent kernels (v4.0+)
> the write code should be calling nfs_fattr_set_barrier() in order to
> prevent the result from the ACCESS from overwriting the new file size.

I'm testing on 4.2-rc4.

--
Chuck Lever

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html