Re: async CLOSE creates a race with a DELEGRETURN followed by a REMOVE

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sep 4, 2015, at 2:09 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:

> On Thu, Sep 3, 2015 at 8:31 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>> 
>> On Sep 3, 2015, at 8:18 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>> 
>>> On Thu, Sep 3, 2015 at 6:37 PM, Trond Myklebust
>>> <trond.myklebust@xxxxxxxxxxxxxxx> wrote:
>>>> 
>>>> On Sep 3, 2015 18:08, "Olga Kornievskaia" <aglo@xxxxxxxxx> wrote:
>>>>> 
>>>>> On Thu, Sep 3, 2015 at 3:34 PM, Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>>> wrote:
>>>>>> 
>>>>>> On Sep 3, 2015, at 3:26 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>>>> 
>>>>>>> When file is opened with O_DIRECT, sending asynchronous CLOSE creates
>>>>>>> a race condition between CLOSE and DELEGRETURN followed REMOVE. Next
>>>>>>> operations are sent before the reply for CLOSE comes back which is
>>>>>>> according to the spec is not allowed (section 10.4.4). Server can get
>>>>>>> a REMOVE before the CLOSE and it causes EACCESS error to be returned
>>>>>>> because server thinks there is open state left.
>>>>>>> 
>>>>>>> According to the spec: "...whenever a client chooses to return a
>>>>>>> delegation voluntarily.  The following items of state need to be dealt
>>>>>>> with:
>>>>>>> 
>>>>>>> o  If the file associated with the delegation is no longer open and
>>>>>>>    no previous CLOSE operation has been sent to the server, a CLOSE
>>>>>>>    operation must be sent to the server."
>>>>>>> 
>>>>>>> So i'm not sure if it will be argued that it doesn't say that a reply
>>>>>>> must be received before sending the DELEGRETURN. However, if it's not
>>>>>>> mandated then a race as I'm mentioning occurs.
>>>>>>> 
>>>>>>> The following patch introduces making close async:
>>>>>>> commit f895c53f8ace3c3e49ebf9def90e63fc6d46d2bf
>>>>>>> Author: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>>>>> Date:   Mon Feb 1 14:17:50 2010 -0500
>>>>>>> 
>>>>>>>  NFS: Make close(2) asynchronous when closing NFS O_DIRECT files
>>>>>>> 
>>>>>>>  For NFSv2 and v3:
>>>>>>> 
>>>>>>>  O_DIRECT writes are always synchronous, and aren't cached, so
>>>>>>> nothing
>>>>>>>  should be flushed when closing an NFS O_DIRECT file descriptor.
>>>>>>> Thus
>>>>>>>  there are no write errors to report on close(2).
>>>>>>> 
>>>>>>>  In addition, there's no cached data to verify on the next open(2),
>>>>>>>  so we don't need clean GETATTR results at close time to compare
>>>>>>> with.
>>>>>>> 
>>>>>>>  Thus, there's no need for the nfs_revalidate_inode() call when
>>>>>>> closing
>>>>>>>  an NFS O_DIRECT file.  This reduces the number of synchronous
>>>>>>>  on-the-wire requests for a simple open-write-close of an NFS
>>>>>>> O_DIRECT
>>>>>>>  file by roughly 20%.
>>>>>>> 
>>>>>>>  For NFSv4:
>>>>>>> 
>>>>>>>  Call nfs4_do_close() with wait set to zero when closing an NFS
>>>>>>>  O_DIRECT file.  The CLOSE will go on the wire, but the application
>>>>>>>  won't wait for it to complete.
>>>>>>> 
>>>>>>> I'd like to suggest to revert the use of this patch.
>>>>>> 
>>>>>> DELEGRETURN can be made to wait for the CLOSE reply.
>>>>>> 
>>>>>> Or, maybe the client should return an offered delegation
>>>>>> immediately when a file is open O_DIRECT. I don't see much
>>>>>> use in keeping the delegation if the application wants
>>>>>> GETATTR, READ and WRITE always flushed to the server
>>>>>> immediately.
>>>>> 
>>>>> Yes i think somehow a delegreturn should be made to wait for close.
>>>>> Because this race also happens for other opens when close is sent
>>>>> async but I haven't pinned pointed the case. It looks like async write
>>>>> calls async close so perhaps that's it.
>>>>> 
>>>>> Though how much are we saving by making CLOSE async vs making the code
>>>>> more complicated?
>>>>> 
>>>> I'd far prefer reverting the asynchronous close if it is causing problems.
>>>> Serialisation is harder to get right. Particularly so when we need to do it
>>>> from within the context of the state manager thread....
>>>> 
>>> 
>>> Agreed.
>>> 
>>> Also I'd like to point out that coordination with CLOSE needs to be
>>> done not only with DELEGRETURN but with REMOVE as well. In the current
>>> kernel, if open is done with O_DIRECT, then the code sets
>>> WANT_NO_DELEGATION flag and server does not give one. However, the
>>> close being asynchronous if followed by a REMOVE will cause EACCESS
>>> error if CLOSE arrives after the REMOVE.
>> 
>> NFS4ERR_ACCESS is allowed for REMOVE, but why would a server
>> return ACCESS in this case?
> 
> Because client has open state left.

NFS4ERR_ACCESS does not mean "there is open state left." I think
the correct status should be NFS4ERR_FILE_OPEN, but it's optional
whether a server can remove an open file. See RFC 7530 Section
13.1.4.5.

ACCESS is plausible if the client had opened the file with a
deny mode, but I didn't think Linux did that.


>>> I have a test case and setup where I delay the CLOSE via proxy and I
>>> have an open with O_DIRECT. I get no delegation and after I do close()
>>> and remove(), the latter fails.
>> 
>> Can this be fixed without causing a performance regression
>> for NFSv3, which is not afflicted by any of these ordering
>> requirements?
> 
> Since there is no CLOSE in NFSv3 why would it cause performance regression?

NFSv3 does a synchronous GETATTR during close(2). The commit
you want to revert, in addition to altering NFSv4 CLOSE
behavior, eliminates that synchronous GETATTR when the file
is open with O_DIRECT.

Thus, if you revert that commit, it will result in a performance
regression for NFSv3.


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux