Re: async CLOSE creates a race with a DELEGRETURN followed by a REMOVE

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Sep 3, 2015 at 6:37 PM, Trond Myklebust
<trond.myklebust@xxxxxxxxxxxxxxx> wrote:
>
> On Sep 3, 2015 18:08, "Olga Kornievskaia" <aglo@xxxxxxxxx> wrote:
>>
>> On Thu, Sep 3, 2015 at 3:34 PM, Chuck Lever <chuck.lever@xxxxxxxxxx>
>> wrote:
>> >
>> > On Sep 3, 2015, at 3:26 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>> >
>> >> When file is opened with O_DIRECT, sending asynchronous CLOSE creates
>> >> a race condition between CLOSE and DELEGRETURN followed REMOVE. Next
>> >> operations are sent before the reply for CLOSE comes back which is
>> >> according to the spec is not allowed (section 10.4.4). Server can get
>> >> a REMOVE before the CLOSE and it causes EACCESS error to be returned
>> >> because server thinks there is open state left.
>> >>
>> >> According to the spec: "...whenever a client chooses to return a
>> >> delegation voluntarily.  The following items of state need to be dealt
>> >> with:
>> >>
>> >>   o  If the file associated with the delegation is no longer open and
>> >>      no previous CLOSE operation has been sent to the server, a CLOSE
>> >>      operation must be sent to the server."
>> >>
>> >> So i'm not sure if it will be argued that it doesn't say that a reply
>> >> must be received before sending the DELEGRETURN. However, if it's not
>> >> mandated then a race as I'm mentioning occurs.
>> >>
>> >> The following patch introduces making close async:
>> >> commit f895c53f8ace3c3e49ebf9def90e63fc6d46d2bf
>> >> Author: Chuck Lever <chuck.lever@xxxxxxxxxx>
>> >> Date:   Mon Feb 1 14:17:50 2010 -0500
>> >>
>> >>    NFS: Make close(2) asynchronous when closing NFS O_DIRECT files
>> >>
>> >>    For NFSv2 and v3:
>> >>
>> >>    O_DIRECT writes are always synchronous, and aren't cached, so
>> >> nothing
>> >>    should be flushed when closing an NFS O_DIRECT file descriptor.
>> >> Thus
>> >>    there are no write errors to report on close(2).
>> >>
>> >>    In addition, there's no cached data to verify on the next open(2),
>> >>    so we don't need clean GETATTR results at close time to compare
>> >> with.
>> >>
>> >>    Thus, there's no need for the nfs_revalidate_inode() call when
>> >> closing
>> >>    an NFS O_DIRECT file.  This reduces the number of synchronous
>> >>    on-the-wire requests for a simple open-write-close of an NFS
>> >> O_DIRECT
>> >>    file by roughly 20%.
>> >>
>> >>    For NFSv4:
>> >>
>> >>    Call nfs4_do_close() with wait set to zero when closing an NFS
>> >>    O_DIRECT file.  The CLOSE will go on the wire, but the application
>> >>    won't wait for it to complete.
>> >>
>> >> I'd like to suggest to revert the use of this patch.
>> >
>> > DELEGRETURN can be made to wait for the CLOSE reply.
>> >
>> > Or, maybe the client should return an offered delegation
>> > immediately when a file is open O_DIRECT. I don't see much
>> > use in keeping the delegation if the application wants
>> > GETATTR, READ and WRITE always flushed to the server
>> > immediately.
>>
>> Yes i think somehow a delegreturn should be made to wait for close.
>> Because this race also happens for other opens when close is sent
>> async but I haven't pinned pointed the case. It looks like async write
>> calls async close so perhaps that's it.
>>
>> Though how much are we saving by making CLOSE async vs making the code
>> more complicated?
>>
> I'd far prefer reverting the asynchronous close if it is causing problems.
> Serialisation is harder to get right. Particularly so when we need to do it
> from within the context of the state manager thread....
>

Agreed.

Also I'd like to point out that coordination with CLOSE needs to be
done not only with DELEGRETURN but with REMOVE as well. In the current
kernel, if open is done with O_DIRECT, then the code sets
WANT_NO_DELEGATION flag and server does not give one. However, the
close being asynchronous if followed by a REMOVE will cause EACCESS
error if CLOSE arrives after the REMOVE.

I have a test case and setup where I delay the CLOSE via proxy and I
have an open with O_DIRECT. I get no delegation and after I do close()
and remove(), the latter fails.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux