Re: not picking a delegation stateid for IO when delegation stateid is being returned

Olga Kornievskaia <aglo@xxxxxxxxx> · Wed, 3 Dec 2014 18:50:10 -0500

On Wed, Dec 3, 2014 at 6:34 PM, Trond Myklebust
<trond.myklebust@xxxxxxxxxxxxxxx> wrote:
>
> On Dec 3, 2014 6:21 PM, "Olga Kornievskaia" <aglo@xxxxxxxxx> wrote:
>>
>> On Wed, Dec 3, 2014 at 6:13 PM, Trond Myklebust
>> <trond.myklebust@xxxxxxxxxxxxxxx> wrote:
>> > On Wed, Dec 3, 2014 at 5:59 PM, Olga Kornievskaia <aglo@xxxxxxxxx>
>> > wrote:
>> >> On Wed, Dec 3, 2014 at 3:59 PM, Trond Myklebust
>> >> <trond.myklebust@xxxxxxxxxxxxxxx> wrote:
>> >>> Hi Olga,
>> >>>
>> >>> On Wed, Dec 3, 2014 at 2:54 PM, Olga Kornievskaia <aglo@xxxxxxxxx>
>> >>> wrote:
>> >>>> Hi folks,
>> >>>>
>> >>>> I would like an opinion about changing code in such as way that we
>> >>>> don't select a delegation stateid for an IO operation when this
>> >>>> particular delegation is being returned.
>> >>>>
>> >>>> The reason it's some what problematic is that we send out a
>> >>>> DELEG_RETURN and then we don't remove the stateid until we receive a
>> >>>> reply. In the mean while, an IO operation can be happening and in
>> >>>> nfs4_select_rw_stateid() it sees a delegation stateid and uses it.
>> >>>> Well, at the server, it finishes processing DELEG_RETURN before
>> >>>> getting an IO op and by that time the server is finished with the
>> >>>> stateid and can error an IO operation with BAD_STATEID.
>> >>>>
>> >>>> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
>> >>>> index 7f3f606..4f6f6c9 100644
>> >>>> --- a/fs/nfs/delegation.c
>> >>>> +++ b/fs/nfs/delegation.c
>> >>>> @@ -854,7 +854,8 @@ bool nfs4_copy_delegation_stateid(nfs4_stateid
>> >>>> *dst, struct inode *in
>> >>>>         flags &= FMODE_READ|FMODE_WRITE;
>> >>>>         rcu_read_lock();
>> >>>>         delegation = rcu_dereference(nfsi->delegation);
>> >>>> -       ret = (delegation != NULL && (delegation->type & flags) ==
>> >>>> flags);
>> >>>> +       ret = (delegation != NULL && (delegation->type & flags) ==
>> >>>> flags &&
>> >>>> +               !test_bit(NFS_DELEGATION_RETURNING,
>> >>>> &delegation->flags));
>> >>>>         if (ret) {
>> >>>>                 nfs4_stateid_copy(dst, &delegation->stateid);
>> >>>>                 nfs_mark_delegation_referenced(delegation);
>> >>>
>> >>> The above cannot eliminate the possibility that we won't use a
>> >>> delegation while it is being returned. It will at best just reduce the
>> >>> window of opportunity.
>> >>>
>> >>
>> >> You are right this are still problems. Actually, we might set the bit
>> >> but not yet get the open stateid from the open with deleg_cur and
>> >> that's not good. It would be good to know we got the open stateid and
>> >> then pick that.
>> >>
>> >>> So, why is this being considered to be a problem in the first place?
>> >>> Are you seeing a measurable performance impact on a real life workload
>> >>> (as opposed to some 1-in-a-billion occurrence from a QA test :-))?
>> >>
>> >> Unfortunately, this problem is quite common and I hit it all the time
>> >> on my setup. This leads to client seizing IO on that file and
>> >> returning EIO. It's an unrecoverable error. I'm trying to figure out
>> >> how to eliminate getting to that state.
>> >>
>> >
>> > It definitely isn't intended to be an irrecoverable error. The client
>> > is supposed to just replay the write after updating the stateid.
>>
>> open(deleg_cur) call / reply
>> lock() call/reply
>> deleg_return() call
>> write(with deluge_stateid) call gets BAD_STATEID
>> state recovery code marks lock state lost -> EIO.
>
> Why is it marking the lock as lost? If the recovery succeeded, it should
> notice that the stateid has changed and instead retry.

I'll get you a better explanation tomorrow besides saying "that's what
I see when I run the code".

> What kernel is this?

This is upstream.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html