Re: not picking a delegation stateid for IO when delegation stateid is being returned

Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> · Wed, 3 Dec 2014 18:13:12 -0500

On Wed, Dec 3, 2014 at 5:59 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
> On Wed, Dec 3, 2014 at 3:59 PM, Trond Myklebust
> <trond.myklebust@xxxxxxxxxxxxxxx> wrote:
>> Hi Olga,
>>
>> On Wed, Dec 3, 2014 at 2:54 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>> Hi folks,
>>>
>>> I would like an opinion about changing code in such as way that we
>>> don't select a delegation stateid for an IO operation when this
>>> particular delegation is being returned.
>>>
>>> The reason it's some what problematic is that we send out a
>>> DELEG_RETURN and then we don't remove the stateid until we receive a
>>> reply. In the mean while, an IO operation can be happening and in
>>> nfs4_select_rw_stateid() it sees a delegation stateid and uses it.
>>> Well, at the server, it finishes processing DELEG_RETURN before
>>> getting an IO op and by that time the server is finished with the
>>> stateid and can error an IO operation with BAD_STATEID.
>>>
>>> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
>>> index 7f3f606..4f6f6c9 100644
>>> --- a/fs/nfs/delegation.c
>>> +++ b/fs/nfs/delegation.c
>>> @@ -854,7 +854,8 @@ bool nfs4_copy_delegation_stateid(nfs4_stateid
>>> *dst, struct inode *in
>>>         flags &= FMODE_READ|FMODE_WRITE;
>>>         rcu_read_lock();
>>>         delegation = rcu_dereference(nfsi->delegation);
>>> -       ret = (delegation != NULL && (delegation->type & flags) == flags);
>>> +       ret = (delegation != NULL && (delegation->type & flags) == flags &&
>>> +               !test_bit(NFS_DELEGATION_RETURNING, &delegation->flags));
>>>         if (ret) {
>>>                 nfs4_stateid_copy(dst, &delegation->stateid);
>>>                 nfs_mark_delegation_referenced(delegation);
>>
>> The above cannot eliminate the possibility that we won't use a
>> delegation while it is being returned. It will at best just reduce the
>> window of opportunity.
>>
>
> You are right this are still problems. Actually, we might set the bit
> but not yet get the open stateid from the open with deleg_cur and
> that's not good. It would be good to know we got the open stateid and
> then pick that.
>
>> So, why is this being considered to be a problem in the first place?
>> Are you seeing a measurable performance impact on a real life workload
>> (as opposed to some 1-in-a-billion occurrence from a QA test :-))?
>
> Unfortunately, this problem is quite common and I hit it all the time
> on my setup. This leads to client seizing IO on that file and
> returning EIO. It's an unrecoverable error. I'm trying to figure out
> how to eliminate getting to that state.
>

It definitely isn't intended to be an irrecoverable error. The client
is supposed to just replay the write after updating the stateid.

-- 
Trond Myklebust

Linux NFS client maintainer, PrimaryData

trond.myklebust@xxxxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html