Re: [PATCH v3 05/11] xprtrdma: Do not wait if ib_post_send() fails

Chuck Lever <chuck.lever@xxxxxxxxxx> · Thu, 10 Mar 2016 10:58:46 -0500

> On Mar 10, 2016, at 10:54 AM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> wrote:
> 
>>>>>>>> Moving the QP into error state right after with rdma_disconnect
>>>>>>>> you are not sure that none of the subset of the invalidations
>>>>>>>> that _were_ posted completed and you get the corresponding MRs
>>>>>>>> in a bogus state...
>>>>>>> 
>>>>>>> Moving the QP to error state and then draining the CQs means
>>>>>>> that all LOCAL_INV WRs that managed to get posted will get
>>>>>>> completed or flushed. That's already handled today.
>>>>>>> 
>>>>>>> It's the WRs that didn't get posted that I'm worried about
>>>>>>> in this patch.
>>>>>>> 
>>>>>>> Are there RDMA consumers in the kernel that use that third
>>>>>>> argument to recover when LOCAL_INV WRs cannot be posted?
>>>>>> 
>>>>>> None :)
>>>>>> 
>>>>>>>>> I suppose I could reset these MRs instead (that is,
>>>>>>>>> pass them to ib_dereg_mr).
>>>>>>>> 
>>>>>>>> Or, just wait for a completion for those that were posted
>>>>>>>> and then all the MRs are in a consistent state.
>>>>>>> 
>>>>>>> When a LOCAL_INV completes with IB_WC_SUCCESS, the associated
>>>>>>> MR is in a known state (ie, invalid).
>>>>>>> 
>>>>>>> The WRs that flush mean the associated MRs are not in a known
>>>>>>> state. Sometimes the MR state is different than the hardware
>>>>>>> state, for example. Trying to do anything with one of these
>>>>>>> inconsistent MRs results in IB_WC_MW_BIND_ERR until the thing
>>>>>>> is deregistered.
>>>>>> 
>>>>>> Correct.
>>>>>> 
>>>>> 
>>>>> It is legal to invalidate an MR that is not in the valid state.  So you don't
>>>>> have to deregister it, you can assume it is valid and post another LINV WR.
>>>> 
>>>> I've tried that. Once the MR is inconsistent, even LOCAL_INV
>>>> does not work.
>>>> 
>>> 
>>> Maybe IB Verbs don't mandate that invalidating an invalid MR must be allowed?
>>> (looking at the verbs spec now).
>> 
> 
> IB Verbs doesn't specify this requirement.  iW verbs does.  So transport
> independent applications cannot rely on it.  So ib_dereg_mr() seems to be the
> only thing you can do.
> 
>> If the MR is truly invalid, then there is no issue, and
>> the second LOCAL_INV completes successfully.
>> 
>> The problem is after a flushed LOCAL_INV, the MR state
>> sometimes does not match the hardware state. The MR is
>> neither registered nor invalid.
>> 
> 
> There is a difference, at least with iWARP devices, between the MR state: VALID
> vs INVALID, and if the MR is allocated or not.
> 
>> A flushed LOCAL_INV tells you nothing more than that the
>> LOCAL_INV didn't complete. The MR state at that point is
>> unknown.
>> 
> 
> With respect to iWARP and cxgb4: when you allocate a fastreg MR, HW has an entry
> for that MR and it is marked "allocated".  The MR record in HW also has a state:
> VALID or INVALID.  While the MR is "allocated" you can post WRs to invalidate it
> which changes the state to INVALID, or fast-register memory which makes it
> VALID.  Regardless of what happens on any given QP, the MR remains "allocated"
> until you call ib_dereg_mr().  So at least for cxgb4, you could in fact just
> post another LINV to get it back to a known state that allows subsequent
> fast-reg WRs.
> 
> Perhaps IB devices don't work this way.
> 
> What error did you get when you tried just doing an LINV after a flush?

With CX-2 and CX-3, after a flushed LOCAL_INV, trying either
a FASTREG or LOCAL_INV on that MR can sometimes complete with
IB_WC_MW_BIND_ERR.
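
To make the failure mode concrete, here is a toy userspace model of the
behavior discussed in this thread. It is a sketch only, not kernel code and
not the verbs API: every toy_* name is hypothetical, and it pessimistically
assumes the error happens every time, whereas on real hardware it happens
only sometimes.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only. All toy_* names are hypothetical, not kernel symbols. */
enum toy_mr_state { TOY_MR_INVALID, TOY_MR_VALID, TOY_MR_UNKNOWN };
enum toy_wc_status { TOY_WC_SUCCESS, TOY_WC_FLUSH_ERR, TOY_WC_MW_BIND_ERR };

struct toy_mr {
	enum toy_mr_state state;
};

/* Models a FASTREG WR. Assumption: it always fails on an unknown-state MR. */
static enum toy_wc_status toy_fastreg(struct toy_mr *mr)
{
	assert(mr != NULL);
	if (mr->state == TOY_MR_UNKNOWN)
		return TOY_WC_MW_BIND_ERR;
	mr->state = TOY_MR_VALID;
	return TOY_WC_SUCCESS;
}

/*
 * Models a LOCAL_INV WR. A non-zero 'flushed' means the WR was flushed
 * rather than executed, which leaves the MR in an unknown state.
 * Invalidating an MR that is already invalid succeeds.
 */
static enum toy_wc_status toy_local_inv(struct toy_mr *mr, int flushed)
{
	assert(mr != NULL);
	if (mr->state == TOY_MR_UNKNOWN)
		return TOY_WC_MW_BIND_ERR;
	if (flushed) {
		mr->state = TOY_MR_UNKNOWN;
		return TOY_WC_FLUSH_ERR;
	}
	mr->state = TOY_MR_INVALID;
	return TOY_WC_SUCCESS;
}

/* Stand-in for ib_dereg_mr() followed by allocating a fresh MR. */
static void toy_reset(struct toy_mr *mr)
{
	assert(mr != NULL);
	mr->state = TOY_MR_INVALID;
}
```

In this model, once toy_local_inv() has been flushed, both toy_fastreg()
and a second toy_local_inv() return TOY_WC_MW_BIND_ERR until toy_reset()
is called, which is why resetting the MR looks like the only
transport-independent recovery.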

--
Chuck Lever
