> >>>>>> Moving the QP into error state right after with rdma_disconnect
> >>>>>> you are not sure that none of the subset of the invalidations
> >>>>>> that _were_ posted completed and you get the corresponding MRs
> >>>>>> in a bogus state...
> >>>>>
> >>>>> Moving the QP to error state and then draining the CQs means
> >>>>> that all LOCAL_INV WRs that managed to get posted will get
> >>>>> completed or flushed. That's already handled today.
> >>>>>
> >>>>> It's the WRs that didn't get posted that I'm worried about
> >>>>> in this patch.
> >>>>>
> >>>>> Are there RDMA consumers in the kernel that use that third
> >>>>> argument to recover when LOCAL_INV WRs cannot be posted?
> >>>>
> >>>> None :)
> >>>>
> >>>>>>> I suppose I could reset these MRs instead (that is,
> >>>>>>> pass them to ib_dereg_mr).
> >>>>>>
> >>>>>> Or, just wait for a completion for those that were posted
> >>>>>> and then all the MRs are in a consistent state.
> >>>>>
> >>>>> When a LOCAL_INV completes with IB_WC_SUCCESS, the associated
> >>>>> MR is in a known state (ie, invalid).
> >>>>>
> >>>>> The WRs that flush mean the associated MRs are not in a known
> >>>>> state. Sometimes the MR state is different than the hardware
> >>>>> state, for example. Trying to do anything with one of these
> >>>>> inconsistent MRs results in IB_WC_BIND_MW_ERR until the thing
> >>>>> is deregistered.
> >>>>
> >>>> Correct.
> >>>>
> >>>
> >>> It is legal to invalidate an MR that is not in the valid state. So you don't
> >>> have to deregister it, you can assume it is valid and post another LINV WR.
> >>
> >> I've tried that. Once the MR is inconsistent, even LOCAL_INV
> >> does not work.
> >>
> >
> > Maybe IB Verbs don't mandate that invalidating an invalid MR must be allowed?
> > (looking at the verbs spec now).
>

IB Verbs doesn't specify this requirement; iW verbs does. So
transport-independent applications cannot rely on it, and ib_dereg_mr()
seems to be the only thing you can do.

> If the MR is truly invalid, then there is no issue, and
> the second LOCAL_INV completes successfully.
>
> The problem is after a flushed LOCAL_INV, the MR state
> sometimes does not match the hardware state. The MR is
> neither registered nor invalid.
>

There is a difference, at least with iWARP devices, between the MR state
(VALID vs INVALID) and whether the MR is allocated or not.

> A flushed LOCAL_INV tells you nothing more than that the
> LOCAL_INV didn't complete. The MR state at that point is
> unknown.
>

With respect to iWARP and cxgb4: when you allocate a fastreg MR, HW has
an entry for that MR and it is marked "allocated". The MR record in HW
also has a state: VALID or INVALID. While the MR is "allocated" you can
post WRs to invalidate it, which changes the state to INVALID, or to
fast-register memory, which makes it VALID. Regardless of what happens
on any given QP, the MR remains "allocated" until you call
ib_dereg_mr().

So at least for cxgb4, you could in fact just post another LINV to get
it back to a known state that allows subsequent fast-reg WRs. Perhaps IB
devices don't work this way. What error did you get when you tried just
doing an LINV after a flush?

Steve.
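
A minimal sketch of the disconnect-and-drain sequence described above
(move the QP toward the error state, then wait until every posted WR,
including any LOCAL_INV, has completed or flushed). The helper name is
made up and ib_drain_qp() is assumed to be available; this is not the
actual xprtrdma code:

	#include <rdma/rdma_cm.h>
	#include <rdma/ib_verbs.h>

	/* Hypothetical sketch: after this returns, every WR that was
	 * actually posted has generated a completion (success or a
	 * flush status), so the completion handler has seen every MR
	 * whose LOCAL_INV made it onto the send queue.
	 */
	static void example_disconnect(struct rdma_cm_id *id)
	{
		rdma_disconnect(id);		/* QP moves toward error state */
		if (id->qp)
			ib_drain_qp(id->qp);	/* wait for flushed completions */
	}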
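
And a minimal sketch of the MR recovery being debated: post another
LOCAL_INV to bring the MR back to a known state (which cxgb4 would
accept, per the above), and fall back to ib_dereg_mr() if the post
fails. The function name is hypothetical and the error handling is
simplified:

	/* Hypothetical sketch: transport-independent code cannot rely on
	 * a LOCAL_INV being accepted for an inconsistent MR, hence the
	 * ib_dereg_mr() fallback.
	 */
	static int example_reset_mr(struct ib_qp *qp, struct ib_mr *mr)
	{
		struct ib_send_wr linv_wr = {
			.opcode			= IB_WR_LOCAL_INV,
			.send_flags		= IB_SEND_SIGNALED,
			.ex.invalidate_rkey	= mr->rkey,
		};
		struct ib_send_wr *bad_wr;
		int rc;

		rc = ib_post_send(qp, &linv_wr, &bad_wr);
		if (rc) {
			/* Could not even post the LINV: deregister so a
			 * fresh MR can be allocated later.
			 */
			ib_dereg_mr(mr);
			return rc;
		}

		/* Caller must wait for the LINV completion before using
		 * the MR in another fast-reg WR.
		 */
		return 0;
	}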