If you move the QP into the error state right away with
rdma_disconnect, you cannot tell which of the invalidations that
_were_ posted actually completed, and you end up with the
corresponding MRs in a bogus state...
Moving the QP to error state and then draining the CQs means
that all LOCAL_INV WRs that managed to get posted will get
completed or flushed. That's already handled today.
It's the WRs that didn't get posted that I'm worried about
in this patch.
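For context, the already-handled path is roughly the standard
error-state-plus-drain pattern. A minimal sketch, assuming the
generic ib_drain_qp() helper (illustrative, not the actual
xprtrdma code):

        /* Tear down the connection; the QP transitions toward the
         * error state, so no further WRs can execute.
         */
        rdma_disconnect(id);

        /* Returns only after every WR that was actually posted has
         * generated a completion or a flush error on the CQs.
         */
        ib_drain_qp(id->qp);

WRs that were never posted produce no completion at all, which is
the gap discussed below.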
Are there RDMA consumers in the kernel that use that third
argument to ib_post_send() to recover when LOCAL_INV WRs cannot
be posted?
None :)
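(The third argument in question is ib_post_send()'s bad_wr
out-pointer; a minimal sketch of what it reports:)

        struct ib_send_wr *bad_wr;
        int rc;

        rc = ib_post_send(qp, first_wr, &bad_wr);
        if (rc) {
                /* bad_wr now points at the first WR that was NOT
                 * posted; it and every WR chained after it never
                 * reached the HCA, so no completion will ever
                 * arrive for any of them.
                 */
        }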
I suppose I could reset these MRs instead (that is,
pass them to ib_dereg_mr).
Or, just wait for a completion for those that were posted
and then all the MRs are in a consistent state.
When a LOCAL_INV completes with IB_WC_SUCCESS, the associated
MR is in a known state (i.e., invalid).
When a LOCAL_INV WR is flushed, the associated MR is left in an
unknown state; for example, the driver's notion of the MR state
can differ from the hardware's. Trying to do anything with one
of these inconsistent MRs results in IB_WC_BIND_MW_ERR until the
MR is deregistered.
Correct.
The xprtrdma completion handlers mark the MR associated with
a flushed LOCAL_INV WR "stale". They all have to be reset with
ib_dereg_mr to guarantee they are usable again. Have a look at
__frwr_recovery_worker().
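A minimal sketch of that reset mechanism (the rpcrdma_frmr field
names approximate the xprtrdma structures rather than quoting
them):

        /* Deregistering and reallocating the MR is the only way to
         * guarantee it is usable again after its LOCAL_INV flushed.
         */
        static int frwr_reset_mr(struct rpcrdma_frmr *f,
                                 struct ib_pd *pd, unsigned int depth)
        {
                ib_dereg_mr(f->fr_mr);          /* discard stale MR */
                f->fr_mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, depth);
                if (IS_ERR(f->fr_mr))
                        return PTR_ERR(f->fr_mr);
                f->fr_state = FRMR_IS_INVALID;  /* known state again */
                return 0;
        }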
Yes, I'm aware of that.
And, xprtrdma waits for only the last LOCAL_INV in the chain to
complete. If that one isn't posted, then fr_done is never woken
up. In that case, frwr_op_unmap_sync() would wait forever.
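A sketch of that wait, with names following the thread (fr_done as
the completion the handler signals); illustrative, not the verbatim
frwr_op_unmap_sync() code:

        /* Chain of LOCAL_INV WRs: only the tail is signaled, and
         * the caller sleeps until the tail's completion fires.
         */
        last_wr->send_flags = IB_SEND_SIGNALED;
        rc = ib_post_send(qp, first_wr, &bad_wr);

        /* If posting failed partway and the tail WR was among those
         * never posted, no completion ever fires and this blocks
         * forever.
         */
        wait_for_completion(&f->fr_done);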
Ah, so the missing completions are the problem; now I get
it.
If I understand you correctly, the right solution is for
frwr_op_unmap_sync() to regroup and reset the MRs associated
with the LOCAL_INV WRs that were never posted, using the same
mechanism as __frwr_recovery_worker().
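A sketch of that regroup-and-reset idea; __frwr_queue_recovery()
stands in for whatever entry point the recovery worker exposes,
and the container_of() mapping from WR back to MR is illustrative:

        rc = ib_post_send(qp, first_wr, &bad_wr);
        if (rc) {
                struct ib_send_wr *wr;

                /* Every WR from bad_wr onward was never posted, so
                 * no completion will arrive for it: hand each
                 * associated MR to the same recovery path used for
                 * flushed LOCAL_INV WRs.
                 */
                for (wr = bad_wr; wr; wr = wr->next) {
                        struct rpcrdma_frmr *f =
                                container_of(wr, struct rpcrdma_frmr,
                                             fr_invwr);
                        __frwr_queue_recovery(f);
                }
        }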
Yeah, I'd recycle all the MRs instead of having non-trivial logic
to try to figure out each MR's state...
It's already 4.5-rc7, a little late for a significant rework
of this patch, so maybe I should drop it?
Perhaps... although you could make it incremental, since the current
patch doesn't seem to break anything; it just doesn't solve the
complete problem...