On 7/10/2017 11:57 PM, Chuck Lever wrote:
On Jul 10, 2017, at 6:09 PM, Jason Gunthorpe <jgunthorpe@xxxxxxxxxxxxxxxxxxxx> wrote:
On Mon, Jul 10, 2017 at 06:04:18PM -0400, Chuck Lever wrote:
The server sounds fine, how does the client work?
The client does not initiate RDMA Read or Write today.
Right, but it provides an rkey that the server uses for READ or WRITE.
The invalidate of that rkey at the client must follow the same rules
as inline send.
Ah, I see.
The RPC reply handler calls frwr_op_unmap_sync to invalidate
any MRs associated with the RPC.
frwr_op_unmap_sync has to sort the rkeys that are remotely
invalidated, and those that have not been.
Does the reply handler consider the possibility that the reply is
being signaled before the send WRs? There are some really interesting
races on shared or multiple CQs when the completion upcalls start
to back up under heavy load that we've seen in Windows SMB Direct.
In the end, we had to put explicit reference counts on each and
every object, and added rundown references to everything before
completing an operation and signaling the upper layer (SMB3, in
our case). This found a surprising number of double completions,
and missing completions from drivers as well.
The first step is to ensure all the rkeys for an RPC are
invalid. The rkey that was remotely invalidated is skipped
here, and a chain of LocalInv WRs is posted to invalidate
any remaining rkeys. The last WR in the chain is signaled.
If one or more LocalInv WRs are posted, this function waits
for LocalInv completion.
The last step is always DMA unmapping. Note that we can't
get a completion for a remotely invalidated rkey, and we
have to wait for LocalInv to complete anyway. So the DMA
unmapping is always handled here instead of in a
completion handler.
When frwr_op_unmap_sync returns to the RPC reply handler,
the handler calls xprt_complete_rqst, and the RPC is
terminated. This guarantees that the MRs are invalid before
control is returned to the RPC consumer.
In the ^C case, frwr_op_unmap_safe is invoked during RPC
termination. The MRs are passed to the background recovery
task, which invokes frwr_op_recover_mr.
That worries me. How do you know it's going in sequence, and
that it will result in an invalidated MR?
frwr_op_recover_mr destroys the fr_mr and DMA unmaps the
memory. (It's also used when registration or invalidation
flushes, which is why it uses a hammer).
So here, we're a little fast/loose: the ordering of
invalidation and unmapping is correct, but the MRs can be
invalidated after the RPC completes. Since RPC termination
can't wait, this is the best I can do for now.
That would worry me even more. "fast/loose" isn't a good
situation when storage is concerned. Shouldn't you just be
closing the connection?
Tom.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html