Re: [PATCH v4 00/25] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)

Danil Kipnis <danil.kipnis@xxxxxxxxxxxxxxx> · Fri, 19 Jul 2019 15:12:46 +0200

Hi Sagi,

thanks a lot for the information. We are doing the right thing
regarding the invalidation (your 2f122e4f5107), but we do use
unsignalled sends and need to fix that. Please correct me if I'm
wrong: The patches (b4b591c87f2b, b4b591c87f2b) fix the problem that
if the ack from target is lost for some reason, the initiators HCA
will resend it even after the request is completed.
But doesn't the same problem persist also other way around: for the
lost acks from client? I mean, target is did a send for the "read"
IOs; client completed the request (after invalidation, refcount
dropped to 0, etc), but the ack is not delivered to the HCA of the
target, so the target will also resend it. This seems unfixable, since
the client can't possible know if the server received his ack or not?
Doesn't the problem go away, if rdma_conn_param.retry_count is just set to 0?

Thanks for your help,
Best,
Danil.

On Tue, Jul 9, 2019 at 11:27 PM Sagi Grimberg <sagi@xxxxxxxxxxx> wrote:
>
>
> >> Thanks Jason for feedback.
> >> Can you be  more specific about  "the invalidation model for MR was wrong"
> >
> > MR's must be invalidated before data is handed over to the block
> > layer. It can't leave MRs open for access and then touch the memory
> > the MR covers.
>
> Jason is referring to these fixes:
> 2f122e4f5107 ("nvme-rdma: wait for local invalidation before completing
> a request")
> 4af7f7ff92a4 ("nvme-rdma: don't complete requests before a send work
> request has completed")
> b4b591c87f2b ("nvme-rdma: don't suppress send completions")