Re: [PATCH v2 3/5] nvmet-rdma: use SRQ per completion vector

Max Gurtovoy <maxg@xxxxxxxxxxxx> · Thu, 19 Mar 2020 14:48:20 +0200

On 3/19/2020 1:56 PM, Jason Gunthorpe wrote:
On Thu, Mar 19, 2020 at 11:15:50AM +0200, Max Gurtovoy wrote:
On 3/19/2020 6:09 AM, Bart Van Assche wrote:
On 2020-03-18 08:02, Max Gurtovoy wrote:
In order to save resource allocation and utilize the completion
                     ^^^^^^^^^^^^^^^^^^^
                     resources?
thanks.

+static int nvmet_rdma_srq_size = 1024;
+module_param_cb(srq_size, &srq_size_ops, &nvmet_rdma_srq_size, 0644);
+MODULE_PARM_DESC(srq_size, "set Shared Receive Queue (SRQ) size, should >= 256 (default: 1024)");
Is an SRQ overflow fatal? Isn't the SRQ size something that should be
computed by the nvmet_rdma driver such that SRQ overflows do not happen?
I've added the following code to make sure that the size does not exceed
the device capability:

+ndev->srq_size = min(ndev->device->attrs.max_srq_wr,
+                            nvmet_rdma_srq_size);

If the SRQ doesn't have enough credits, it will send an RNR NAK to the
initiator, and the initiator will retry later on.
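
To make the sizing rule above concrete, here is a minimal userspace sketch
(not the actual patch code). It assumes the srq_size setter rejects values
below 256, as the MODULE_PARM_DESC text suggests, and then clamps the result
to the device limit, mirroring the min() on attrs.max_srq_wr. The names
srq_size_parse() and srq_effective_size() are hypothetical and exist only
for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Lower bound taken from the MODULE_PARM_DESC text ("should >= 256"). */
#define SRQ_SIZE_MIN 256

/*
 * Hypothetical stand-in for the srq_size parameter setter: parse a decimal
 * string and reject anything below SRQ_SIZE_MIN. Returns 0 on success and
 * -EINVAL on malformed or out-of-range input; *out is written only on success.
 */
static int srq_size_parse(const char *val, int *out)
{
	char *end;
	long n;

	errno = 0;
	n = strtol(val, &end, 10);
	if (errno || end == val || (*end != '\0' && *end != '\n'))
		return -EINVAL;
	if (n < SRQ_SIZE_MIN || n > INT_MAX)
		return -EINVAL;

	*out = (int)n;
	return 0;
}

/*
 * Clamp the requested SRQ size to what the device reports it can support,
 * i.e. the same idea as min(attrs.max_srq_wr, nvmet_rdma_srq_size).
 */
static int srq_effective_size(int dev_max_srq_wr, int requested)
{
	assert(dev_max_srq_wr > 0);
	return requested < dev_max_srq_wr ? requested : dev_max_srq_wr;
}
```

With these assumptions, a request of 1024 on a device that reports a
max_srq_wr of 512 ends up with an SRQ of 512 entries, while a request of 128
is rejected before it ever reaches the clamp.
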
This is a pretty big change, in bad cases we could significantly
overflow the srq space available...

A big part of most verbs protocols is to ensure that the RQ does not
overflow.

Are we sure it is OK? With all initiator/targets out there?

IMO if we set the SRQ size large enough to utilize the wire, we're good.

So the best we can do is decrease the RNR timer so the initiator retransmits
faster. I have patches for that as well.

Let me know if you'd prefer that I send them in v3.

Nevertheless, this situation is better than the current SRQ per HCA
implementation.

Jason