On 3/19/2020 1:56 PM, Jason Gunthorpe wrote:
On Thu, Mar 19, 2020 at 11:15:50AM +0200, Max Gurtovoy wrote:
On 3/19/2020 6:09 AM, Bart Van Assche wrote:
On 2020-03-18 08:02, Max Gurtovoy wrote:
In order to save resource allocation and utilize the completion
                 ^^^^^^^^^^^^^^^^^^^
resources?
thanks.
+static int nvmet_rdma_srq_size = 1024;
+module_param_cb(srq_size, &srq_size_ops, &nvmet_rdma_srq_size, 0644);
+MODULE_PARM_DESC(srq_size, "set Shared Receive Queue (SRQ) size, should be >= 256 (default: 1024)");
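The body of srq_size_ops is not part of the quoted hunk. As a hypothetical sketch of how the ">= 256" rule from the parameter description might be enforced, here is a userspace model of such a set callback. In the kernel it would be the .set member of a struct kernel_param_ops and would parse with kstrtoint() rather than strtol(). The name srq_size_set and the SRQ_SIZE_MIN constant are illustrative assumptions, not code from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Lower bound taken from the parameter description (assumption). */
#define SRQ_SIZE_MIN 256

/*
 * Userspace model of a hypothetical srq_size .set callback: parse the
 * string, reject anything that is not a whole integer in
 * [SRQ_SIZE_MIN, INT_MAX], and store it only if it is valid.
 * Returns 0 on success or -EINVAL.
 */
static int srq_size_set(const char *val, int *out)
{
	char *end;
	long n;

	assert(out != NULL);
	if (val == NULL || *val == '\0')
		return -EINVAL;

	errno = 0;
	n = strtol(val, &end, 0);
	if (errno != 0 || *end != '\0' || n < SRQ_SIZE_MIN || n > INT_MAX)
		return -EINVAL;

	*out = (int)n;
	return 0;
}
```

Validating before storing means a rejected write leaves the previously configured size in place.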
Is an SRQ overflow fatal? Isn't the SRQ size something that should be
computed by the nvmet_rdma driver such that SRQ overflows do not happen?
I've added the following code to make sure that the size is not greater than
the device capability:
+	ndev->srq_size = min(ndev->device->attrs.max_srq_wr,
+			     nvmet_rdma_srq_size);
If the SRQ doesn't have enough credits, the target will respond with an RNR
(Receiver Not Ready) NAK, and the initiator will retry later.
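To make the trade-off discussed here concrete, below is a toy discrete-time model of an SRQ shared by all queue pairs. It is not nvmet_rdma code, and the struct and function names are hypothetical. Each incoming SEND consumes one posted receive buffer; a SEND that finds the SRQ empty is answered with an RNR NAK and retried on the next tick; the target reposts buffers as it finishes processing. It assumes fixed per-tick arrival and service rates, which real traffic does not have.

```c
#include <assert.h>

/* Hypothetical toy model of SRQ credits, not driver code. */
struct srq_sim {
	int depth;     /* total receive buffers (SRQ size) */
	int posted;    /* buffers currently posted and free */
	int inflight;  /* buffers consumed, awaiting repost */
	int backlog;   /* SENDs waiting to be retried after an RNR NAK */
	long rnr_naks; /* total RNR NAKs sent so far */
};

static void srq_sim_init(struct srq_sim *s, int depth)
{
	assert(depth > 0);
	s->depth = depth;
	s->posted = depth;
	s->inflight = 0;
	s->backlog = 0;
	s->rnr_naks = 0;
}

/* One tick: retried plus new SENDs arrive, then 'service' buffers are reposted. */
static void srq_sim_tick(struct srq_sim *s, int arrivals, int service)
{
	int want = s->backlog + arrivals;
	int got = want < s->posted ? want : s->posted;
	int done;

	s->posted -= got;
	s->inflight += got;
	s->backlog = want - got;
	s->rnr_naks += s->backlog; /* every SEND left unserved this tick is NAKed */

	done = service < s->inflight ? service : s->inflight;
	s->inflight -= done;
	s->posted += done; /* processed buffers go back on the SRQ */

	assert(s->posted + s->inflight == s->depth);
}
```

With depth 4, 3 arrivals and 2 completions per tick, the first RNR NAK appears on the third tick and the retry backlog grows by one SEND per tick from then on; with arrivals equal to the service rate, no RNR NAKs occur. In this model a larger SRQ only absorbs bursts; it cannot prevent RNR NAKs when the offered load exceeds the target's processing rate, which is where the RNR retry path takes over.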
This is a pretty big change; in bad cases we could significantly
overflow the available SRQ space...
A big part of most verbs protocols is ensuring that the RQ does not
overflow.
Are we sure this is OK with all the initiators/targets out there?
IMO, if we set the SRQ size large enough to fully utilize the wire, we're good.
So the best we can do is decrease the RNR timer so the initiator retransmits
faster; I have patches for that as well.
Let me know if you'd prefer that I send them in v3.
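For reference on what decreasing the RNR timer can buy: in InfiniBand the responder's min_rnr_timer QP attribute is a 5-bit code, carried in the RNR NAK, that tells the requester how long to wait before retrying. The table below mirrors the IB_RNR_TIMER_* values in <rdma/ib_verbs.h>, in microseconds. The helpers rnr_timer_to_us() and rnr_timer_code() are hypothetical illustrations of picking the shortest code that still covers a desired delay, not code from the patches mentioned above.

```c
#include <assert.h>

/* IB RNR NAK timer encoding, in microseconds (code 0 is the longest). */
static const unsigned int rnr_timer_us[32] = {
	655360,    10,    20,    30,    40,     60,     80,    120,
	   160,   240,   320,   480,   640,    960,   1280,   1920,
	  2560,  3840,  5120,  7680, 10240,  15360,  20480,  30720,
	 40960, 61440, 81920, 122880, 163840, 245760, 327680, 491520,
};

/* Delay in microseconds for a 5-bit RNR timer code. */
static unsigned int rnr_timer_to_us(int code)
{
	assert(code >= 0 && code < 32);
	return rnr_timer_us[code];
}

/*
 * Smallest code in 1..31 whose delay is >= want_us. Codes 1..31 are
 * monotonically increasing; if none is long enough, fall back to
 * code 0 (655.36 ms), the longest delay.
 */
static int rnr_timer_code(unsigned int want_us)
{
	int code;

	for (code = 1; code < 32; code++)
		if (rnr_timer_to_us(code) >= want_us)
			return code;
	return 0;
}
```

Note that code 0 is the longest delay, so a numerically lower code only means a shorter timer within codes 1 to 31.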
Nevertheless, this situation is better than the current per-HCA SRQ
implementation.
Jason