Re: [PATCH v2 3/5] nvmet-rdma: use SRQ per completion vector

and again - hopefully last time I'm blocked...

On 3/19/2020 5:44 PM, Max Gurtovoy wrote:

sending again - I think the previous mail was blocked.

On 3/19/2020 5:10 PM, Max Gurtovoy wrote:


On 3/19/2020 4:49 PM, Bart Van Assche wrote:
On 3/19/20 6:53 AM, Jason Gunthorpe wrote:
On Thu, Mar 19, 2020 at 02:48:20PM +0200, Max Gurtovoy wrote:

Nevertheless, this situation is better than the current SRQ-per-HCA
implementation.

nvme/srp/etc already use srq? I see it in the target but not initiator?

Just worried about breaking some weird target somewhere

Jason,

The feature is only for the target side and has no influence on the initiator at the ULP level.

The only thing I did was fix the SRQ implementation for nvmet-rdma and srpt, which allocate 1 SRQ per device.

Now there are N SRQs per device, which act as a pure MQ implementation (as if no SRQ were used).
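As a rough illustration of the per-vector layout (a hypothetical helper, not code from the patch): each QP can be bound to the SRQ of its completion vector, so receives are spread across N SRQs instead of funneling through one global SRQ per device:

```c
#include <assert.h>

/* Hypothetical sketch of per-completion-vector SRQ selection.
 * Given a QP's completion vector and the size of the SRQ pool,
 * pick the SRQ index for that QP; index 0 is used when no pool
 * exists (the single-SRQ fallback). */
static int srq_index_for_queue(int comp_vector, int num_srqs)
{
	/* Fall back to a single SRQ when no per-vector pool was
	 * allocated; otherwise wrap the vector into the pool. */
	return num_srqs ? comp_vector % num_srqs : 0;
}
```

With this scheme, two QPs on different completion vectors post and reap receives on different SRQs, which is what keeps the shared pool from becoming a single point of contention.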


From the upstream SRP target driver:

static void srpt_get_ioc(struct srpt_port *sport, u32 slot,
             struct ib_dm_mad *mad)
{
    [ ... ]
    if (sdev->use_srq)
        send_queue_depth = sdev->srq_size;
    else
        send_queue_depth = min(MAX_SRPT_RQ_SIZE,
                       sdev->device->attrs.max_qp_wr);
    [ ... ]
    iocp->send_queue_depth = cpu_to_be16(send_queue_depth);
    [ ... ]
}

I'm not sure the SRP initiator uses that data from the device management
I/O controller profile.

Anyway, with one SRQ per initiator it is possible for the initiator to
prevent SRQ overflows. I don't think it is possible for an initiator to
prevent target-side SRQ overflows if shared receive queues are shared
across multiple initiators.

I don't want to change the initiator code to prevent overflow.

As I explained earlier, the SRQs on the target side will be assigned to all controllers of a specific device (instead of 1 global SRQ per device) and will share the receive buffers.

Not per initiator; that would cause lock contention.

In case the target SRQ has no resources at a specific time, the low level (RC QP) is responsible for sending an RNR NAK to the initiator, and the initiator (RC QP) will retry in the transport layer, not in the ULP.

This is set by the min_rnr_timer value, which by default is set to 0 (the maximum value). For the SRQ case in general, IMO it is better to set it to 1 (the minimal value) to avoid longer latency, since there is a chance that the SRQ is full.
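To make the min_rnr_timer point concrete: the field is a 5-bit encoding, not a time in milliseconds, and the encoding is non-monotonic at zero, so 0 selects the longest delay (655.36 ms) while 1 selects the shortest (0.01 ms). A minimal sketch of the mapping for just the two encodings discussed here (hypothetical helper; the full IBTA table has 32 entries, omitted):

```c
#include <assert.h>

/* Sketch of the RNR NAK timer encoding for the two values
 * discussed above. Returns the delay in microseconds, or -1 for
 * encodings this sketch does not cover. */
static long min_rnr_timer_to_usecs(int encoded)
{
	switch (encoded) {
	case 0:
		return 655360; /* 0b00000 -> 655.36 ms (maximum delay) */
	case 1:
		return 10;     /* 0b00001 -> 0.01 ms (minimum delay) */
	default:
		return -1;     /* remaining 30 encodings omitted here */
	}
}
```

This is why leaving the default of 0 is a poor fit for a busy SRQ: an RNR NAK would stall the QP for the longest possible interval instead of the shortest.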

In my testing I didn't see a real need to set min_rnr_timer, but I have patches for that in case Jason thinks this should be part of this series, which is not so small even without it.



Thanks,

Bart.


