Re: RDMA/CM and multiple QPs

Sorry if I am imposing, but there has not been much input in this email
chain on the thoughts below about the abstraction, so I am iterating
again to see whether there is a different view now.

I understood Christoph's requirement to be relatively lean: a block-mq
hardware queue can be bound to a CPU and/or to an RDMA QP.
The session layer is probably the right place to attach the
connection(s) to a session.
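To make the queue-to-QP binding concrete, here is a minimal sketch of how
a session layer might spread blk-mq hardware contexts over a set of QPs.
All names (demo_session, demo_hctx_to_qp) are hypothetical, not an
existing kernel API; the point is only the modulo-style distribution that
blk-mq itself uses to spread hctxs over CPUs:

```c
#include <assert.h>

/* Hypothetical session object owning nr_qps RDMA connections. */
struct demo_session {
	int nr_qps;
};

/*
 * Map a blk-mq hardware-queue index onto one of the session's QPs.
 * A simple modulo spread, analogous to how blk-mq maps hctxs to CPUs.
 */
static int demo_hctx_to_qp(const struct demo_session *s, int hctx_idx)
{
	return hctx_idx % s->nr_qps;
}
```

With 4 QPs, hctx 0 lands on QP 0, hctx 5 wraps around to QP 1, and so on.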

Establishing multiple QPs is just one part of it.
The bigger challenge is how to distribute work requests among multiple
QPs, especially when STag advertisement and invalidation are opaque to
the verbs layer (they are not part of the IB spec, and every ULP has its
own method, possibly for good reason).

A few months back, when I was working on this problem, the solution we
considered was similar to what the networking stack currently does:

1. Instead of exposing only the raw send, write, read and invalidate
verbs, provide higher-level verbs for data transport, such as
send_data, receive_data, advertise_buffers, etc., of course keeping
zero-copy semantics in mind.
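A rough sketch of what such an ops table could look like. Every name here
is illustrative (nothing like this exists in the verbs API today); the
idea is that a ULP passes scatter-gather lists and lets this layer handle
MR registration, STag advertisement and invalidation internally:

```c
/* Hypothetical scatter-gather element, to preserve zero copy. */
struct demo_sgl {
	void *addr;
	unsigned int length;
};

/* Hypothetical higher-level data-transport verbs. */
struct demo_transport_ops {
	int (*send_data)(void *session, const struct demo_sgl *sgl, int nents);
	int (*receive_data)(void *session, struct demo_sgl *sgl, int nents);
	int (*advertise_buffers)(void *session, struct demo_sgl *sgl, int nents);
};

/*
 * Toy send_data implementation: walks the SGL and returns the total
 * number of bytes queued, standing in for the real posting logic.
 */
static int demo_send_data(void *session, const struct demo_sgl *sgl,
			  int nents)
{
	int total = 0;

	(void)session;
	for (int i = 0; i < nents; i++)
		total += sgl[i].length;
	return total;
}
```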

2. Perform device aggregation, similar to Ethernet netdev link aggregation:
two ib_devices form a pair on which one or more QPs are created.
This virtual device provides higher-level data-transfer APIs rather than
raw IB semantics.
By doing so, this layer decides how to advertise memory, when to
invalidate, and which QP to use for transport (load balancing or failover).
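The QP-selection part of such an aggregated device could be as simple as
the sketch below: round-robin across member QPs for load balancing, and
skip any QP whose link is down so traffic fails over to the survivors.
All names are hypothetical:

```c
#include <assert.h>

#define DEMO_MAX_QPS 8

/* Hypothetical aggregated device with a set of member QPs. */
struct demo_agg_dev {
	int nr_qps;
	int up[DEMO_MAX_QPS];	/* 1 if the QP's port is usable */
	int last;		/* last QP index handed out */
};

/*
 * Pick the next usable QP, round-robin, skipping QPs that are down.
 * Returns the QP index, or -1 if every member is down.
 */
static int demo_pick_qp(struct demo_agg_dev *d)
{
	for (int i = 1; i <= d->nr_qps; i++) {
		int idx = (d->last + i) % d->nr_qps;

		if (d->up[idx]) {
			d->last = idx;
			return idx;
		}
	}
	return -1;
}
```

When all members are up this alternates QPs per work request; when one
goes down, subsequent picks silently skip it, which is the failover case.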

3. I have not thought through how existing ULPs, whose specifications
are IB-driven, could be migrated to this newly defined interface.

4. Accelio is one framework that comes close to this design philosophy;
however, its current implementation brings resource overhead for MRs,
and there is scope to optimize that as we go along.

5. Since this layer sits above the raw IB verbs layer and above RDMA-CM,
the core remains untouched. Once we have it, many migration-related
issues can be solved, because a node can disconnect and reconnect in a
stateful way.

6. This way the pure hardware resources are decoupled from transport
acceleration, which gives the flexibility to implement services that are
often difficult to build at the raw IB verbs level.


On Thu, Sep 10, 2015 at 10:00 PM, Hefty, Sean <sean.hefty@xxxxxxxxx> wrote:
>> right now RDMA/CM works on a QP basis, but seems very awkward if you
>> want multiple QPs as part of a single logical device, which will be
>> useful for a lot of modern protocols.  For example we will need to check
>> in the CM handler that we're not getting a different ib_device if we
>> want to apply the device limit in any sort of global scope, and it's
>> generally very hard to get a struct ib_device that can be used as
>> a driver model parent.
>>
>> Is there any interest in trying to add an API to the CM to do a single
>> address resolution and allocate multiple QPs with these checks in
>> place?
>
> IMO, you want a completely different level of abstraction.  One not based on a specific hardware implementation.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


