RE: [PATCH rdma-next 0/3] Support out of order data placement

Hi Tom,

> -----Original Message-----
> From: Tom Talpey [mailto:tom@xxxxxxxxxx]
> Sent: Monday, June 12, 2017 7:12 PM
> To: Parav Pandit <parav@xxxxxxxxxxxx>; Jason Gunthorpe
> <jgunthorpe@xxxxxxxxxxxxxxxxxxxx>
> Cc: Bart Van Assche <Bart.VanAssche@xxxxxxxxxxx>; leon@xxxxxxxxxx;
> dledford@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; Idan Burstein
> <idanb@xxxxxxxxxxxx>
> Subject: Re: [PATCH rdma-next 0/3] Support out of order data placement
> 
> On 6/12/2017 7:59 PM, Parav Pandit wrote:
> >> -----Original Message-----
> >> From: Tom Talpey [mailto:tom@xxxxxxxxxx]
> >> Sent: Monday, June 12, 2017 6:44 PM
> >> To: Parav Pandit <parav@xxxxxxxxxxxx>; Jason Gunthorpe
> >> <jgunthorpe@xxxxxxxxxxxxxxxxxxxx>
> >> Cc: Bart Van Assche <Bart.VanAssche@xxxxxxxxxxx>; leon@xxxxxxxxxx;
> >> dledford@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; Idan Burstein
> >> <idanb@xxxxxxxxxxxx>
> >> Subject: Re: [PATCH rdma-next 0/3] Support out of order data
> >> placement
> >>
> >> On 6/12/2017 6:54 PM, Parav Pandit wrote:
> >>> Hi Tom,
> >>>
> >>>> -----Original Message-----
> >>>> From: Tom Talpey [mailto:tom@xxxxxxxxxx]
> >>>> Sent: Monday, June 12, 2017 5:20 PM
> >>>> To: Parav Pandit <parav@xxxxxxxxxxxx>; Jason Gunthorpe
> >>>> <jgunthorpe@xxxxxxxxxxxxxxxxxxxx>
> >>>> Cc: Bart Van Assche <Bart.VanAssche@xxxxxxxxxxx>;
> leon@xxxxxxxxxx;
> >>>> dledford@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; Idan Burstein
> >>>> <idanb@xxxxxxxxxxxx>
> >>>> Subject: Re: [PATCH rdma-next 0/3] Support out of order data
> >>>> placement
> >>>>
> >>>> On 6/12/2017 5:32 PM, Parav Pandit wrote:
> >>>>> Hi Tom,
> >>>> ...
> >>>>>>
> >>>>>> I agree with Jason, the bit should be 1 by default, if defined as
> >>>>>> you
> >>>> propose.
> >>>>>> Out-of-order is the norm, not the exception, for ULPs.
> >>>>>> Honestly, I think you should perhaps consider making it the
> >>>>>> default on your devices, and allowing only MLX-aware ULPs to turn
> it off.
> >>>>>>
> >>>>>
> >>>>> There can be cases in deployment where responder has support for
> >>>> receiving out-of-order, but requester doesn't.
> >>>>
> >>>> Yuck! So this needs to be negotiated end-to-end, and by the upper
> layer?
> >>>> Talk about barriers to adoption, and opportunities for disaster.
> >>>>
> >>> Since Jason confirmed that all Linux kernel consumers are coded to
> >>> comply with the o9-20 requirement, I think kernel-based rdma-cm
> >>> consumers can be transparently enabled end-to-end, without ULP
> >>> involvement, through rdma_accept() and rdma_connect().
> >>
> >> I have two thoughts here.
> >>
> >> 1) You seem to assume all consumers are Linux, and do not need to
> >> negotiate. This is a dangerous assumption.
> > Certainly not. I didn't assume that. I just gave one example that known
> consumers can be done without modifying the ULP.
> > Explained further in 3rd question.
> > Even other consumers can work with this solution.
> > For example Linux rdmacm based client and Other OS based server.
> > Client is ooo capable.
> > Server is ooo not capable.
> > Once you follow the rdmacm-based sequence below, it will be clear how
> > this works.
> 
> Oh, so there's a MAD protocol change under the hood. 
No. There is no change under the hood.
Your question was how ULPs can benefit from this feature without being changed.
My answer was that the rdmacm-based Linux kernel consumers we know of comply with o9-20, so they can take the benefit once rdmacm is extended as in the example below.

> Well, that's a wider
> question. And I still don't understand how existing, non-strict-requiring
> protocols can take advantage of this.
> Nor how this works for non-Mellanox, non-IB/RoCE implementations.

The device capability indicates which devices support this. It is explained in the usage section of Documentation/out_of_order.txt.
Any vendor and any protocol that supports it can set this optional device capability.

> 
> Again, I'd be a lot less concerned if non-strict were the default, and strict
> mode was negotiated. It's all just so upside-down.

In the IB spec, in-order delivery is the default. Can you suggest how we could change the default IB behavior without breaking anything?
Adding an optional attribute seems the right way to ensure compatibility.
> 
> Tom.
> 
> >> 2) I assume that there is some performance benefit to toggling this
> >> setting to non-strict. So, how do existing consumers get this
> >> advantage, especially since they don't need strict semantics? Bearing
> >> in mind that they do have to negotiate this end-to-end, meaning they
> require a protocol extension.
> > I don't have completely transparent upstream solution for existing
> consumers yet.
> >>
> >> Actually. I have a third thought. Since this is an attribute to qp
> >> creation, performed even before establishing a connection, how does
> >> the upper layer know when to set it?
> > This is not at QP creation time. I have described in
> Documentation/out_of_order.txt in usage section 3.
> > This is at QP state transition from INIT to RTR.
> > Here is the flow. It's just not coded enough for posting patches.
> >
> > 1. When rdmacm active side creates the QP, it is in INIT state.
> > 2. Send MAD_Req msg (indicating ooo_requested=1).
> > 3. When rdmacm passive side receives the message, it looks up device_cap
> >    attribute and matches it against ooo_requested flag.
> > 4. When device supports it, MAD_Rsp msg sets ooo_enabled=1; if it doesn't
> >    support it, ooo_enabled=0.
> > 5. rdmacm passive side creates the QP and moves to RTR state (with QP ooo
> >    enabled bit set).
> > 6. Active side receives the message and puts the QP to RTR, RTS state based
> >    on received bit setting from passive side.
> >
> > Flow is no different than how rest of the connection specific parameters
> are shared such as IRD/ORD, PSN, timeouts, mtu etc.
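
The sequence above could be sketched roughly as follows. This is only an illustration in plain C, not code from the patches: the struct and function names (cm_req_sketch, cm_rsp_sketch, active_build_req, passive_handle_req, active_qp_ooo) are hypothetical, and it assumes the active side requests out-of-order placement only when its own device is capable.

```c
#include <stdbool.h>

/* Hypothetical message layouts; not existing rdma-cm structures. */
struct cm_req_sketch { bool ooo_requested; };
struct cm_rsp_sketch { bool ooo_enabled; };

/* Step 2: the active side builds the request.
 * Assumption: it asks for ooo only if its local device is capable. */
struct cm_req_sketch active_build_req(bool local_dev_ooo_cap)
{
    struct cm_req_sketch req = { .ooo_requested = local_dev_ooo_cap };
    return req;
}

/* Steps 3-5: the passive side matches the request against its device
 * capability; the result is what it would program into its QP at RTR. */
struct cm_rsp_sketch passive_handle_req(const struct cm_req_sketch *req,
                                        bool local_dev_ooo_cap)
{
    struct cm_rsp_sketch rsp = {
        .ooo_enabled = req->ooo_requested && local_dev_ooo_cap,
    };
    return rsp;
}

/* Step 6: the active side programs its QP from the bit in the response. */
bool active_qp_ooo(const struct cm_rsp_sketch *rsp)
{
    return rsp->ooo_enabled;
}
```

With an ooo-capable client and a non-capable server, passive_handle_req() yields ooo_enabled=0 and both QPs stay in the default in-order mode; out-of-order is used only when both ends support it.
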
> >
> >
> >
> >>
> >> Tom.



