RE: [PATCH rdma-next 0/3] Support out of order data placement

Hi Tom, Jason,

I will get back to you with updated v1 documentation and answers to the questions below once I have more details internally.

Parav

> -----Original Message-----
> From: Tom Talpey [mailto:tom@xxxxxxxxxx]
> Sent: Monday, June 12, 2017 8:30 PM
> To: Parav Pandit <parav@xxxxxxxxxxxx>; Jason Gunthorpe
> <jgunthorpe@xxxxxxxxxxxxxxxxxxxx>
> Cc: Bart Van Assche <Bart.VanAssche@xxxxxxxxxxx>; leon@xxxxxxxxxx;
> dledford@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; Idan Burstein
> <idanb@xxxxxxxxxxxx>
> Subject: Re: [PATCH rdma-next 0/3] Support out of order data placement
> 
> On 6/12/2017 8:36 PM, Parav Pandit wrote:
> >> -----Original Message-----
> >> From: Tom Talpey [mailto:tom@xxxxxxxxxx]
> >> Sent: Monday, June 12, 2017 7:12 PM
> >> To: Parav Pandit <parav@xxxxxxxxxxxx>; Jason Gunthorpe
> >> <jgunthorpe@xxxxxxxxxxxxxxxxxxxx>
> >> Cc: Bart Van Assche <Bart.VanAssche@xxxxxxxxxxx>; leon@xxxxxxxxxx;
> >> dledford@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; Idan Burstein
> >> <idanb@xxxxxxxxxxxx>
> >> Subject: Re: [PATCH rdma-next 0/3] Support out of order data
> >> placement
> >>
> >> On 6/12/2017 7:59 PM, Parav Pandit wrote:
> >>>> -----Original Message-----
> >>>> From: Tom Talpey [mailto:tom@xxxxxxxxxx]
> >>>> Sent: Monday, June 12, 2017 6:44 PM
> >>>> To: Parav Pandit <parav@xxxxxxxxxxxx>; Jason Gunthorpe
> >>>> <jgunthorpe@xxxxxxxxxxxxxxxxxxxx>
> >>>> Cc: Bart Van Assche <Bart.VanAssche@xxxxxxxxxxx>; leon@xxxxxxxxxx;
> >>>> dledford@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; Idan Burstein
> >>>> <idanb@xxxxxxxxxxxx>
> >>>> Subject: Re: [PATCH rdma-next 0/3] Support out of order data
> >>>> placement
> >>>>
> >>>> On 6/12/2017 6:54 PM, Parav Pandit wrote:
> >>>>> Hi Tom,
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Tom Talpey [mailto:tom@xxxxxxxxxx]
> >>>>>> Sent: Monday, June 12, 2017 5:20 PM
> >>>>>> To: Parav Pandit <parav@xxxxxxxxxxxx>; Jason Gunthorpe
> >>>>>> <jgunthorpe@xxxxxxxxxxxxxxxxxxxx>
> >>>>>> Cc: Bart Van Assche <Bart.VanAssche@xxxxxxxxxxx>; leon@xxxxxxxxxx;
> >>>>>> dledford@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; Idan Burstein
> >>>>>> <idanb@xxxxxxxxxxxx>
> >>>>>> Subject: Re: [PATCH rdma-next 0/3] Support out of order data
> >>>>>> placement
> >>>>>>
> >>>>>> On 6/12/2017 5:32 PM, Parav Pandit wrote:
> >>>>>>> Hi Tom,
> >>>>>> ...
> >>>>>>>>
> >>>>>>>> I agree with Jason, the bit should be 1 by default, if defined
> >>>>>>>> as you propose.
> >>>>>>>> Out-of-order is the norm, not the exception, for ULPs.
> >>>>>>>> Honestly, I think you should perhaps consider making it the
> >>>>>>>> default on your devices, and allowing only MLX-aware ULPs to
> >>>>>>>> turn it off.
> >>>>>>>>
> >>>>>>>
> >>>>>>> There can be deployments where the responder supports
> >>>>>>> receiving out-of-order, but the requester doesn't.
> >>>>>>
> >>>>>> Yuck! So this needs to be negotiated end-to-end, and by the upper
> >>>>>> layer?
> >>>>>> Talk about barriers to adoption, and opportunities for disaster.
> >>>>>>
> >>>>> Since Jason confirmed that all Linux kernel consumers are coded to
> >>>>> comply with the o9-20 requirement, I think kernel-based rdma-cm
> >>>>> consumers can be enabled transparently end-to-end, without the ULP's
> >>>>> involvement, via rdma_accept() and rdma_connect().
> >>>>
> >>>> I have two thoughts here.
> >>>>
> >>>> 1) You seem to assume all consumers are Linux, and do not need to
> >>>> negotiate. This is a dangerous assumption.
> >>> Certainly not. I didn't assume that. I just gave one example showing
> >>> that known consumers can be enabled without modifying the ULP.
> >>> This is explained further under the 3rd question.
> >>> Other consumers can work with this solution too.
> >>> For example, a Linux rdmacm-based client and a server on another OS:
> >>> Client is ooo capable.
> >>> Server is ooo not capable.
> >>> Once you follow the rdmacm-based sequence below, it will be clear how
> >>> this will work.
> >>
> >> Oh, so there's a MAD protocol change under the hood.
> > No. There is no change under the hood.
> > Your question was how ULPs can benefit from this feature without
> > being changed.
> > So I said that the rdmacm-based Linux kernel consumers we know of, which
> > comply with o9-20, can benefit once rdmacm is extended as in the example
> > below.
> >
> >> Well, that's a wider
> >> question. And I still don't understand how existing,
> >> non-strict-requiring protocols can take advantage of this.
> >> Nor how this works for non-Mellanox, non-IB/RoCE implementations.
> >
> > A device capability indicates which devices support this. It is explained
> > in the usage section of Documentation/out_of_order.txt.
> > So whichever vendor or protocol supports it can set this optional
> > device capability.
> >
> >>
> >> Again, I'd be a lot less concerned if non-strict were the default,
> >> and strict mode was negotiated. It's all just so upside-down.
> >
> > In the IB spec, in-order delivery is the default.
> 
> I don't agree. Requests are sent in-order, and the responder processes them
> in-order, but the bytes themselves are not guaranteed to appear in-order.
> Additionally, if retries occur, this is most definitely not the case.
> 
> Section 9.5 Transaction Ordering, I believe, covers these requirements. Can
> you tell me where I misunderstand them?
> In fact, c9-28 explicitly warns:
> 
>    • An application shall not depend upon the order of data writes to
>    memory within a message. For example, if an application sets up
>    data buffers that overlap, for separate data segments within a
>    message, it is not guaranteed that the last sent data will always
>    overwrite the earlier.
> 
> My guess is that this bit overrides the MLX behavior of never pipelining RDMA
> Write requests, allowing more packets to be queued at the responder and
> making better use of the network. This is not at all prohibited by the spec, nor
> is it unexpected by properly-coded upper layers, which all the kernel
> consumers are.
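
The c9-28 caveat quoted above can be illustrated with a toy model. This is only a sketch, not anything from the patch series or the IB spec: the `place` helper, the buffer size, and the segment contents are all hypothetical. It shows that when two data segments of one message overlap, the final memory contents depend on the order in which the responder happens to place them.

```python
from itertools import permutations

def place(buf_len, segments, order):
    """Write each (offset, data) segment into a zeroed buffer in the given order."""
    buf = bytearray(buf_len)
    for i in order:
        offset, data = segments[i]
        buf[offset:offset + len(data)] = data
    return bytes(buf)

# Two hypothetical segments of one message; bytes 2..3 overlap.
segments = [(0, b"AAAA"), (2, b"BBBB")]

# Try every placement order and collect the distinct final buffers.
outcomes = {
    place(6, segments, order)
    for order in permutations(range(len(segments)))
}
```

With these inputs `outcomes` holds two different buffers, one per placement order, which is why an upper layer cannot rely on the last-sent segment winning.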
> 
> I have one other question on Documentation/out_of_order.txt.
> It states the fence bit can be used to force ordering on a non-strict
> connection. But fence doesn't apply to RDMA Write?
> It only applies to operations which produce a reply, such as RDMA Read or
> Atomic. Have you changed the semantics?
> 
> Tom.
> 
> 
> 
> > So can you suggest how we can change the default IB behavior without
> > breaking anything?
> > Adding an optional attribute seems the right way to ensure compatibility.
> >>
> >> Tom.
> >>
> >>>> 2) I assume that there is some performance benefit to toggling this
> >>>> setting to non-strict. So, how do existing consumers get this
> >>>> advantage, especially since they don't need strict semantics?
> >>>> Bearing in mind that they do have to negotiate this end-to-end,
> >>>> meaning they require a protocol extension.
> >>> I don't yet have a completely transparent upstream solution for
> >>> existing consumers.
> >>>>
> >>>> Actually. I have a third thought. Since this is an attribute to qp
> >>>> creation, performed even before establishing a connection, how does
> >>>> the upper layer know when to set it?
> >>> This is not at QP creation time. I have described it in
> >>> Documentation/out_of_order.txt in usage section 3.
> >>> This is at the QP state transition from INIT to RTR.
> >>> Here is the flow. It's just not coded enough for posting patches.
> >>>
> >>> 1. When the rdmacm active side creates the QP, it is in INIT state.
> >>> 2. It sends a MAD_Req msg (indicating ooo_requested=1).
> >>> 3. When the rdmacm passive side receives the message, it looks up the
> >>>    device_cap attribute and matches it against the ooo_requested flag.
> >>> 4. When the device supports it, the MAD_Rsp msg sets ooo_enabled=1; if
> >>>    it doesn't support it, ooo_enabled=0.
> >>> 5. The rdmacm passive side creates the QP and moves it to RTR state
> >>>    (with the QP ooo enabled bit set).
> >>> 6. The active side receives the message and puts the QP into RTR, then
> >>>    RTS, state based on the bit setting received from the passive side.
> >>>
> >>> The flow is no different from how the rest of the connection-specific
> >>> parameters are shared, such as IRD/ORD, PSN, timeouts, MTU, etc.
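
The six-step flow above can be modeled in a few lines. This is a sketch under stated assumptions, not rdmacm code: `Device`, `passive_reply`, and `negotiate` are hypothetical names, the MAD_Req/MAD_Rsp messages are reduced to a single boolean each, and it is assumed that the active side only requests ooo when its own device advertises the capability.

```python
from dataclasses import dataclass

@dataclass
class Device:
    # Hypothetical stand-in for the optional device capability bit.
    supports_ooo: bool

def passive_reply(ooo_requested: bool, dev: Device) -> bool:
    """Steps 3-4: match the request against the passive side's capability."""
    return ooo_requested and dev.supports_ooo

def negotiate(active: Device, passive: Device) -> bool:
    """Return the ooo_enabled bit that both QPs end up using."""
    # Step 2 (assumption): request ooo only if the active device can do it.
    ooo_requested = active.supports_ooo
    # Steps 3-4: the passive side answers with ooo_enabled in its response.
    ooo_enabled = passive_reply(ooo_requested, passive)
    # Steps 5-6: both sides move their QP to RTR using this same bit.
    return ooo_enabled

# The mixed case from the thread: ooo-capable client, non-capable server.
mixed = negotiate(Device(supports_ooo=True), Device(supports_ooo=False))
both = negotiate(Device(supports_ooo=True), Device(supports_ooo=True))
```

In this model the feature is enabled only when both ends support it, so the mixed client/server case falls back to the strict behavior without the ULP being involved.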
> >>>
> >>>
> >>>
> >>>>
> >>>> Tom.