RE: [PATCH rdma-next 0/3] Support out of order data placement

Hi Tom, Jason,

Sorry for the late response.
Please find the response inline below.

> -----Original Message-----
> From: Tom Talpey [mailto:tom@xxxxxxxxxx]
> Sent: Monday, June 12, 2017 8:30 PM
> To: Parav Pandit <parav@xxxxxxxxxxxx>; Jason Gunthorpe
> <jgunthorpe@xxxxxxxxxxxxxxxxxxxx>
> Cc: Bart Van Assche <Bart.VanAssche@xxxxxxxxxxx>; leon@xxxxxxxxxx;
> dledford@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; Idan Burstein
> <idanb@xxxxxxxxxxxx>
> Subject: Re: [PATCH rdma-next 0/3] Support out of order data placement
> 
> >
> > In IB spec, in-order delivery is default.
> 
> I don't agree. Requests are sent in-order, and the responder processes them in-
> order, but the bytes themselves are not guaranteed to appear in-order.
> Additionally, if retries occur, this is most definitely not the case.
> 
> Section 9.5 Transaction Ordering, I believe, covers these requirements. Can you
> tell me where I misunderstand them?
> In fact, c9-28 explicitly warns:
> 
>    • An application shall not depend upon the order of data writes to
>    memory within a message. For example, if an application sets up
>    data buffers that overlap, for separate data segments within a
>    message, it is not guaranteed that the last sent data will always
>    overwrite the earlier.
> 
The IB spec indeed does not imply any ordering in the placement of data into memory within a single message.

It does guarantee that writes don't bypass writes and reads don't bypass reads (Table 79), and transport operations are executed in their *message* order (C9-28):
"A responder shall execute SEND requests, RDMA WRITE requests
and ATOMIC Operation requests in the message order in which
they are received."

Thus, ordering between messages is guaranteed - changes to remote memory of an RDMA-W will be observed strictly after any changes done by a previous RDMA-W; changes to local memory of an RDMA-R response will be observed strictly after any changes done by a previous RDMA-R response.

The proposed feature in this patch set relaxes the memory placement ordering *across* messages, not within a single message (which the spec does not mandate, as you noted), such that multiple consecutive RDMA-Ws may be committed to memory in any order, and similarly for RDMA-R responses.
This changes application semantics whenever multiple in-flight RDMA operations write to overlapping locations, or when one operation indicates the completion of another.
A simple example to clarify: a requestor posts the following work elements in the order written:
1. RDMA-W(VA=0x1000, value=0x1)
2. RDMA-W(VA=0x1000, value=0x2)
3. Send()
On the responder side, once the Send() operation has completed, and according to the spec (C9-28), reading from VA=0x1000 will produce the value 0x2. With the proposed feature enabled, the value read is not deterministic: it depends on the order in which the RDMA-W operations were received.
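
To make the example concrete, here is a toy model in Python. It is only a sketch of the semantics described above, not verbs or driver code: the names (place_writes, WRITES) are made up for illustration, and "memory" is just a dictionary keyed by VA. It applies the two RDMA-Ws in message order, then in every possible placement order, and reports what a reader at VA=0x1000 could observe after the Send() completes.

```python
from itertools import permutations

# The two RDMA-W work elements from the example, as (VA, value) pairs,
# listed in the order the requestor posted them.
WRITES = [(0x1000, 0x1), (0x1000, 0x2)]


def place_writes(placement_order, writes):
    """Commit writes to a simulated memory in the given placement order.

    placement_order is a sequence of indices into writes. The returned
    dict maps each VA to the last value committed to it.
    """
    memory = {}
    for index in placement_order:
        va, value = writes[index]
        memory[va] = value
    return memory


# Strict ordering (C9-28): writes are placed in message order.
strict_value = place_writes(range(len(WRITES)), WRITES)[0x1000]

# Relaxed ordering: any placement order is allowed, so collect every
# value a reader might see once all writes have landed.
relaxed_values = {
    place_writes(order, WRITES)[0x1000]
    for order in permutations(range(len(WRITES)))
}

print("strict:", hex(strict_value))
print("relaxed:", sorted(hex(v) for v in relaxed_values))
```

In this model the strict case always ends with the value of the last posted write, while the relaxed case can end with either value, which is the non-determinism an application has to accept when it sets the flag.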

The proposed QP flag lets applications explicitly opt in to this relaxed data placement, thereby enabling the HCA to place out-of-order (OOO) RDMA messages into memory without buffering them.

> I have one other question on the Documentation out-of-order.txt.
> It states the fence bit can be used to force ordering on a non-strict connection.
> But fence doesn't apply to RDMA Write?
> It only applies to operations which produce a reply, such as RDMA Read or
> Atomic. Have you changed the semantic?
> 
The semantics of an RDMA-R followed by another RDMA-R change when the proposed QP flag is set.



