Hi Tom, Jason,

Sorry for the late response.
Please find the response inline below.

> -----Original Message-----
> From: Tom Talpey [mailto:tom@xxxxxxxxxx]
> Sent: Monday, June 12, 2017 8:30 PM
> To: Parav Pandit <parav@xxxxxxxxxxxx>; Jason Gunthorpe
> <jgunthorpe@xxxxxxxxxxxxxxxxxxxx>
> Cc: Bart Van Assche <Bart.VanAssche@xxxxxxxxxxx>; leon@xxxxxxxxxx;
> dledford@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; Idan Burstein
> <idanb@xxxxxxxxxxxx>
> Subject: Re: [PATCH rdma-next 0/3] Support out of order data placement
>
> > > > In IB spec, in-order delivery is default.
>
> I don't agree. Requests are sent in-order, and the responder processes them
> in-order, but the bytes themselves are not guaranteed to appear in-order.
> Additionally, if retries occur, this is most definitely not the case.
>
> Section 9.5 Transaction Ordering, I believe, covers these requirements. Can you
> tell me where I misunderstand them?
> In fact, c9-28 explicitly warns:
>
> • An application shall not depend upon the order of data writes to
> memory within a message. For example, if an application sets up
> data buffers that overlap, for separate data segments within a
> message, it is not guaranteed that the last sent data will always
> overwrite the earlier.
>

The IB spec indeed does not imply any ordering in the placement of data
into memory within a single message. It does guarantee that writes don't
bypass writes and reads don't bypass reads (Table 79), and that transport
operations are executed in their *message* order (C9-28):

"A responder shall execute SEND requests, RDMA WRITE requests and ATOMIC
Operation requests in the message order in which they are received."

Thus, ordering between messages is guaranteed: changes to remote memory
made by an RDMA-W will be observed strictly after any changes made by a
previous RDMA-W, and changes to local memory made by an RDMA-R response
will be observed strictly after any changes made by a previous RDMA-R
response.

The proposed feature in this patch set relaxes the memory placement
ordering *across* messages, not within a single message (which is not
mandated by the spec, as you noted), such that multiple consecutive
RDMA-Ws may be committed to memory in any order, and similarly for RDMA-R
responses. This changes application semantics whenever multiple in-flight
RDMA operations write to overlapping locations, or when one operation
indicates the completion of another.

A simple example to clarify: a requestor posts the following work
elements in the written order:
1. RDMA-W(VA=0x1000, value=0x1)
2. RDMA-W(VA=0x1000, value=0x2)
3. Send()

On the responder side, following completion of the Send() operation, and
according to the spec (C9-28), reading from VA=0x1000 will produce the
value 2. With the proposed feature enabled, the value read is not
deterministic and depends on the order in which the RDMA-W operations
were received.

The proposed QP flag allows applications to knowingly indicate this
relaxed data placement, thereby enabling the HCA to place OOO RDMA
messages into memory without buffering them.

> I have one other question on the Documentation out-of-order.txt.
> It states the fence bit can be used to force ordering on a non-strict connection.
> But fence doesn't apply to RDMA Write?
> It only applies to operations which produce a reply, such as RDMA Read or
> Atomic. Have you changed the semantic?
>

The RDMA-R followed by RDMA-R semantic is changed when the proposed QP
flag is set.
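
To make the two-write example earlier in this mail concrete, here is a toy model in Python. It is only a sketch of the placement semantics under discussion, not verbs or driver code: the helpers `place` and `possible_values` and the `relaxed` argument are hypothetical names invented for this illustration, and memory is modelled as a plain dictionary keyed by VA.

```python
from itertools import permutations


def place(writes, order):
    """Commit the writes to a toy memory in the given placement order.

    `writes` is a list of (va, value) pairs in posting order;
    `order` is a sequence of indices into that list.
    """
    memory = {}
    for idx in order:
        va, value = writes[idx]
        memory[va] = value  # a later placement overwrites an earlier one
    return memory


def possible_values(writes, va, relaxed):
    """Return the set of values that may be observed at `va` once all
    writes have been placed (i.e. after the Send() has completed).

    relaxed=False models message-order execution (only posting order);
    relaxed=True models placement in any order across messages.
    """
    n = len(writes)
    orders = permutations(range(n)) if relaxed else [tuple(range(n))]
    return {place(writes, order).get(va) for order in orders}


# The two RDMA-Ws from the example, both targeting VA=0x1000.
writes = [(0x1000, 0x1), (0x1000, 0x2)]

strict = possible_values(writes, 0x1000, relaxed=False)
relaxed = possible_values(writes, 0x1000, relaxed=True)

print(sorted(strict))   # [2]
print(sorted(relaxed))  # [1, 2]
```

With message-order placement only the last posted value can be observed; once placement across messages is relaxed, either value may be left in memory, which is why an application has to opt in explicitly.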