Re: [PATCH rdma-next 0/3] Support out of order data placement

On 7/18/2017 10:33 PM, Parav Pandit wrote:
Hi Tom, Jason,

Sorry for the late response.
Please find the response inline below.

-----Original Message-----
From: Tom Talpey [mailto:tom@xxxxxxxxxx]
Sent: Monday, June 12, 2017 8:30 PM
To: Parav Pandit <parav@xxxxxxxxxxxx>; Jason Gunthorpe
<jgunthorpe@xxxxxxxxxxxxxxxxxxxx>
Cc: Bart Van Assche <Bart.VanAssche@xxxxxxxxxxx>; leon@xxxxxxxxxx;
dledford@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; Idan Burstein
<idanb@xxxxxxxxxxxx>
Subject: Re: [PATCH rdma-next 0/3] Support out of order data placement


In the IB spec, in-order delivery is the default.

I don't agree. Requests are sent in order, and the responder processes them in
order, but the bytes themselves are not guaranteed to appear in order.
Additionally, if retries occur, this is most definitely not the case.

Section 9.5 Transaction Ordering, I believe, covers these requirements. Can you
tell me where I misunderstand them?
In fact, C9-28 explicitly warns:

    • An application shall not depend upon the order of data writes to
    memory within a message. For example, if an application sets up
    data buffers that overlap, for separate data segments within a
    message, it is not guaranteed that the last sent data will always
    overwrite the earlier.

The IB spec indeed does not imply any ordering in the placement of data into memory within a single message.

It does guarantee that writes don't bypass writes and reads don't bypass reads (Table 79), and transport operations are executed in their *message* order (C9-28):
"A responder shall execute SEND requests, RDMA WRITE requests
and ATOMIC Operation requests in the message order in which
they are received."

Thus, ordering between messages is guaranteed - changes to remote memory of an RDMA-W will be observed strictly after any changes done by a previous RDMA-W; changes to local memory of an RDMA-R response will be observed strictly after any changes done by a previous RDMA-R response.

The proposed feature in this patch set is to relax the memory placement ordering *across* messages, not within a single message (which the spec does not mandate, as you noted), such that multiple consecutive RDMA-Ws may be committed to memory in any order, and similarly for RDMA-R responses.
This changes application semantics whenever multiple in-flight RDMA operations write to overlapping locations, or when one operation indicates the completion of another.
A simple example to clarify: a requester posts the following work requests in this order:
1. RDMA-W(VA=0x1000, value=0x1)
2. RDMA-W(VA=0x1000, value=0x2)
3. Send()
On the responder side, after the Send() operation completes, and according to the spec (C9-28), reading from VA=0x1000 will produce the value 2. With the proposed feature enabled, the value read is not deterministic; it depends on the order in which the RDMA-W operations were received.

The proposed QP flag allows applications to knowingly opt in to this relaxed data placement, thereby enabling the HCA to place out-of-order (OOO) RDMA messages into memory without buffering them.

You didn't answer my question: what is the actual benefit of relaxing
the ordering? Is it performance? And specifically, which applications
*can't* use it?

To me, it appears that most storage upper layers can already use
the extension. If it performs better, I expect they will definitely
want to enable it. In that case I believe it should be the *default*,
not an opt-in that these upper layers are newly responsible for.

I have one other question about the documentation file out-of-order.txt.
It states the fence bit can be used to force ordering on a non-strict connection.
But the fence doesn't apply to RDMA Write, does it?
It only applies to operations that produce a reply, such as RDMA Read or
Atomic. Have you changed the semantics?

The semantics of an RDMA-R followed by another RDMA-R change when the proposed QP flag is set.

Can you explain that statement in more detail, please? Also, please
clarify which operation(s) the fence bit now applies to.

Tom.