On 9/6/2024 6:17 PM, Zhu Yanjun wrote:
External email: Use caution opening links or attachments
On 2024/9/6 20:17, Edward Srouji wrote:
On 9/6/2024 8:02 AM, Zhu Yanjun wrote:
On 2024/9/5 20:23, Edward Srouji wrote:
On 9/4/2024 2:53 PM, Zhu Yanjun wrote:
On 2024/9/4 16:27, Edward Srouji wrote:
On 9/4/2024 9:02 AM, Zhu Yanjun wrote:
On 2024/9/3 19:37, Leon Romanovsky wrote:
From: Leon Romanovsky <leonro@xxxxxxxxxx>
Hi,
This series from Edward introduces mlx5 data direct placement (DDP)
feature.
This feature allows WRs on the receiver side of the QP to be consumed
out of order, permitting the sender side to transmit messages without
guaranteeing arrival order on the receiver side.
When enabled, the completion ordering of WRs remains in-order,
regardless of the Receive WRs consumption order.
RDMA Read and RDMA Atomic operations on the responder side continue to
be executed in-order, while the ordering of data placement for RDMA
Write and Send operations is not guaranteed.
It is an interesting feature. If I understand it correctly, this
feature permits the user to consume the data out of order for RDMA
Write and Send operations, while the completion ordering is still in
order.
Correct.
In which scenarios can this feature be applied, and what benefits does
it bring?
I am just curious about this. Normally users consume the data in
order. In what scenario would the user consume the data out of order?
One of the main benefits of this feature is achieving higher bandwidth
(BW) by allowing responders to receive packets out of order (OOO).
For example, this can be utilized in devices that support multi-plane
functionality, as introduced in the "Multi-plane support for mlx5"
series [1]. When mlx5 multi-plane is supported, a single logical mlx5
port aggregates multiple physical plane ports.
In this scenario, the requester can "spray" packets across the
multiple physical plane ports without guaranteeing packet order, either
on the wire or on the receiver (responder) side.
With this approach, no barriers or fences are required to ensure
in-order packet reception, which optimizes the data path for
performance. This can result in better BW, theoretically achieving
line-rate performance equivalent to the sum of the maximum BW of all
physical plane ports, with only one QP.
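To make the spraying idea above concrete, here is a minimal sketch in
Python. It is only an illustration, not how mlx5 actually distributes
packets: the round-robin policy, the function names `spray` and
`aggregate_bw_gbps`, and the plane count and per-plane bandwidth are all
assumptions made for the example.

```python
from itertools import cycle


def spray(packet_seq_numbers, num_planes):
    """Assign packets to planes round-robin.

    Round-robin is an assumed policy for illustration only; the thread
    does not say how the requester chooses a plane for each packet.
    """
    planes = [[] for _ in range(num_planes)]
    for psn, plane in zip(packet_seq_numbers, cycle(range(num_planes))):
        planes[plane].append(psn)
    return planes


def aggregate_bw_gbps(per_plane_bw_gbps):
    """Theoretical upper bound: the sum of the per-plane maximum BW."""
    return sum(per_plane_bw_gbps)


# Hypothetical setup: 4 planes, 100 Gb/s each, 8 packets on one QP.
planes = spray(range(8), num_planes=4)
total_bw = aggregate_bw_gbps([100, 100, 100, 100])
```

Because each plane carries a different subset of the packets, they can
arrive interleaved in any order at the responder, which is why the
receiver must tolerate OOO arrival to reach the summed bandwidth.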
Thanks a lot for your quick reply. Without ensuring in-order packet
reception, this does optimize the data path for performance.
I agree with you.
But how does the receiver efficiently recover the correct order from
the out-of-order packets?
Is the method implemented in software or hardware?
The packets have a new field that is used by the HW to understand the
correct message order (similar to PSN).
Once the packets arrive OOO at the receiver side, the data is scattered
directly (hence the DDP - "Direct Data Placement" name) by the HW.
So the efficiency is achieved by the HW, as it also saves the required
context and metadata so it can deliver the correct completion to the
user (in-order) once we have some WQEs that can be considered an
"in-order window" and be delivered to the user.
The SW/Applications may receive OOO WR_IDs though (because the first
CQE may have consumed a Recv WQE of any index on the receiver side),
and it's their responsibility to handle it from this point, if it's
required.
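The behaviour described above can be modelled with a small Python
sketch. It is a simplified software model, not the HW implementation:
it assumes that each arriving message lands in the next free Recv WQE
in arrival order, and that a completion is released only once all
earlier messages have arrived. The function name `deliver_completions`
and the `msg_seq` numbering are hypothetical.

```python
def deliver_completions(posted_wr_ids, arrival_order):
    """Model in-order completions over out-of-order WQE consumption.

    posted_wr_ids: wr_ids of the Recv WQEs, in posting order.
    arrival_order: message sequence numbers in the order they arrive.
    Returns (msg_seq, wr_id) pairs in the order the application would
    poll them.
    """
    consumed = {}          # msg_seq -> wr_id of the WQE it landed in
    completions = []
    next_to_complete = 0   # start of the "in-order window"
    for wqe_index, msg_seq in enumerate(arrival_order):
        # Assumption: arrivals consume Recv WQEs in arrival order.
        consumed[msg_seq] = posted_wr_ids[wqe_index]
        # Release every completion that is now contiguous.
        while next_to_complete in consumed:
            wr_id = consumed.pop(next_to_complete)
            completions.append((next_to_complete, wr_id))
            next_to_complete += 1
    return completions


# Four Recv WQEs posted with wr_ids 100..103; messages arrive as 2, 0, 1, 3.
completions = deliver_completions([100, 101, 102, 103], [2, 0, 1, 3])
wr_ids_seen = [wr_id for _, wr_id in completions]
```

In this model the completions come out in message order (0, 1, 2, 3),
but the wr_ids the application sees are 101, 102, 100, 103, so an
application should look up its buffer by wr_id rather than assume the
posting order.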
Got it. It seems that all the functionality is implemented in HW. The
SW only receives OOO WR_IDs. Thanks a lot. Perhaps it is helpful to
RDMA LAG devices. It should enhance the performance ^_^
BTW, do you have any performance data with this feature?
Not yet. So far we have only tested it functionally.
But we should be able to measure its performance soon :).
Thanks a lot. It is an interesting feature. If you get performance
reports, please share them with us.
Sure, will do.
IMO, perhaps this feature can be used with random read/write devices,
for example, hard disks?
Just an idea. I am not sure whether you have tried this feature with
hard disks or not.
You're right, it can be used with storage and we're planning to do this
integration and usage in the near future.
Best Regards,
Zhu Yanjun
I am just interested in this feature and want to know more about this.
Thanks,
Zhu Yanjun
[1] https://lore.kernel.org/lkml/cover.1718553901.git.leon@xxxxxxxxxx/
Thanks,
Zhu Yanjun
Thanks
Edward Srouji (2):
net/mlx5: Introduce data placement ordering bits
RDMA/mlx5: Support OOO RX WQE consumption
 drivers/infiniband/hw/mlx5/main.c    |  8 +++++
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  1 +
 drivers/infiniband/hw/mlx5/qp.c      | 51 +++++++++++++++++++++++++---
 include/linux/mlx5/mlx5_ifc.h        | 24 +++++++++----
 include/uapi/rdma/mlx5-abi.h         |  5 +++
 5 files changed, 78 insertions(+), 11 deletions(-)
--
Best Regards,
Yanjun.Zhu