> -----Original Message-----
> From: Edward Srouji <edwards@xxxxxxxxxx>
> Sent: Thursday, September 5, 2024 2:23 PM
> To: Zhu Yanjun <yanjun.zhu@xxxxxxxxx>; Leon Romanovsky <leon@xxxxxxxxxx>;
> Jason Gunthorpe <jgg@xxxxxxxxxx>
> Cc: Leon Romanovsky <leonro@xxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
> linux-rdma@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; Saeed Mahameed
> <saeedm@xxxxxxxxxx>; Tariq Toukan <tariqt@xxxxxxxxxx>; Yishai Hadas
> <yishaih@xxxxxxxxxx>
> Subject: [EXTERNAL] Re: [PATCH rdma-next 0/2] Introduce mlx5 data direct
> placement (DDP)
>
> On 9/4/2024 2:53 PM, Zhu Yanjun wrote:
> > On 2024/9/4 16:27, Edward Srouji wrote:
> >>
> >> On 9/4/2024 9:02 AM, Zhu Yanjun wrote:
> >>> On 2024/9/3 19:37, Leon Romanovsky wrote:
> >>>> From: Leon Romanovsky <leonro@xxxxxxxxxx>
> >>>>
> >>>> Hi,
> >>>>
> >>>> This series from Edward introduces mlx5 data direct placement (DDP)
> >>>> feature.
> >>>>
> >>>> This feature allows WRs on the receiver side of the QP to be consumed
> >>>> out of order, permitting the sender side to transmit messages without
> >>>> guaranteeing arrival order on the receiver side.
> >>>>
> >>>> When enabled, the completion ordering of WRs remains in-order,
> >>>> regardless of the Receive WRs consumption order.
> >>>>
> >>>> RDMA Read and RDMA Atomic operations on the responder side continue to
> >>>> be executed in-order, while the ordering of data placement for RDMA
> >>>> Write and Send operations is not guaranteed.
> >>>
> >>> It is an interesting feature. If I understand it correctly, it permits
> >>> the user to consume data out of order for RDMA Write and Send
> >>> operations, while the completion ordering stays in order.
> >>>
> >> Correct.
> >>> In what scenarios can this feature be applied, and what benefits does
> >>> it bring?
> >>>
> >>> I am just curious about this. Normally users consume the data in
> >>> order. In what scenario would a user consume the data out of order?
> >>>
> >> One of the main benefits of this feature is achieving higher bandwidth
> >> (BW) by allowing responders to receive packets out of order (OOO).
> >>
> >> For example, this can be utilized in devices that support multi-plane
> >> functionality, as introduced in the "Multi-plane support for mlx5"
> >> series [1]. When mlx5 multi-plane is supported, a single logical mlx5
> >> port aggregates multiple physical plane ports. In this scenario, the
> >> requester can "spray" packets across the multiple physical plane ports
> >> without guaranteeing packet order, either on the wire or on the
> >> receiver (responder) side.
> >>
> >> With this approach, no barriers or fences are required to ensure
> >> in-order packet reception, which optimizes the data path for
> >> performance. This can result in better BW, theoretically achieving
> >> line-rate performance equivalent to the sum of the maximum BW of all
> >> physical plane ports, with only one QP.
> >
> > Thanks a lot for your quick reply. Without ensuring in-order packet
> > reception, this does optimize the data path for performance.
> >
> > I agree with you.
> >
> > But how does the receiver get the correct packets from the out-of-order
> > packets efficiently?
> >
> > Is the method implemented in software or hardware?
>
> The packets have a new field that is used by the HW to understand the
> correct message order (similar to PSN).
>

Interesting feature! It somehow reminds me of iWARP RDMA with its DDP
sub-layer 😉
But can that extra field be compliant with the standardized wire protocol?

Thanks,
Bernard.

> Once the packets arrive OOO to the receiver side, the data is scattered
> directly (hence the DDP - "Direct Data Placement" name) by the HW.
>
> So the efficiency is achieved by the HW, as it also saves the required
> context and metadata so it can deliver the correct completion to the
> user (in-order) once we have some WQEs that can be considered an
> "in-order window" and be delivered to the user.
>
> The SW/Applications may receive OOO WR_IDs though (because the first CQE
> may have consumed a Recv WQE of any index on the receiver side), and it's
> their responsibility to handle it from this point, if it's required.
>
> >
> > I am just interested in this feature and want to know more about this.
> >
> > Thanks,
> >
> > Zhu Yanjun
> >
> >>
> >> [1] https://lore.kernel.org/lkml/cover.1718553901.git.leon@kernel.org/
> >>> Thanks,
> >>> Zhu Yanjun
> >>>
> >>>>
> >>>> Thanks
> >>>>
> >>>> Edward Srouji (2):
> >>>>   net/mlx5: Introduce data placement ordering bits
> >>>>   RDMA/mlx5: Support OOO RX WQE consumption
> >>>>
> >>>>  drivers/infiniband/hw/mlx5/main.c    |  8 +++++
> >>>>  drivers/infiniband/hw/mlx5/mlx5_ib.h |  1 +
> >>>>  drivers/infiniband/hw/mlx5/qp.c      | 51 +++++++++++++++++++++++++---
> >>>>  include/linux/mlx5/mlx5_ifc.h        | 24 +++++++++----
> >>>>  include/uapi/rdma/mlx5-abi.h         |  5 +++
> >>>>  5 files changed, 78 insertions(+), 11 deletions(-)
> >>>>
> >>>
> > --
> > Best Regards,
> > Yanjun.Zhu
> >
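
For readers following the thread, the behaviour Edward describes (data placed
as packets arrive, completions released only for an "in-order window", and
WR_IDs that may therefore come back out of posting order) can be illustrated
with a small toy model. This is only a sketch under stated assumptions: it is
not the mlx5 hardware logic, it treats every message as a single packet, and
the names ToyOooReceiver, seq and on_packet are hypothetical and do not come
from the patches.

```python
from collections import deque


class ToyOooReceiver:
    """Toy responder: places data on arrival, completes in sequence order.

    Hypothetical model for illustration only; `seq` stands in for the
    per-packet ordering field mentioned in the thread.
    """

    def __init__(self, wr_ids):
        self.free_wqes = deque(wr_ids)  # posted Receive WQEs, in posting order
        self.placed = {}                # seq -> (wr_id, payload), not yet completed
        self.next_seq = 0               # lowest sequence number not yet completed
        self.completions = []           # (seq, wr_id, payload) as seen by the user

    def on_packet(self, seq, payload):
        # Placement: the arriving packet consumes the next free Receive WQE,
        # whatever its sequence number is.
        wr_id = self.free_wqes.popleft()
        self.placed[seq] = (wr_id, payload)
        # Completion: release only the contiguous run starting at next_seq.
        while self.next_seq in self.placed:
            done_wr_id, data = self.placed.pop(self.next_seq)
            self.completions.append((self.next_seq, done_wr_id, data))
            self.next_seq += 1


rx = ToyOooReceiver(wr_ids=[100, 101, 102, 103])
# Packets arrive in the order 2, 0, 1, 3.
for seq, payload in [(2, b"C"), (0, b"A"), (1, b"B"), (3, b"D")]:
    rx.on_packet(seq, payload)

for seq, wr_id, data in rx.completions:
    print(seq, wr_id, data)
```

In this model the completions come out in sequence order 0, 1, 2, 3, but the
WR_IDs come out as 101, 102, 100, 103, because the first packet to arrive
(seq 2) consumed the first posted WQE. An application that needs to match
buffers to messages would have to key on the WR_ID reported with each
completion rather than on posting order.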