On 18/10/2024 10:26, Zhu Yanjun wrote:
On 2024/10/16 17:16, Yonatan Maman wrote:
On 16/10/2024 7:23, Christoph Hellwig wrote:
On Tue, Oct 15, 2024 at 06:23:44PM +0300, Yonatan Maman wrote:
From: Yonatan Maman <Ymaman@xxxxxxxxxx>
This patch series aims to enable Peer-to-Peer (P2P) DMA access in
GPU-centric applications that utilize RDMA and device private pages.
This enhancement is crucial for minimizing data transfer overhead by
allowing the GPU to directly expose device private page data to devices
such as NICs, eliminating the need to traverse system RAM, which is the
native method for exposing device private page data.
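For context, here is a rough sketch of the HMM walk an RDMA driver
performs when an ODP MR takes a page fault, and where device private
pages come into play. This is illustrative only; the function and
variable names are made up and this is not the actual mlx5 fault path:

#include <linux/hmm.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>

/* Illustrative only: simplified ODP fault handling for one VA range. */
static int odp_fault_range_sketch(struct mmu_interval_notifier *notifier,
				  unsigned long start, unsigned long end,
				  unsigned long *pfns)
{
	struct hmm_range range = {
		.notifier	= notifier,
		.notifier_seq	= mmu_interval_read_begin(notifier),
		.start		= start,
		.end		= end,
		.hmm_pfns	= pfns,
		.default_flags	= HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE,
	};
	int ret;

	mmap_read_lock(notifier->mm);
	ret = hmm_range_fault(&range);
	mmap_read_unlock(notifier->mm);
	if (ret)
		return ret;

	/*
	 * Today, any device private (GPU) page hit by this walk is first
	 * migrated back to system RAM before the driver can DMA-map it,
	 * so the NIC always goes through host memory.  The goal of this
	 * series is to resolve such pages to a peer-to-peer DMA address
	 * instead, so the NIC accesses the GPU memory directly.
	 */
	return 0;
}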
Please tone down your marketing language and explain your factual
changes. If you make performance claims, back them up with numbers.
Got it, thanks! I'll fix that. Regarding performance, we are achieving
over 10x higher bandwidth and 10x lower latency with perftest-rdma,
especially in test cases with a high rate of GPU memory access.
If I understand this patch series correctly, it is based on ODP (On-Demand
Paging). A non-ODP approach also exists; per the links below, it has been
implemented for efa, irdma and mlx5.
1. iRDMA
https://lore.kernel.org/all/20230217011425.498847-1-yanjun.zhu@xxxxxxxxx/
2. efa
https://lore.kernel.org/lkml/20211007114018.GD2688930@xxxxxxxx/t/
3. mlx5
https://lore.kernel.org/all/1608067636-98073-5-git-send-email-jianxin.xiong@xxxxxxxxx/
Since both methods are implemented on mlx5, have you compared the test
results of the two methods on mlx5? The most important results are
latency and bandwidth. Please share the test results.
Thanks a lot.
Zhu Yanjun
This patch set aims to support GPUDirect RDMA for HMM ODP memory.
Compared to the dma-buf method, we achieve the same performance
(bandwidth and latency) for GPU-intensive test cases (no CPU accesses
during the test).
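For reference, the comparison is roughly between these two registration
paths. This is a minimal userspace sketch using rdma-core verbs; how the
GPU buffer is allocated, and exported to a dma-buf fd in the first case,
depends on the GPU stack and is omitted here:

#include <infiniband/verbs.h>
#include <stddef.h>
#include <stdint.h>

/* dma-buf method: GPU memory is exported as a dma-buf fd, then pinned
 * and registered up front. */
static struct ibv_mr *reg_gpu_dmabuf(struct ibv_pd *pd, int dmabuf_fd,
				     size_t len, uint64_t iova)
{
	return ibv_reg_dmabuf_mr(pd, 0 /* offset */, len, iova, dmabuf_fd,
				 IBV_ACCESS_LOCAL_WRITE |
				 IBV_ACCESS_REMOTE_READ |
				 IBV_ACCESS_REMOTE_WRITE);
}

/* ODP method: register the VA range and let the HCA fault pages in on
 * demand; with this series, faulted GPU (device private) pages can
 * resolve to P2P addresses instead of bouncing through system RAM. */
static struct ibv_mr *reg_gpu_odp(struct ibv_pd *pd, void *gpu_va, size_t len)
{
	return ibv_reg_mr(pd, gpu_va, len,
			  IBV_ACCESS_ON_DEMAND |
			  IBV_ACCESS_LOCAL_WRITE |
			  IBV_ACCESS_REMOTE_READ |
			  IBV_ACCESS_REMOTE_WRITE);
}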