Re: [PATCH v1 0/4] GPU Direct RDMA (P2P DMA) for Device Private Pages

Zhu Yanjun <yanjun.zhu@xxxxxxxxx> · Fri, 18 Oct 2024 09:26:50 +0200

On 2024/10/16 17:16, Yonatan Maman wrote:

On 16/10/2024 7:23, Christoph Hellwig wrote:
On Tue, Oct 15, 2024 at 06:23:44PM +0300, Yonatan Maman wrote:
From: Yonatan Maman <Ymaman@xxxxxxxxxx>

This patch series aims to enable Peer-to-Peer (P2P) DMA access in
GPU-centric applications that utilize RDMA and private device pages.
This enhancement is crucial for minimizing data transfer overhead by allowing
the GPU to directly expose device private page data to devices such as
NICs, eliminating the need to traverse system RAM, which is the native
method for exposing device private page data.

Please tone down your marketing language and explain your factual
changes.  If you make performance claims back them by numbers.

Got it, thanks! I'll fix that. Regarding performance, we’re achieving
over 10x higher bandwidth and 10x lower latency using perftest-rdma,
especially with a high rate of GPU memory access.

If I understand this patch series correctly, it is based on ODP (On Demand
Paging). A non-ODP approach also exists. According to the following
links, that approach is implemented on efa, irdma and mlx5.
1. iRDMA
https://lore.kernel.org/all/20230217011425.498847-1-yanjun.zhu@xxxxxxxxx/

2. efa
https://lore.kernel.org/lkml/20211007114018.GD2688930@xxxxxxxx/t/

3. mlx5
https://lore.kernel.org/all/1608067636-98073-5-git-send-email-jianxin.xiong@xxxxxxxxx/

Since both methods are implemented on mlx5, have you compared the test
results of the two methods on mlx5?

The most important results should be latency and bandwidth. Please let 
us know the test results.

Thanks a lot.
Zhu Yanjun