Performance issue for a simple RDMA PingPong

Hello everyone, 

I'm currently having a performance issue when synchronizing two nodes with a simple ping/pong algorithm. 
I have two small programs that summarize the issue: 

The first one works as intended and loops as follows on both the client and server sides: 
- post a send work request 
- post a receive work request 
- wait for both completions, acknowledge them, and continue. 
This small program works as intended, and I'm able to complete 100k requests in 2–3 seconds. 
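
For illustration, the call order of this first loop could be sketched as below. This is a minimal sketch and not the code from the repository: the `pp_ops` callbacks are hypothetical stand-ins for the real verbs calls (for example `ibv_post_send()`, `ibv_post_recv()`, and a wait built on `ibv_get_cq_event()`), and the `trace_*` stubs only record the order of calls so the loop can be exercised without any RDMA hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callbacks standing in for the real verbs calls.
 * Each one returns 0 on success and non-zero on failure. */
struct pp_ops {
    int (*post_send)(void *ctx);
    int (*post_recv)(void *ctx);
    /* Block until `count` completions have been reaped and acknowledged. */
    int (*wait_completions)(void *ctx, int count);
};

/* First variant: post both work requests, then wait for both completions. */
static int pingpong_post_both_then_wait(const struct pp_ops *ops, void *ctx,
                                        size_t iterations)
{
    for (size_t i = 0; i < iterations; i++) {
        if (ops->post_send(ctx) != 0) return -1;
        if (ops->post_recv(ctx) != 0) return -1;
        if (ops->wait_completions(ctx, 2) != 0) return -1;
    }
    return 0;
}

/* Dry-run stubs (assumption: illustration only, no hardware involved). */
struct pp_trace {
    char log[64];
    size_t len;
};

static int trace_put(void *ctx, char c)
{
    struct pp_trace *t = ctx;
    assert(t != NULL);
    if (t->len + 1 >= sizeof t->log) return -1; /* trace buffer is full */
    t->log[t->len++] = c;
    t->log[t->len] = '\0';
    return 0;
}

static int trace_send(void *ctx) { return trace_put(ctx, 'S'); }
static int trace_recv(void *ctx) { return trace_put(ctx, 'R'); }
static int trace_wait(void *ctx, int count)
{
    return trace_put(ctx, (char)('0' + count));
}

static const struct pp_ops trace_ops = {
    .post_send = trace_send,
    .post_recv = trace_recv,
    .wait_completions = trace_wait,
};

/* Returns 1 if a dry run of `iterations` loops yields the expected order. */
static int dry_run_matches(const char *expected, size_t iterations)
{
    struct pp_trace t = { {0}, 0 };
    if (pingpong_post_both_then_wait(&trace_ops, &t, iterations) != 0)
        return 0;
    return strcmp(t.log, expected) == 0;
}
```

In the trace, `S` is a posted send, `R` a posted receive, and the digit is the number of completions waited for, so two iterations of this variant record `SR2SR2`.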

The second program, however, is as follows: 
The client is identical to the one in the first program. 
The server does: 
- post a receive work request 
- wait for its completion and acknowledge it 
- post a send work request 
- wait for its completion and acknowledge it 
With this version, a single request can sometimes take up to 2 seconds to complete, most of it spent inside ibv_get_cq_event(). 
Furthermore, we observed that this happens more often when multiple threads do this in sync, which is not the case with the first program. 
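
As a rough sketch of the server loop in this second variant, with per-iteration timing added so that slow iterations can be spotted: the code below is an assumption-laden illustration, not the code from the repository. The `pp_ops` callbacks are hypothetical placeholders for the real verbs calls (`ibv_post_recv()`, `ibv_post_send()`, and a wait built on `ibv_get_cq_event()`), and the `trace_*` stubs only record call order so the loop runs without RDMA hardware.

```c
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callbacks standing in for the real verbs calls.
 * Each one returns 0 on success and non-zero on failure. */
struct pp_ops {
    int (*post_send)(void *ctx);
    int (*post_recv)(void *ctx);
    /* Block until `count` completions have been reaped and acknowledged. */
    int (*wait_completions)(void *ctx, int count);
};

static double elapsed_seconds(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) +
           (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/* Second variant, server side: every work request is waited on before the
 * next one is posted. The duration of the slowest iteration, in seconds,
 * is stored in *worst_s. */
static int server_recv_then_send(const struct pp_ops *ops, void *ctx,
                                 size_t iterations, double *worst_s)
{
    *worst_s = 0.0;
    for (size_t i = 0; i < iterations; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (ops->post_recv(ctx) != 0) return -1;
        if (ops->wait_completions(ctx, 1) != 0) return -1;
        if (ops->post_send(ctx) != 0) return -1;
        if (ops->wait_completions(ctx, 1) != 0) return -1;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double dt = elapsed_seconds(&t0, &t1);
        if (dt > *worst_s) *worst_s = dt;
    }
    return 0;
}

/* Dry-run stubs (assumption: illustration only, no hardware involved). */
struct pp_trace {
    char log[64];
    size_t len;
};

static int trace_put(void *ctx, char c)
{
    struct pp_trace *t = ctx;
    assert(t != NULL);
    if (t->len + 1 >= sizeof t->log) return -1; /* trace buffer is full */
    t->log[t->len++] = c;
    t->log[t->len] = '\0';
    return 0;
}

static int trace_send(void *ctx) { return trace_put(ctx, 'S'); }
static int trace_recv(void *ctx) { return trace_put(ctx, 'R'); }
static int trace_wait(void *ctx, int count)
{
    return trace_put(ctx, (char)('0' + count));
}

static const struct pp_ops trace_ops = {
    .post_send = trace_send,
    .post_recv = trace_recv,
    .wait_completions = trace_wait,
};
```

With the stubs, two iterations record the order `R1S1R1S1`. With real callbacks, comparing `*worst_s` against the average iteration time is one possible way to confirm that a few iterations account for most of the delay.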

NB: I was able to reproduce this issue only with send/recv operations, never with read/write operations. 

I searched for this issue but found nothing related. 

I tested this on multiple configurations: 
Linux version: 5.10.0-20-amd64; Linux distribution: Debian 11 and Debian 12 
We have an Omni-Path network configured at 100 Gb/s (Intel Omni-Path HFI Silicon 100 Series [discrete] with the hfi1 driver), firmware version 1.27.0, 
or an InfiniBand network configured at 100 Gb/s (Mellanox Technologies MT28908 Family [ConnectX-6] with the mlx5_core driver), firmware version 20.29.2002. 

I tried the latest version of rdma-core available for Debian 11 and Debian 12 and saw the same issue. 
The programs were all compiled with gcc, with either -O3 or -O0; the optimisation level made no difference to the communication time. 

The full code for the examples described above can be found on GitHub under: IAdamUGA/RDMAPerfIssue 

I don't know whether this is expected behavior or a known issue. 

I'm relatively new to this domain, and I might not know about the tools that could help me debug this. 

Thank you in advance for your help. 
Best regards. 
__________________________ 

Ivane ADAM 
PhD student, LIG, Erods team 
ivane.adam@xxxxxxxxxxxxxxxxxxxxxx 





