libibverbs bug when transferring small CUDA data

Hello, below is a bug we hit when transferring CUDA data over InfiniBand.

How to reproduce the bug
  1. Prepare two nodes
  2. Download the code and run `mkdir build && cd build && cmake .. -G Ninja` 
  3. Set the receiver's IP address at `cuda_sender.cpp:51`
  4. Run `ninja`
  5. Run `./cuda_receiver` on the receiving machine
  6. Run `./cuda_sender` on the sending machine
The problem arises when I set the opcode of `ibv_send_wr` to `IBV_WR_SEND` and the payload is small (fewer than 9 float32 values, i.e. at most 32 bytes). The sender appears to send the data successfully, but the receiver crashes with a segmentation fault when calling `ibv_poll_cq`.
The problem goes away if `IBV_WR_SEND` is replaced by `IBV_WR_RDMA_WRITE_WITH_IMM` in `ibv.h` (with the remote address and remote key supplied).


Since I'm not familiar with the behavior of each opcode in `ibv_send_wr`, could you tell me whether the problem described above is a bug or expected behavior?


Code to reproduce the problem is attached.

Thank you. 


Attachment: cuda_sender.cpp
Description: Binary data

Attachment: ibv.h
Description: Binary data

Attachment: cuda_receiver.cpp
Description: Binary data

Attachment: CMakeLists.txt
Description: Binary data

