I guess that, like the rest of Ceph, the new RDMA code must also support
multiple applications in parallel. I am also reproducing your error:
two instances of fio can't run in parallel with Ceph RDMA.

* ceph -s shows HEALTH_WARN (with "9 requests are blocked > 32 sec")
* all OSDs print messages like "heartbeat_check: no reply from ..."
* and the log files contain errors:

$ grep error ceph-osd.0.log
2016-11-23 09:20:46.988154 7f9b26260700 -1 Fail to open '/proc/0/cmdline' error = (2) No such file or directory
2016-11-23 09:20:54.090388 7f9b43951700 1 -- 36.0.0.2:6802/10634 >> 36.0.0.4:0/19587 conn(0x7f9b256a8000 :6802 s=STATE_OPEN pgs=1 cs=1 l=1).read_bulk reading from fd=139 : Unknown error -104
2016-11-23 09:20:58.411912 7f9b44953700 1 RDMAStack polling work request returned error for buffer(0x7f9b1fee21b0) status(12:RETRY_EXC_ERR
2016-11-23 09:20:58.411934 7f9b44953700 1 RDMAStack polling work request returned error for buffer(0x7f9b553d20d0) status(12:RETRY_EXC_ERR

Command lines that I used:

./fio --ioengine=rbd --invalidate=0 --rw=write --bs=128K --numjobs=1 --clientname=admin --pool=rbd --iodepth=128 --rbdname=img2g --name=1
./fio --ioengine=rbd --invalidate=0 --rw=write --bs=128K --numjobs=1 --clientname=admin --pool=rbd --iodepth=128 --rbdname=img2g2 --name=1

> -----Original Message-----
> From: Marov Aleksey
> Sent: Tuesday, November 22, 2016 17:59
>
> I didn't try this block size. But in my case fio crashed if I used more than
> one job. With one job everything works fine. Is it worth investigating more
> deeply?
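
P.S. For completeness, one way to start the two fio instances concurrently is
to background both commands from a single shell, roughly like this (the exact
launch method is an assumption on my side; starting them from two separate
terminals should behave the same):

# run both rbd clients at the same time against different images (img2g / img2g2 from my setup)
./fio --ioengine=rbd --invalidate=0 --rw=write --bs=128K --numjobs=1 --clientname=admin --pool=rbd --iodepth=128 --rbdname=img2g --name=1 &
./fio --ioengine=rbd --invalidate=0 --rw=write --bs=128K --numjobs=1 --clientname=admin --pool=rbd --iodepth=128 --rbdname=img2g2 --name=1 &
wait   # wait for both background jobs to finish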