I guess that, like the rest of Ceph, the new RDMA code must also support
multiple applications in parallel. I am also reproducing your error:
two instances of fio can't run in parallel with Ceph RDMA.

* ceph -s shows HEALTH_WARN (with "9 requests are blocked > 32 sec")
* all OSDs print messages like "heartbeat_check: no reply from ..."
* and the log files contain errors:

$ grep error ceph-osd.0.log
2016-11-23 09:20:46.988154 7f9b26260700 -1 Fail to open '/proc/0/cmdline' error = (2) No such file or directory
2016-11-23 09:20:54.090388 7f9b43951700 1 -- 36.0.0.2:6802/10634 >> 36.0.0.4:0/19587 conn(0x7f9b256a8000 :6802 s=STATE_OPEN pgs=1 cs=1 l=1).read_bulk reading from fd=139 : Unknown error -104
2016-11-23 09:20:58.411912 7f9b44953700 1 RDMAStack polling work request returned error for buffer(0x7f9b1fee21b0) status(12:RETRY_EXC_ERR
2016-11-23 09:20:58.411934 7f9b44953700 1 RDMAStack polling work request returned error for buffer(0x7f9b553d20d0) status(12:RETRY_EXC_ERR

Command lines that I used:

./fio --ioengine=rbd --invalidate=0 --rw=write --bs=128K --numjobs=1 --clientname=admin --pool=rbd --iodepth=128 --rbdname=img2g --name=1
./fio --ioengine=rbd --invalidate=0 --rw=write --bs=128K --numjobs=1 --clientname=admin --pool=rbd --iodepth=128 --rbdname=img2g2 --name=1

> -----Original Message-----
> From: Marov Aleksey
> Sent: Tuesday, November 22, 2016 17:59
>
> I didn't try this block size. But in my case fio crashed if I used more than
> one job. With one job everything works fine. Is it worth investigating more
> deeply?
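
P.S. For completeness, one way to start the two fio instances concurrently is
to background both commands from a single shell, roughly like this (the exact
launch method is an assumption on my side; starting them from two separate
terminals should behave the same):

# run both rbd clients at the same time against different images (img2g / img2g2 from my setup)
./fio --ioengine=rbd --invalidate=0 --rw=write --bs=128K --numjobs=1 --clientname=admin --pool=rbd --iodepth=128 --rbdname=img2g --name=1 &
./fio --ioengine=rbd --invalidate=0 --rw=write --bs=128K --numjobs=1 --clientname=admin --pool=rbd --iodepth=128 --rbdname=img2g2 --name=1 &
wait   # wait for both background jobs to finish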