Nick, I think this is because the krbd you are using is using Nagle's algorithm, i.e. TCP_NODELAY = false by default. The latest krbd module should have TCP_NODELAY = true by default; you may want to try that, but I think it is available in the latest kernel only. librbd runs with TCP_NODELAY = true by default; you may want to try ms_tcp_nodelay = false to simulate similar behaviour with librbd.

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Alexandre DERUMIER
Sent: Friday, March 06, 2015 3:59 AM
To: Nick Fisk
Cc: ceph-users
Subject: Re: Strange krbd behaviour with queue depths

Hi,

Have you tried different I/O schedulers, to compare?

----- Original Mail -----
From: "Nick Fisk" <nick@xxxxxxxxxx>
To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Thursday, March 5, 2015 18:17:27
Subject: Strange krbd behaviour with queue depths

I'm seeing strange queue depth behaviour with a kernel-mapped RBD; librbd does not show this problem. The cluster is comprised of 4 nodes with 10Gb networking. I'm not including OSD details, as the test sample is small enough to fit in page cache.

Running fio against a kernel-mapped RBD:

fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/dev/rbd/cache1/test2 --bs=4k --readwrite=randread --iodepth=1 --runtime=10 --size=1g

Queue Depth   IOPS
1             2021
2             288
4             376
8             601
16            1272
32            2467
64            16901
128           44060

See how I initially get a very high number of IOPS at queue depth 1, but this drops dramatically as soon as I start increasing the queue depth. It's not until a depth of 32 that I start to get similar performance again. Incidentally, when changing the read type to sequential instead of random, the oddity goes away.
Running fio with the librbd engine and the same test options, I get the following:

Queue Depth   IOPS
1             1492
2             3232
4             7099
8             13875
16            18759
32            17998
64            18104
128           18589

As you can see, the performance scales up nicely, although the top-end IOPS seem limited to around 18k. I don't know if this is due to kernel/userspace performance differences or if there is a lower maximum queue depth limit in librbd.

Both tests were run on a small sample size to force the OSD data into page cache and rule out any device latency.

Does anyone know why kernel-mapped RBDs show this weird behaviour? I don't think it can be OSD/Ceph config related, as it only happens with krbd.

Nick

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
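For anyone following along, here is a minimal sketch of what the TCP_NODELAY option Somnath mentions actually looks like at the socket level. This is not Ceph code, just the standard socket API: on Linux a fresh TCP socket has TCP_NODELAY = 0 (Nagle's algorithm enabled, small writes coalesced), and userspace such as librbd can turn it on per socket. The suggestion above is that krbd behaves like the "before" state while librbd behaves like the "after" state.

```python
import socket
from socket import IPPROTO_TCP, TCP_NODELAY

# A fresh TCP socket: Nagle's algorithm is enabled by default,
# i.e. TCP_NODELAY is 0, so small writes may be delayed/coalesced.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
default_nodelay = s.getsockopt(IPPROTO_TCP, TCP_NODELAY)

# Disable Nagle, as librbd reportedly does by default: small
# requests are sent immediately instead of waiting for an ACK.
s.setsockopt(IPPROTO_TCP, TCP_NODELAY, 1)
nodelay_after = s.getsockopt(IPPROTO_TCP, TCP_NODELAY)

print(f"TCP_NODELAY before: {default_nodelay}, after: {nodelay_after}")
s.close()
```

In Ceph itself this is controlled by the ms_tcp_nodelay messenger option rather than by application code, which is why toggling it to false is suggested above as a way to make librbd mimic the older krbd behaviour.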