Re: Strange krbd behaviour with queue depths

Nick,
I think this is because the krbd module you are using has Nagle's algorithm enabled, i.e. TCP_NODELAY = false by default.
The latest krbd module should have TCP_NODELAY = true by default. You may want to try that, but I think it is only available in the latest kernel.
Librbd runs with TCP_NODELAY = true by default; you may want to try ms_tcp_nodelay = false to simulate the similar behaviour with librbd.
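For example, something along these lines in the client's ceph.conf should do it (putting it under [client] is just where I would expect it, please double-check against your own config):

[client]
    ms tcp nodelay = false    # leaves Nagle's algorithm active on the librbd messenger, mimicking old krbd behaviour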

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Alexandre DERUMIER
Sent: Friday, March 06, 2015 3:59 AM
To: Nick Fisk
Cc: ceph-users
Subject: Re:  Strange krbd behaviour with queue depths

Hi, have you tried different I/O schedulers to compare?
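For example (rbd0 is just a placeholder for whatever device your image is mapped to):

cat /sys/block/rbd0/queue/scheduler            # show the current scheduler
echo noop > /sys/block/rbd0/queue/scheduler    # switch to noop, then re-run the fio test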


----- Original message -----
From: "Nick Fisk" <nick@xxxxxxxxxx>
To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Thursday, 5 March 2015 18:17:27
Subject:  Strange krbd behaviour with queue depths



I’m seeing some strange queue-depth behaviour with a kernel mapped RBD; librbd does not show this problem.



The cluster consists of 4 nodes with 10GB networking; I’m not listing the OSD details as the test sample is small enough to fit entirely in page cache.



Running fio against a kernel mapped RBD

fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/dev/rbd/cache1/test2 --bs=4k --readwrite=randread --iodepth=1 --runtime=10 --size=1g
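The same command was repeated with --iodepth stepped through the values in the table below, roughly:

for qd in 1 2 4 8 16 32 64 128; do
    fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test \
        --filename=/dev/rbd/cache1/test2 --bs=4k --readwrite=randread \
        --iodepth=$qd --runtime=10 --size=1g
done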



Queue Depth    IOPS
1              2021
2              288
4              376
8              601
16             1272
32             2467
64             16901
128            44060


See how initially I get a very high number of IOPS at queue depth 1, but this drops dramatically as soon as I start increasing the queue depth. It’s not until a depth of 32 IOs that I start to get similar performance again. Incidentally, when changing the read type to sequential instead of random, the oddity goes away.



Running fio with the librbd engine and the same test options, I get the following results.
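(For reference, the librbd run is roughly the equivalent of the command above via fio's rbd engine; the pool and image names here are taken from the device path, and clientname=admin assumes the default admin keyring:

fio --randrepeat=1 --ioengine=rbd --clientname=admin --pool=cache1 --rbdname=test2 --gtod_reduce=1 --name=test --bs=4k --readwrite=randread --iodepth=1 --runtime=10 --size=1g

again with --iodepth stepped through the values in the table.)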



Queue Depth    IOPS
1              1492
2              3232
4              7099
8              13875
16             18759
32             17998
64             18104
128            18589





As you can see, the performance scales up nicely, although the top-end IOPS seem limited to around 18k. I don’t know if this is due to kernel/userspace performance differences or whether there is a lower maximum queue depth limit in librbd.



Both tests were run on a small sample size to force the OSD data into page cache to rule out any device latency.



Does anyone know why kernel mapped RBDs show this weird behaviour? I don’t think it can be OSD/Ceph config related, as it only happens with krbd.



Nick






_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




