Hi Somnath,

I think you hit the nail on the head; setting librbd to not use TCP_NODELAY shows the same behaviour as with krbd.

Mark, if you are still interested, here are the two latency reports:

Queue Depth = 1

  slat (usec): min=24, max=210, avg=39.40, stdev=11.54
  clat (usec): min=310, max=78268, avg=769.48, stdev=1764.41
   lat (usec): min=341, max=78298, avg=808.88, stdev=1764.39
  clat percentiles (usec):
   |  1.00th=[  462],  5.00th=[  466], 10.00th=[  474], 20.00th=[  620],
   | 30.00th=[  620], 40.00th=[  628], 50.00th=[  636], 60.00th=[  772],
   | 70.00th=[  772], 80.00th=[  788], 90.00th=[  924], 95.00th=[  940],
   | 99.00th=[ 1080], 99.50th=[ 1384], 99.90th=[33536], 99.95th=[45312],
   | 99.99th=[63232]
  bw (KB /s): min= 2000, max= 5880, per=100.00%, avg=4951.16, stdev=877.96
  lat (usec) : 500=12.71%, 750=40.82%, 1000=45.41%
  lat (msec) : 2=0.69%, 4=0.13%, 10=0.04%, 20=0.04%, 50=0.11%
  lat (msec) : 100=0.05%

Queue Depth = 2

  slat (usec): min=21, max=135, avg=38.72, stdev=13.18
  clat (usec): min=346, max=77340, avg=6450.22, stdev=13390.20
   lat (usec): min=377, max=77368, avg=6488.94, stdev=13389.56
  clat percentiles (usec):
   |  1.00th=[  462],  5.00th=[  470], 10.00th=[  498], 20.00th=[  612],
   | 30.00th=[  628], 40.00th=[  652], 50.00th=[  684], 60.00th=[  772],
   | 70.00th=[  820], 80.00th=[  996], 90.00th=[37120], 95.00th=[38656],
   | 99.00th=[40192], 99.50th=[40704], 99.90th=[45312], 99.95th=[64768],
   | 99.99th=[77312]
  bw (KB /s): min=  931, max= 1611, per=99.42%, avg=1223.84, stdev=186.30
  lat (usec) : 500=11.37%, 750=42.60%, 1000=26.11%
  lat (msec) : 2=3.37%, 4=0.71%, 10=0.16%, 20=0.16%, 50=15.45%
  lat (msec) : 100=0.06%
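For anyone wanting to reproduce this: "not using TCP_NODELAY" above just means re-enabling Nagle via the messenger option Somnath mentions below. A minimal sketch of the setting, in the usual [client] section of ceph.conf on the host running fio (illustrative rather than a copy of my exact config):

  [client]
      # turn Nagle's algorithm back on for the userspace messenger, so that
      # librbd behaves like the older krbd default of TCP_NODELAY = false
      ms tcp nodelay = false

librbd reads its configuration when the client process starts, so the option only needs to be set before launching fio. Note how the 90th percentile and above jump to roughly 37-40 ms at queue depth 2, which is the sort of stall you would expect when Nagle holds small requests back until the delayed-ACK timer fires.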
Many Thanks,
Nick

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Somnath Roy
> Sent: 06 March 2015 16:02
> To: Alexandre DERUMIER; Nick Fisk
> Cc: ceph-users
> Subject: Re: Strange krbd behaviour with queue depths
>
> Nick,
> I think this is because the krbd you are using is using Nagle's algorithm, i.e. TCP_NODELAY = false by default.
> The latest krbd module should have TCP_NODELAY = true by default. You may want to try that, but I think it is available in the latest kernel only.
> Librbd is running with TCP_NODELAY = true by default; you may want to try ms_tcp_nodelay = false to simulate the similar behavior with librbd.
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Alexandre DERUMIER
> Sent: Friday, March 06, 2015 3:59 AM
> To: Nick Fisk
> Cc: ceph-users
> Subject: Re: Strange krbd behaviour with queue depths
>
> Hi, have you tried different IO schedulers to compare?
>
>
> ----- Original Message -----
> From: "Nick Fisk" <nick@xxxxxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Thursday, 5 March 2015 18:17:27
> Subject: Strange krbd behaviour with queue depths
>
> I'm seeing a strange queue depth behaviour with a kernel mapped RBD; librbd does not show this problem.
>
> The cluster is comprised of 4 nodes with 10Gb networking. I have not included OSD details, as the test sample is small enough to fit in page cache.
>
> Running fio against a kernel mapped RBD:
>
> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/dev/rbd/cache1/test2 --bs=4k --readwrite=randread --iodepth=1 --runtime=10 --size=1g
>
> Queue Depth      IOPS
>           1      2021
>           2       288
>           4       376
>           8       601
>          16      1272
>          32      2467
>          64     16901
>         128     44060
>
> See how initially I get a very high number of IOPS at queue depth 1, but this drops dramatically as soon as I start increasing the queue depth. It's not until a depth of 32 that I start to get similar performance. Incidentally, when changing the read type to sequential instead of random, the oddity goes away.
>
> Running fio with the librbd engine and the same test options I get the following (a sketch of the equivalent rbd-engine command is appended at the end of this message):
>
> Queue Depth      IOPS
>           1      1492
>           2      3232
>           4      7099
>           8     13875
>          16     18759
>          32     17998
>          64     18104
>         128     18589
>
> As you can see the performance scales up nicely, although the top-end IOPS seem limited to around 18k. I don't know if this is due to kernel/userspace performance differences or if there is a lower maximum queue depth limit in librbd.
>
> Both tests were run on a small sample size to force the OSD data into page cache, to rule out any device latency.
>
> Does anyone know why kernel mapped RBDs show this weird behaviour? I don't think it can be OSD/Ceph config related, as it only happens with krbd.
>
> Nick
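For completeness, since the original mail only says the librbd runs used "the same test options": with fio's rbd engine there is no mapped block device, so the pool and image are passed by name rather than via --filename. A rough equivalent of the krbd command above, with the pool and image names taken from the /dev/rbd/cache1/test2 path and the remaining details (including clientname=admin) assumed rather than copied from the original run:

  # librbd (rbd engine) equivalent of the krbd test; pool/image from the mapped device path
  fio --ioengine=rbd --clientname=admin --pool=cache1 --rbdname=test2 \
      --randrepeat=1 --gtod_reduce=1 --name=test \
      --bs=4k --readwrite=randread --iodepth=1 --runtime=10 --size=1g

--iodepth is then stepped through 1, 2, 4 ... 128 to produce the table above; --direct=1 is dropped because the rbd engine goes straight to librbd, so there is no local page cache to bypass.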