Hi Somnath,

I think you hit the nail on the head; setting librbd to not use TCP_NODELAY shows the same behaviour as with krbd.

Mark, if you are still interested, here are the two latency reports:

Queue Depth = 1

  slat (usec): min=24, max=210, avg=39.40, stdev=11.54
  clat (usec): min=310, max=78268, avg=769.48, stdev=1764.41
   lat (usec): min=341, max=78298, avg=808.88, stdev=1764.39
  clat percentiles (usec):
   |  1.00th=[  462],  5.00th=[  466], 10.00th=[  474], 20.00th=[  620],
   | 30.00th=[  620], 40.00th=[  628], 50.00th=[  636], 60.00th=[  772],
   | 70.00th=[  772], 80.00th=[  788], 90.00th=[  924], 95.00th=[  940],
   | 99.00th=[ 1080], 99.50th=[ 1384], 99.90th=[33536], 99.95th=[45312],
   | 99.99th=[63232]
  bw (KB /s): min= 2000, max= 5880, per=100.00%, avg=4951.16, stdev=877.96
  lat (usec) : 500=12.71%, 750=40.82%, 1000=45.41%
  lat (msec) : 2=0.69%, 4=0.13%, 10=0.04%, 20=0.04%, 50=0.11%
  lat (msec) : 100=0.05%

Queue Depth = 2

  slat (usec): min=21, max=135, avg=38.72, stdev=13.18
  clat (usec): min=346, max=77340, avg=6450.22, stdev=13390.20
   lat (usec): min=377, max=77368, avg=6488.94, stdev=13389.56
  clat percentiles (usec):
   |  1.00th=[  462],  5.00th=[  470], 10.00th=[  498], 20.00th=[  612],
   | 30.00th=[  628], 40.00th=[  652], 50.00th=[  684], 60.00th=[  772],
   | 70.00th=[  820], 80.00th=[  996], 90.00th=[37120], 95.00th=[38656],
   | 99.00th=[40192], 99.50th=[40704], 99.90th=[45312], 99.95th=[64768],
   | 99.99th=[77312]
  bw (KB /s): min=  931, max= 1611, per=99.42%, avg=1223.84, stdev=186.30
  lat (usec) : 500=11.37%, 750=42.60%, 1000=26.11%
  lat (msec) : 2=3.37%, 4=0.71%, 10=0.16%, 20=0.16%, 50=15.45%
  lat (msec) : 100=0.06%
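For anyone wanting to reproduce this: "not using TCP_NODELAY" above just means re-enabling Nagle via the messenger option Somnath mentions below. A minimal sketch of the setting, in the usual [client] section of ceph.conf on the host running fio (illustrative rather than a copy of my exact config):

  [client]
      # turn Nagle's algorithm back on for the userspace messenger, so that
      # librbd behaves like the older krbd default of TCP_NODELAY = false
      ms tcp nodelay = false

librbd reads its configuration when the client process starts, so the option only needs to be set before launching fio. Note how the 90th percentile and above jump to roughly 37-40 ms at queue depth 2, which is the sort of stall you would expect when Nagle holds small requests back until the delayed-ACK timer fires.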
Many Thanks,
Nick

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Somnath Roy
> Sent: 06 March 2015 16:02
> To: Alexandre DERUMIER; Nick Fisk
> Cc: ceph-users
> Subject: Re: Strange krbd behaviour with queue depths
>
> Nick,
> I think this is because the krbd you are using is using Nagle's algorithm, i.e. TCP_NODELAY = false by default.
> The latest krbd module should have TCP_NODELAY = true by default. You may want to try that, but I think it is available in the latest kernel only.
> Librbd is running with TCP_NODELAY = true by default; you may want to try ms_tcp_nodelay = false to simulate the similar behavior with librbd.
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Alexandre DERUMIER
> Sent: Friday, March 06, 2015 3:59 AM
> To: Nick Fisk
> Cc: ceph-users
> Subject: Re: Strange krbd behaviour with queue depths
>
> Hi, have you tried different IO schedulers to compare?
>
>
> ----- Original Message -----
> From: "Nick Fisk" <nick@xxxxxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Thursday, 5 March 2015 18:17:27
> Subject: Strange krbd behaviour with queue depths
>
> I'm seeing a strange queue depth behaviour with a kernel mapped RBD; librbd does not show this problem.
>
> The cluster is comprised of 4 nodes with 10Gb networking. I have not included OSD details, as the test sample is small enough to fit in page cache.
>
> Running fio against a kernel mapped RBD:
>
> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/dev/rbd/cache1/test2 --bs=4k --readwrite=randread --iodepth=1 --runtime=10 --size=1g
>
> Queue Depth      IOPS
>           1      2021
>           2       288
>           4       376
>           8       601
>          16      1272
>          32      2467
>          64     16901
>         128     44060
>
> See how initially I get a very high number of IOPS at queue depth 1, but this drops dramatically as soon as I start increasing the queue depth. It's not until a depth of 32 that I start to get similar performance. Incidentally, when changing the read type to sequential instead of random, the oddity goes away.
>
> Running fio with the librbd engine and the same test options I get the following (a sketch of the equivalent rbd-engine command is appended at the end of this message):
>
> Queue Depth      IOPS
>           1      1492
>           2      3232
>           4      7099
>           8     13875
>          16     18759
>          32     17998
>          64     18104
>         128     18589
>
> As you can see the performance scales up nicely, although the top-end IOPS seem limited to around 18k. I don't know if this is due to kernel/userspace performance differences or if there is a lower maximum queue depth limit in librbd.
>
> Both tests were run on a small sample size to force the OSD data into page cache, to rule out any device latency.
>
> Does anyone know why kernel mapped RBDs show this weird behaviour? I don't think it can be OSD/Ceph config related, as it only happens with krbd.
>
> Nick
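For completeness, since the original mail only says the librbd runs used "the same test options": with fio's rbd engine there is no mapped block device, so the pool and image are passed by name rather than via --filename. A rough equivalent of the krbd command above, with the pool and image names taken from the /dev/rbd/cache1/test2 path and the remaining details (including clientname=admin) assumed rather than copied from the original run:

  # librbd (rbd engine) equivalent of the krbd test; pool/image from the mapped device path
  fio --ioengine=rbd --clientname=admin --pool=cache1 --rbdname=test2 \
      --randrepeat=1 --gtod_reduce=1 --name=test \
      --bs=4k --readwrite=randread --iodepth=1 --runtime=10 --size=1g

--iodepth is then stepped through 1, 2, 4 ... 128 to produce the table above; --direct=1 is dropped because the rbd engine goes straight to librbd, so there is no local page cache to bypass.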