Hi, does somebody know if Red Hat will backport the new krbd features (discard, blk-mq, tcp_nodelay, ...) to the Red Hat 3.10 kernel?

Alexandre

----- Original Message -----
From: "Mark Nelson" <mnelson@xxxxxxxxxx>
To: "Nick Fisk" <nick@xxxxxxxxxx>, "Somnath Roy" <Somnath.Roy@xxxxxxxxxxx>, "aderumier" <aderumier@xxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Friday, March 6, 2015 17:38:09
Subject: Re: Strange krbd behaviour with queue depths

On 03/06/2015 10:27 AM, Nick Fisk wrote:
> Hi Somnath,
>
> I think you hit the nail on the head; setting librbd to not use TCP_NODELAY shows the same behaviour as with krbd.

Score (another) 1 for Somnath! :)

> Mark, if you are still interested, here are the two latency reports.
>
> Queue Depth = 1
>
>   slat (usec): min=24, max=210, avg=39.40, stdev=11.54
>   clat (usec): min=310, max=78268, avg=769.48, stdev=1764.41
>    lat (usec): min=341, max=78298, avg=808.88, stdev=1764.39
>   clat percentiles (usec):
>    |  1.00th=[  462],  5.00th=[  466], 10.00th=[  474], 20.00th=[  620],
>    | 30.00th=[  620], 40.00th=[  628], 50.00th=[  636], 60.00th=[  772],
>    | 70.00th=[  772], 80.00th=[  788], 90.00th=[  924], 95.00th=[  940],
>    | 99.00th=[ 1080], 99.50th=[ 1384], 99.90th=[33536], 99.95th=[45312],
>    | 99.99th=[63232]
>   bw (KB /s): min= 2000, max= 5880, per=100.00%, avg=4951.16, stdev=877.96
>   lat (usec) : 500=12.71%, 750=40.82%, 1000=45.41%
>   lat (msec) : 2=0.69%, 4=0.13%, 10=0.04%, 20=0.04%, 50=0.11%
>   lat (msec) : 100=0.05%
>
> Queue Depth = 2
>
>   slat (usec): min=21, max=135, avg=38.72, stdev=13.18
>   clat (usec): min=346, max=77340, avg=6450.22, stdev=13390.20
>    lat (usec): min=377, max=77368, avg=6488.94, stdev=13389.56
>   clat percentiles (usec):
>    |  1.00th=[  462],  5.00th=[  470], 10.00th=[  498], 20.00th=[  612],
>    | 30.00th=[  628], 40.00th=[  652], 50.00th=[  684], 60.00th=[  772],
>    | 70.00th=[  820], 80.00th=[  996], 90.00th=[37120], 95.00th=[38656],
>    | 99.00th=[40192], 99.50th=[40704], 99.90th=[45312], 99.95th=[64768],
>    | 99.99th=[77312]
>   bw (KB /s): min=  931, max= 1611, per=99.42%, avg=1223.84, stdev=186.30
>   lat (usec) : 500=11.37%, 750=42.60%, 1000=26.11%
>   lat (msec) : 2=3.37%, 4=0.71%, 10=0.16%, 20=0.16%, 50=15.45%
>   lat (msec) : 100=0.06%

Pretty similar latency except for that big 50 ms spike at QD=2!

> Many Thanks,
> Nick
>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Somnath Roy
>> Sent: 06 March 2015 16:02
>> To: Alexandre DERUMIER; Nick Fisk
>> Cc: ceph-users
>> Subject: RE: Strange krbd behaviour with queue depths
>>
>> Nick,
>> I think this is because the krbd you are using runs with Nagle's algorithm enabled, i.e. TCP_NODELAY = false by default.
>> The latest krbd module should have TCP_NODELAY = true by default. You may
>> want to try that, but I think it is only available in the latest kernel.
>> librbd runs with TCP_NODELAY = true by default; you may want to try with
>> ms_tcp_nodelay = false to simulate the same behaviour with librbd.
>>
>> Thanks & Regards,
>> Somnath
>>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Alexandre DERUMIER
>> Sent: Friday, March 06, 2015 3:59 AM
>> To: Nick Fisk
>> Cc: ceph-users
>> Subject: Re: Strange krbd behaviour with queue depths
>>
>> Hi, have you tried different I/O schedulers to compare?
>>
>> ----- Original Message -----
>> From: "Nick Fisk" <nick@xxxxxxxxxx>
>> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>> Sent: Thursday, March 5, 2015 18:17:27
>> Subject: Strange krbd behaviour with queue depths
>>
>> I'm seeing strange queue depth behaviour with a kernel mapped RBD;
>> librbd does not show this problem.
>>
>> The cluster is comprised of 4 nodes with 10GB networking. OSD device
>> latency is not a factor, as the test sample is small enough to fit in page cache.
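[Somnath's point above about Nagle's algorithm can be illustrated outside Ceph entirely. The following is a minimal, hypothetical Python sketch (not Ceph or krbd code) of what enabling TCP_NODELAY on a messenger socket amounts to, i.e. the userspace analogue of ms_tcp_nodelay = true:]

```python
import socket

# Create a TCP socket. By default Nagle's algorithm is enabled: small
# writes may be held back and coalesced until the previous segment is
# ACKed, which interacts badly with latency-sensitive 4k request/reply
# traffic like RBD ops.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Disable Nagle so small requests are sent immediately.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Verify the option took effect.
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print("TCP_NODELAY enabled:", bool(nodelay))
sock.close()
```

[A kernel client has to set the equivalent option on its own sockets, which is why an older krbd without that change behaves differently from librbd on the same cluster.]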
>>
>> Running fio against a kernel mapped RBD:
>>
>> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test \
>>     --filename=/dev/rbd/cache1/test2 --bs=4k --readwrite=randread \
>>     --iodepth=1 --runtime=10 --size=1g
>>
>> Queue Depth    IOPS
>>           1    2021
>>           2     288
>>           4     376
>>           8     601
>>          16    1272
>>          32    2467
>>          64   16901
>>         128   44060
>>
>> See how initially I get a very high number of IOs at queue depth 1, but this
>> drops dramatically as soon as I start increasing the queue depth. It's not
>> until a depth of 32 IOs that I start to get similar performance again.
>> Incidentally, when changing the read type to sequential instead of random,
>> the oddity goes away.
>>
>> Running fio with the librbd engine and the same test options, I get the
>> following:
>>
>> Queue Depth    IOPS
>>           1    1492
>>           2    3232
>>           4    7099
>>           8   13875
>>          16   18759
>>          32   17998
>>          64   18104
>>         128   18589
>>
>> As you can see, the performance scales up nicely, although the top-end IOPS
>> seem limited to around 18k. I don't know if this is due to kernel/userspace
>> performance differences or if there is a lower max queue depth limit in
>> librbd.
>>
>> Both tests were run on a small sample size to force the OSD data into page
>> cache and rule out any device latency.
>>
>> Does anyone know why kernel mapped RBDs show this weird behaviour? I don't
>> think it can be OSD/ceph config related, as it only happens with krbd.
>>
>> Nick
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
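[Nick's QD=2 collapse is consistent with the latency mix in the fio reports quoted earlier in the thread: a fixed-depth fio job obeys Little's law, IOPS ≈ queue depth / mean latency. A quick back-of-the-envelope check in Python; the mean latencies are taken from the quoted reports, and the helper function is illustrative, not from the thread:]

```python
def expected_iops(queue_depth, mean_lat_usec):
    """IOPS predicted by Little's law for a closed-loop (fixed-depth) fio job."""
    return queue_depth / (mean_lat_usec / 1e6)

# Mean total latencies from the two fio reports quoted above.
qd1 = expected_iops(1, 808.88)    # ~1236 IOPS
qd2 = expected_iops(2, 6488.94)   # ~308 IOPS

# Doubling the queue depth *lowered* throughput because ~15% of the QD=2
# IOs landed in the ~40 ms latency bucket (the Nagle / delayed-ACK stall),
# dragging the mean latency from ~0.8 ms up to ~6.5 ms.
print(f"QD=1: ~{qd1:.0f} IOPS, QD=2: ~{qd2:.0f} IOPS")
```

[These predictions line up with the reported bandwidths (4951 KB/s and 1224 KB/s at 4k ≈ 1238 and 306 IOPS), which supports the TCP_NODELAY explanation rather than an OSD-side limit.]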