Re: bad perf for librbd vs krbd using FIO

Somnath Roy <Somnath.Roy@xxxxxxxxxxx> · Fri, 11 Sep 2015 03:39:10 +0000

It may be due to the rbd cache effect.
Try the following:

Run your test with direct=1 in both cases and with rbd_cache = false (disable all the other rbd cache options as well). This should give you results similar to krbd.

In the direct=1 case, we saw ~10-20% degradation when we set rbd_cache = true.
In the direct=0 case it could be more, as you are seeing.

I think there is a delta (or it needs proper tuning) if you want to use the rbd cache.
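
A minimal sketch of the job-file side of the suggestion above, not something posted in the thread: the helper name with_direct_io is hypothetical, and it simply uses Python's configparser to set direct=1 in the [global] section of a fio job file. The job text is abbreviated from the librbd job quoted further down. Disabling the cache is a separate, client-side step; putting "rbd cache = false" in the [client] section of ceph.conf is assumed here, since the thread does not say where the option was set.

```python
import configparser
import io


def with_direct_io(job_text: str) -> str:
    """Return a copy of a fio job file with direct=1 set in [global].

    Note: configparser drops ';' comment lines when it rewrites the file.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(job_text)
    parser["global"]["direct"] = "1"
    out = io.StringIO()
    # fio job files use key=value without spaces around '='
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Abbreviated copy of the librbd job from the original post.
job = """\
[global]
rw=rw
bs=1024k
direct=0
iodepth=16
ioengine=rbd
rbdname=nas1-rds-stg31
pool=rbd

[job1]
"""

print(with_direct_io(job))
```

The same helper can be applied to the ext4/kRBD job so that both runs bypass the page cache and are compared under the same conditions.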

Thanks & Regards
Somnath

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Rafael Lopez
Sent: Thursday, September 10, 2015 8:24 PM
To: ceph-users@xxxxxxxxxxxxxx
Subject: [ceph-users] bad perf for librbd vs krbd using FIO

Hi all,

I am seeing a big discrepancy between librbd and kRBD/ext4 performance using FIO with a single RBD image. The RBD images come from the same RBD pool, with the same size and settings for both. The librbd results are quite bad by comparison. In addition, if I scale up the kRBD FIO job with more jobs/threads, throughput increases to 3-4x the results below, but librbd doesn't seem to scale much at all. I figured that it should be close to the kRBD result for a single job/thread, before parallelism comes into play. RBD cache settings are all default.

I can see some obvious differences in the FIO output, but not being well versed in FIO, I'm not sure what to make of them or where to start diagnosing the discrepancy. I hunted around but haven't found anything useful, so any suggestions/insights would be appreciated.

RBD cache settings:

[root@rcmktdc1r72-09-ac rafaell]# ceph --admin-daemon /var/run/ceph/ceph-osd.659.asok config show | grep rbd_cache
    "rbd_cache": "true",
    "rbd_cache_writethrough_until_flush": "true",
    "rbd_cache_size": "33554432",
    "rbd_cache_max_dirty": "25165824",
    "rbd_cache_target_dirty": "16777216",
    "rbd_cache_max_dirty_age": "1",
    "rbd_cache_max_dirty_object": "0",
    "rbd_cache_block_writes_upfront": "false",
[root@rcmktdc1r72-09-ac rafaell]#

This is the FIO job file for the kRBD job:

[root@rcprsdc1r72-01-ac rafaell]# cat ext4_test
; -- start job file --
[global]
rw=rw
size=100g
filename=/mnt/rbd/fio_test_file_ext4
rwmixread=0
rwmixwrite=100
percentage_random=0
bs=1024k
direct=0
iodepth=16
thread=1
numjobs=1
[job1]
; -- end job file --
[root@rcprsdc1r72-01-ac rafaell]#

This is the FIO job file for the librbd job:

[root@rcprsdc1r72-01-ac rafaell]# cat fio_rbd_test
; -- start job file --
[global]
rw=rw
size=100g
rwmixread=0
rwmixwrite=100
percentage_random=0
bs=1024k
direct=0
iodepth=16
thread=1
numjobs=1
ioengine=rbd
rbdname=nas1-rds-stg31
pool=rbd
[job1]
; -- end job file --

Here are the results:

[root@rcprsdc1r72-01-ac rafaell]# fio ext4_test
job1: (g=0): rw=rw, bs=1M-1M/1M-1M/1M-1M, ioengine=sync, iodepth=16
fio-2.2.8
Starting 1 thread
job1: Laying out IO file(s) (1 file(s) / 102400MB)
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/321.7MB/0KB /s] [0/321/0 iops] [eta 00m:00s]
job1: (groupid=0, jobs=1): err= 0: pid=37981: Fri Sep 11 12:33:13 2015
  write: io=102400MB, bw=399741KB/s, iops=390, runt=262314msec
    clat (usec): min=411, max=574082, avg=2492.91, stdev=7316.96
     lat (usec): min=418, max=574113, avg=2520.12, stdev=7318.53
    clat percentiles (usec):
     |  1.00th=[  446],  5.00th=[  458], 10.00th=[  474], 20.00th=[  510],
     | 30.00th=[ 1064], 40.00th=[ 1096], 50.00th=[ 1160], 60.00th=[ 1320],
     | 70.00th=[ 1592], 80.00th=[ 2448], 90.00th=[ 7712], 95.00th=[ 7904],
     | 99.00th=[11072], 99.50th=[11712], 99.90th=[13120], 99.95th=[73216],
     | 99.99th=[464896]
    bw (KB  /s): min=  264, max=2156544, per=100.00%, avg=412986.27, stdev=375092.66
    lat (usec) : 500=18.68%, 750=7.43%, 1000=2.11%
    lat (msec) : 2=48.89%, 4=4.35%, 10=16.79%, 20=1.67%, 50=0.03%
    lat (msec) : 100=0.03%, 250=0.02%, 500=0.01%, 750=0.01%
  cpu          : usr=1.24%, sys=45.38%, ctx=19298, majf=0, minf=974
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=102400/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: io=102400MB, aggrb=399740KB/s, minb=399740KB/s, maxb=399740KB/s, mint=262314msec, maxt=262314msec

Disk stats (read/write):
  rbd0: ios=0/150890, merge=0/49, ticks=0/36117700, in_queue=36145277, util=96.97%
[root@rcprsdc1r72-01-ac rafaell]#

[root@rcprsdc1r72-01-ac rafaell]# fio fio_rbd_test
job1: (g=0): rw=rw, bs=1M-1M/1M-1M/1M-1M, ioengine=rbd, iodepth=16
fio-2.2.8
Starting 1 thread
rbd engine: RBD version: 0.1.9
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/65405KB/0KB /s] [0/63/0 iops] [eta 00m:00s]
job1: (groupid=0, jobs=1): err= 0: pid=43960: Fri Sep 11 12:54:25 2015
  write: io=102400MB, bw=121882KB/s, iops=119, runt=860318msec
    slat (usec): min=355, max=7300, avg=908.97, stdev=361.02
    clat (msec): min=11, max=1468, avg=129.59, stdev=130.68
     lat (msec): min=12, max=1468, avg=130.50, stdev=130.69
    clat percentiles (msec):
     |  1.00th=[   21],  5.00th=[   26], 10.00th=[   29], 20.00th=[   34],
     | 30.00th=[   37], 40.00th=[   40], 50.00th=[   44], 60.00th=[   63],
     | 70.00th=[  233], 80.00th=[  241], 90.00th=[  269], 95.00th=[  367],
     | 99.00th=[  553], 99.50th=[  652], 99.90th=[  832], 99.95th=[  848],
     | 99.99th=[ 1369]
    bw (KB  /s): min=20363, max=248543, per=100.00%, avg=124381.19, stdev=42313.29
    lat (msec) : 20=0.95%, 50=55.27%, 100=5.55%, 250=24.83%, 500=12.28%
    lat (msec) : 750=0.89%, 1000=0.21%, 2000=0.01%
  cpu          : usr=9.58%, sys=1.15%, ctx=23883, majf=0, minf=2751023
  IO depths    : 1=1.2%, 2=3.0%, 4=9.7%, 8=68.3%, 16=17.8%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=92.5%, 8=4.3%, 16=3.2%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=102400/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: io=102400MB, aggrb=121882KB/s, minb=121882KB/s, maxb=121882KB/s, mint=860318msec, maxt=860318msec

Disk stats (read/write):
    dm-1: ios=0/2072, merge=0/0, ticks=0/233, in_queue=233, util=0.01%, aggrios=1/2249, aggrmerge=7/559, aggrticks=9/254, aggrin_queue=261, aggrutil=0.01%
  sda: ios=1/2249, merge=7/559, ticks=9/254, in_queue=261, util=0.01%
[root@rcprsdc1r72-01-ac rafaell]#
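
One way to read the two result sets side by side is a rough sanity check with Little's law (average I/Os in flight = IOPS x average latency). This is only a sketch: the numbers are copied from the fio output above, and the function name is hypothetical.

```python
def avg_in_flight(iops: float, avg_lat_ms: float) -> float:
    """Little's law: mean number of outstanding I/Os = rate * mean latency."""
    return iops * avg_lat_ms / 1000.0


# kRBD/ext4 run: iops=390, lat avg=2520.12 usec (= 2.52012 ms)
krbd_depth = avg_in_flight(390, 2.52012)

# librbd run: iops=119, lat avg=130.50 msec
librbd_depth = avg_in_flight(119, 130.50)

# Bandwidth ratio between the two runs (KB/s figures from the "write:" lines)
bw_ratio = 399741 / 121882

print(f"kRBD/ext4 effective queue depth: {krbd_depth:.2f}")
print(f"librbd effective queue depth:    {librbd_depth:.2f}")
print(f"kRBD/ext4 vs librbd bandwidth:   {bw_ratio:.2f}x")
```

The kRBD/ext4 run works out to roughly one I/O in flight, which matches the "ioengine=sync" and "IO depths: 1=100.0%" lines in its output, while the librbd run kept roughly 15.5 in flight, close to the configured iodepth=16. So the per-request latencies of the two runs are not measuring the same thing: with direct=0 and the sync engine, the kRBD numbers probably reflect buffered writes into the page cache rather than completed writes to the cluster. The bandwidth gap itself is about 3.3x.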

Cheers,

Raf

-- 

Rafael Lopez

Data Storage Administrator

Servers & Storage (eSolutions)

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com