Hello,

On Fri, 11 Sep 2015 13:24:24 +1000 Rafael Lopez wrote:

> Hi all,
>
> I am seeing a big discrepancy between librbd and kRBD/ext4 performance
> using FIO with single RBD image. RBD images are coming from same RBD
> pool, same size and settings for both. The librbd results are quite bad
> by comparison, and in addition if I scale up the kRBD FIO job with more
> jobs/threads it increases up to 3-4x results below, but librbd doesn't
> seem to scale much at all. I figured that it should be close to the kRBD
> result for a single job/thread before parallelism comes into play
> though. RBD cache settings are all default.
>
librbd as in FUSE or to KVM client VM?

RBD cache settings only influence librbd; the kernel will use all of the
available memory for page cache.
And this is what you're probably seeing, with the kernel RBD being so much
faster.

Anyway, a good comparison, and a good idea of what your cluster can do,
would be to start with a blocksize of 4KB (smaller total size of course)
and direct=1.

Christian

> I can see some obvious differences in FIO output, but not being well
> versed with FIO I'm not sure what to make of it or where to start
> diagnosing the discrepancy. Hunted around but haven't found anything
> useful, any suggestions/insights would be appreciated.
>
> RBD cache settings:
> [root@rcmktdc1r72-09-ac rafaell]# ceph --admin-daemon
> /var/run/ceph/ceph-osd.659.asok config show | grep rbd_cache
> "rbd_cache": "true",
> "rbd_cache_writethrough_until_flush": "true",
> "rbd_cache_size": "33554432",
> "rbd_cache_max_dirty": "25165824",
> "rbd_cache_target_dirty": "16777216",
> "rbd_cache_max_dirty_age": "1",
> "rbd_cache_max_dirty_object": "0",
> "rbd_cache_block_writes_upfront": "false",
> [root@rcmktdc1r72-09-ac rafaell]#
>
> This is the FIO job file for the kRBD job:
>
> [root@rcprsdc1r72-01-ac rafaell]# cat ext4_test
> ; -- start job file --
> [global]
> rw=rw
> size=100g
> filename=/mnt/rbd/fio_test_file_ext4
> rwmixread=0
> rwmixwrite=100
> percentage_random=0
> bs=1024k
> direct=0
> iodepth=16
> thread=1
> numjobs=1
> [job1]
> ; -- end job file --
>
> [root@rcprsdc1r72-01-ac rafaell]#
>
> This is the FIO job file for the librbd job:
>
> [root@rcprsdc1r72-01-ac rafaell]# cat fio_rbd_test
> ; -- start job file --
> [global]
> rw=rw
> size=100g
> rwmixread=0
> rwmixwrite=100
> percentage_random=0
> bs=1024k
> direct=0
> iodepth=16
> thread=1
> numjobs=1
> ioengine=rbd
> rbdname=nas1-rds-stg31
> pool=rbd
> [job1]
> ; -- end job file --
>
>
> Here are the results:
>
> [root@rcprsdc1r72-01-ac rafaell]# fio ext4_test
> job1: (g=0): rw=rw, bs=1M-1M/1M-1M/1M-1M, ioengine=sync, iodepth=16
> fio-2.2.8
> Starting 1 thread
> job1: Laying out IO file(s) (1 file(s) / 102400MB)
> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/321.7MB/0KB /s] [0/321/0 iops]
> [eta 00m:00s]
> job1: (groupid=0, jobs=1): err= 0: pid=37981: Fri Sep 11 12:33:13 2015
> write: io=102400MB, bw=399741KB/s, iops=390, runt=262314msec
> clat (usec): min=411, max=574082, avg=2492.91, stdev=7316.96
> lat (usec): min=418, max=574113, avg=2520.12, stdev=7318.53
> clat percentiles (usec):
> | 1.00th=[ 446], 5.00th=[ 458], 10.00th=[ 474],
> 20.00th=[ 510], | 30.00th=[ 1064], 40.00th=[ 1096], 50.00th=[ 1160],
> 60.00th=[ 1320], | 70.00th=[ 1592], 80.00th=[ 2448],
> 90.00th=[ 7712],
> 95.00th=[ 7904], | 99.00th=[11072], 99.50th=[11712], 99.90th=[13120],
> 99.95th=[73216], | 99.99th=[464896]
> bw (KB /s): min= 264, max=2156544, per=100.00%, avg=412986.27,
> stdev=375092.66
> lat (usec) : 500=18.68%, 750=7.43%, 1000=2.11%
> lat (msec) : 2=48.89%, 4=4.35%, 10=16.79%, 20=1.67%, 50=0.03%
> lat (msec) : 100=0.03%, 250=0.02%, 500=0.01%, 750=0.01%
> cpu : usr=1.24%, sys=45.38%, ctx=19298, majf=0, minf=974
> IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
> >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0%
> issued : total=r=0/w=102400/d=0, short=r=0/w=0/d=0,
> drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%,
> depth=16
>
> Run status group 0 (all jobs):
> WRITE: io=102400MB, aggrb=399740KB/s, minb=399740KB/s, maxb=399740KB/s,
> mint=262314msec, maxt=262314msec
>
> Disk stats (read/write):
> rbd0: ios=0/150890, merge=0/49, ticks=0/36117700, in_queue=36145277,
> util=96.97%
> [root@rcprsdc1r72-01-ac rafaell]#
>
> [root@rcprsdc1r72-01-ac rafaell]# fio fio_rbd_test
> job1: (g=0): rw=rw, bs=1M-1M/1M-1M/1M-1M, ioengine=rbd, iodepth=16
> fio-2.2.8
> Starting 1 thread
> rbd engine: RBD version: 0.1.9
> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/65405KB/0KB /s] [0/63/0 iops]
> [eta 00m:00s]
> job1: (groupid=0, jobs=1): err= 0: pid=43960: Fri Sep 11 12:54:25 2015
> write: io=102400MB, bw=121882KB/s, iops=119, runt=860318msec
> slat (usec): min=355, max=7300, avg=908.97, stdev=361.02
> clat (msec): min=11, max=1468, avg=129.59, stdev=130.68
> lat (msec): min=12, max=1468, avg=130.50, stdev=130.69
> clat percentiles (msec):
> | 1.00th=[ 21], 5.00th=[ 26], 10.00th=[ 29],
> 20.00th=[ 34], | 30.00th=[ 37], 40.00th=[ 40], 50.00th=[ 44],
> 60.00th=[ 63], | 70.00th=[ 233], 80.00th=[ 241], 90.00th=[ 269],
> 95.00th=[ 367], | 99.00th=[ 553], 99.50th=[ 652], 99.90th=[ 832],
> 99.95th=[ 848], | 99.99th=[ 1369]
> bw (KB /s): min=20363, max=248543, per=100.00%, avg=124381.19,
> stdev=42313.29
> lat (msec) : 20=0.95%, 50=55.27%, 100=5.55%, 250=24.83%, 500=12.28%
> lat (msec) : 750=0.89%, 1000=0.21%, 2000=0.01%
> cpu : usr=9.58%, sys=1.15%, ctx=23883, majf=0, minf=2751023
> IO depths : 1=1.2%, 2=3.0%, 4=9.7%, 8=68.3%, 16=17.8%, 32=0.0%,
> >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0%
> complete : 0=0.0%, 4=92.5%, 8=4.3%, 16=3.2%, 32=0.0%, 64=0.0%,
> >=64=0.0%
> issued : total=r=0/w=102400/d=0, short=r=0/w=0/d=0,
> drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%,
> depth=16
>
> Run status group 0 (all jobs):
> WRITE: io=102400MB, aggrb=121882KB/s, minb=121882KB/s, maxb=121882KB/s,
> mint=860318msec, maxt=860318msec
>
> Disk stats (read/write):
> dm-1: ios=0/2072, merge=0/0, ticks=0/233, in_queue=233, util=0.01%,
> aggrios=1/2249, aggrmerge=7/559, aggrticks=9/254, aggrin_queue=261,
> aggrutil=0.01%
> sda: ios=1/2249, merge=7/559, ticks=9/254, in_queue=261, util=0.01%
> [root@rcprsdc1r72-01-ac rafaell]#
>
> Cheers,
> Raf
>
>

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
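
As a rough sanity check on the page-cache explanation in the reply, Little's law (average in-flight I/Os ≈ IOPS × average latency) can be applied to the two fio results quoted in this message. This is only a sketch: the input numbers are copied from the fio output in this thread, while the function and variable names are made up for illustration.

```python
# Rough Little's law check on the two fio runs quoted in this thread:
#   average in-flight I/Os ~= IOPS * average total latency.
# Input numbers are copied from the quoted fio output; the names
# (inflight_ios, krbd_depth, ...) are hypothetical, chosen for this sketch.

def inflight_ios(iops: float, avg_lat_seconds: float) -> float:
    """Average number of I/Os outstanding, by Little's law."""
    return iops * avg_lat_seconds

# kRBD/ext4 run: ioengine=sync, direct=0, avg lat 2520.12 usec, 390 IOPS
krbd_depth = inflight_ios(390, 2520.12e-6)

# librbd run: ioengine=rbd, avg lat 130.50 msec, 119 IOPS
librbd_depth = inflight_ios(119, 130.50e-3)

# Bandwidth ratio between the two runs (both reported in KB/s)
bw_ratio = 399741 / 121882

print(f"kRBD/ext4 effective queue depth: {krbd_depth:.2f}")
print(f"librbd effective queue depth:    {librbd_depth:.2f}")
print(f"kRBD/ext4 vs librbd bandwidth:   {bw_ratio:.2f}x")
```

The kRBD/ext4 run comes out at about 1 I/O in flight, which matches its "IO depths : 1=100.0%" line: fio's sync engine submits one I/O at a time, so iodepth=16 has no effect there, and with direct=0 those writes go through the page cache. The librbd run comes out at about 15.5, close to the configured iodepth of 16, at roughly 130 msec per 1 MB write. If that reading is right, the two jobs are not measuring the same thing, which is why a run with bs=4k and direct=1 on both sides should give a fairer comparison.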