Good day,

Random write operations (randwrite, 4kB and 4MB) over mapped RBD are just too slow. I am also using librbd over TGT.

fio input:

[global]
rw=randwrite
ioengine=libaio
iodepth=64
size=1g
direct=1
buffered=0
startdelay=5
group_reporting=1
thread=1
ramp_time=5
time_based
disk_util=0
clat_percentiles=0
disable_lat=1
disable_clat=1
disable_slat=1
#numjobs=16
runtime=60
filename=/mnt/disk/test1/testfile.fio

[test]
name=test
bs=4k
stonewall

fio output for TGT (librbd):

test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
Starting 1 thread
test: Laying out IO files (2 files / total 1024MiB)
test: (groupid=0, jobs=1): err= 0: pid=6909: Fri Jun 19 15:26:11 2020
  write: IOPS=6342, BW=24.8MiB/s (25.0MB/s)(1487MiB/60003msec)
   bw (  KiB/s): min=8, max=70216, per=100.00%, avg=30441.30, stdev=28899.02, samples=100
   iops        : min=2, max=17554, avg=7610.27, stdev=7224.76, samples=100
  cpu          : usr=2.18%, sys=11.08%, ctx=107852, majf=0, minf=356
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=115.5%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,380583,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=24.8MiB/s (25.0MB/s), 24.8MiB/s-24.8MiB/s (25.0MB/s-25.0MB/s), io=1487MiB (1559MB), run=60003-60003msec

-----------------

fio output for RBD:

test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
Starting 1 thread
test: (groupid=0, jobs=1): err= 0: pid=7372: Fri Jun 19 15:33:51 2020
  write: IOPS=909, BW=3642KiB/s (3729kB/s)(214MiB/60186msec)
   bw (  KiB/s): min=2792, max=4688, per=100.00%, avg=3648.13, stdev=399.09, samples=120
   iops        : min=698, max=1172, avg=912.01, stdev=99.75, samples=120
  cpu          : usr=0.78%, sys=3.08%, ctx=37108, majf=0, minf=267
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=110.2%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,54732,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=3642KiB/s (3729kB/s), 3642KiB/s-3642KiB/s (3729kB/s-3729kB/s), io=214MiB (224MB), run=60186-60186msec

-----------------

I ran these tests from a separate client server. I suspect that RBD is not working correctly there, since I tried some fio tests on it and the result was almost the same with RBD cache set to false in ceph.conf, for example:

fio -ioengine=libaio -name=test -bs=4M -iodepth=64 -numjobs=16 -rw=randwrite -direct=1 -runtime=60 -filename=/mnt/disk/test1 -size=10g

Can you give me any ideas where the problem might be, perhaps with the RBD cache? Network capacity and the usual things have been tested already. I can provide more specs if needed.

Thanks!
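P.S. In case it helps narrow this down, my next step would be to take the kernel block device and the filesystem out of the picture by pointing fio's rbd ioengine directly at librbd. A rough sketch of that run, assuming fio was built with rbd support; the pool "rbd", image "testimage" and client "admin" below are placeholders for whatever exists on the cluster, not my actual names:

fio -ioengine=rbd -clientname=admin -pool=rbd -rbdname=testimage \
    -name=test -bs=4k -iodepth=64 -rw=randwrite -time_based -runtime=60

If that job also lands around 900 IOPS, the bottleneck would be in the cluster/network path itself; if it runs fast, the problem would sit between the mapped krbd device and the filesystem on the client.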
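Also, to make sure the rbd cache setting in ceph.conf is actually reaching the librbd client inside TGT (as far as I understand, the kernel RBD client does not use librbd's cache at all), I could query the running client over its admin socket. A sketch, with a placeholder socket path and pid:

# First enable an admin socket for clients in ceph.conf (the path template
# is just an example; any directory writable by the client works):
#   [client]
#   admin socket = /var/run/ceph/$cluster-$type.$id.$pid.asok
# Then, with TGT running, ask the live librbd client what it actually uses:
ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.asok config show | grep rbd_cache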