This probably muddies the water.

Note: this is an active cluster, with around 22 read/write IOPS and roughly 200 kB/s of read/write traffic going on during the tests.

CephFS is mounted from a cluster of 3 hosts with 6 OSDs per host, with 8G public and 10G private networking for Ceph. No SSDs; the drives are mostly WD Red 1T 2.5", with some HGST 1T 7200 RPM.

root@blade7:~# fio -ioengine=libaio -name=test -bs=4k -iodepth=32 -rw=randwrite -direct=1 -runtime=60 -filename=/mnt/pve/cephfs/test
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
test: you need to specify size=
fio: pid=0, err=22/file:filesetup.c:952, func=total_file_size, error=Invalid argument

Run status group 0 (all jobs):

root@blade7:~# fio -ioengine=libaio -name=test -bs=4k -iodepth=32 -rw=randwrite -direct=1 -runtime=60 -size=10G -filename=/mnt/pve/cephfs/test
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
test: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=0): [f(1)][100.0%][w=580KiB/s][w=145 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=3561674: Sat Aug 17 09:20:22 2019
  write: IOPS=2262, BW=9051KiB/s (9268kB/s)(538MiB/60845msec); 0 zone resets
    slat (usec): min=8, max=35648, avg=40.01, stdev=97.51
    clat (usec): min=954, max=2854.3k, avg=14090.15, stdev=100194.83
     lat (usec): min=994, max=2854.3k, avg=14130.65, stdev=100195.40
    clat percentiles (usec):
     |  1.00th=[    1254],  5.00th=[    1450], 10.00th=[    1582],
     | 20.00th=[    1795], 30.00th=[    2008], 40.00th=[    2245],
     | 50.00th=[    2540], 60.00th=[    2933], 70.00th=[    3392],
     | 80.00th=[    4228], 90.00th=[    7767], 95.00th=[   35914],
     | 99.00th=[  254804], 99.50th=[  616563], 99.90th=[ 1652556],
     | 99.95th=[ 2122318], 99.99th=[ 2600469]
   bw (  KiB/s): min=   48, max=44408, per=100.00%, avg=10387.54, stdev=10384.94, samples=106
   iops        : min=   12, max=11102, avg=2596.88, stdev=2596.23, samples=106
  lat (usec)   : 1000=0.01%
  lat (msec)   : 2=29.82%, 4=47.95%, 10=14.23%, 20=2.43%, 50=1.34%
  lat (msec)   : 100=2.68%, 250=0.53%, 500=0.40%, 750=0.20%, 1000=0.14%
  cpu          : usr=1.45%, sys=6.36%, ctx=151946, majf=0, minf=280
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,137674,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=9051KiB/s (9268kB/s), 9051KiB/s-9051KiB/s (9268kB/s-9268kB/s), io=538MiB (564MB), run=60845-60845msec
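For comparison, it can also help to take the filesystem layer out of the picture and benchmark the data pool directly with rados bench. A rough sketch of what that would look like — the pool name cephfs_data is an assumption here, substitute whatever your CephFS data pool is actually called:

  # 4 KiB writes, 32 in flight; --no-cleanup keeps the objects for the read pass
  rados bench -p cephfs_data 60 write -b 4096 -t 32 --no-cleanup
  # random reads over the objects written above
  rados bench -p cephfs_data 60 rand -t 32
  # remove the benchmark objects when done
  rados -p cephfs_data cleanup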
This is on the same system with an RBD-mapped file system:

root@blade7:/mnt# fio -ioengine=libaio -name=test -bs=4k -iodepth=32 -rw=randwrite -direct=1 -runtime=60 -size=10G -filename=/mnt/image0/test
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
test: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [w(1)][4.5%][w=4KiB/s][w=1 IOPS][eta 21m:30s]
test: (groupid=0, jobs=1): err= 0: pid=3567399: Sat Aug 17 09:38:55 2019
  write: IOPS=1935, BW=7744KiB/s (7930kB/s)(462MiB/61143msec); 0 zone resets
    slat (usec): min=9, max=700161, avg=65.17, stdev=2092.54
    clat (usec): min=954, max=2578.6k, avg=16457.67, stdev=109995.03
     lat (usec): min=1021, max=2578.6k, avg=16523.42, stdev=110014.91
    clat percentiles (usec):
     |  1.00th=[    1254],  5.00th=[    1434], 10.00th=[    1549],
     | 20.00th=[    1745], 30.00th=[    1909], 40.00th=[    2114],
     | 50.00th=[    2376], 60.00th=[    2704], 70.00th=[    3228],
     | 80.00th=[    4080], 90.00th=[    8717], 95.00th=[   53216],
     | 99.00th=[  291505], 99.50th=[  675283], 99.90th=[ 1669333],
     | 99.95th=[ 2231370], 99.99th=[ 2365588]
   bw (  KiB/s): min=    8, max=35968, per=100.00%, avg=9015.64, stdev=8402.84, samples=105
   iops        : min=    2, max= 8992, avg=2253.90, stdev=2100.72, samples=105
  lat (usec)   : 1000=0.01%
  lat (msec)   : 2=34.85%, 4=44.49%, 10=11.54%, 20=1.84%, 50=1.81%
  lat (msec)   : 100=3.27%, 250=1.13%, 500=0.42%, 750=0.19%, 1000=0.08%
  cpu          : usr=1.42%, sys=6.63%, ctx=123309, majf=0, minf=283
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,118371,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=7744KiB/s (7930kB/s), 7744KiB/s-7744KiB/s (7930kB/s-7930kB/s), io=462MiB (485MB), run=61143-61143msec

Disk stats (read/write):
  rbd0: ios=0/118670, merge=0/9674, ticks=0/1894238, in_queue=1651008, util=33.33%

On 17/8/19 8:46 am, Olivier AUDRY wrote:
> Write and read with 2 hosts, 4 OSDs:
>
> mkfs.ext4 /dev/rbd/kube/bench
> mount /dev/rbd/kube/bench /mnt/
> dd if=/dev/zero of=test bs=8192k count=1000 oflag=direct
> 8388608000 bytes (8.4 GB, 7.8 GiB) copied, 117.541 s, 71.4 MB/s
>
> fio -ioengine=libaio -name=test -bs=4k -iodepth=32 -rw=randwrite
> -direct=1 -runtime=60 -filename=/dev/rbd/kube/bench
> WRITE: bw=45.3MiB/s (47.5MB/s), 45.3MiB/s-45.3MiB/s (47.5MB/s-
> 47.5MB/s), io=2718MiB (2850MB), run=60003-60003msec
>
> fio -ioengine=libaio -name=test -bs=4k -iodepth=32 -rw=randread
> -direct=1 -runtime=60 -filename=/dev/rbd/kube/bench
> READ: bw=187MiB/s (197MB/s), 187MiB/s-187MiB/s (197MB/s-197MB/s),
> io=10.0GiB (10.7GB), run=54636-54636msec
>
> pgbench before: 10 transactions per second
> pgbench after: 355 transactions per second
>
> So yes, it's better. The SSDs are Intel SSDSC2BB48 0370.
>
> On Saturday 17 August 2019 at 01:55 +0300, vitalif@xxxxxxxxxx wrote:
>>> on a new Ceph cluster with the same software and config (ansible) on
>>> the old hardware. 2 replicas, 1 host, 4 OSDs.
>>>
>>> => New hardware: 32.6MB/s READ / 10.5MiB/s WRITE
>>> => Old hardware: 184MiB/s READ / 46.9MiB/s WRITE
>>>
>>> No discussion? I suppose I will keep the old hardware. What do you
>>> think? :D
>>
>> In fact I don't really believe in 184 MB/s random reads with Ceph
>> with 4 OSDs; it's a very cool result if it's true.
>>
>> Does the "new cluster on the old hardware" consist of only 1 host?
>> Did you test reads before you actually wrote anything into the image,
>> so it was empty and reads were fast because of that?
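Regarding the question about reading from an empty image: an easy way to rule that out is to fill the image completely before running the read test. A sketch against Olivier's device path (the "prefill" job name is just illustrative):

  # sequential write across the whole device so every RBD object is allocated
  fio -ioengine=libaio -name=prefill -bs=4M -iodepth=16 -rw=write -direct=1 -filename=/dev/rbd/kube/bench
  # then repeat the random read test on the fully written image
  fio -ioengine=libaio -name=test -bs=4k -iodepth=32 -rw=randread -direct=1 -runtime=60 -filename=/dev/rbd/kube/bench

With no size= given, fio on a raw block device uses the full device size, so the prefill job writes the entire image before the read test runs.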