Slow benchmarks for rbd vs. rados bench

Hi,

While deploying a small test cluster on two existing servers, I got several benchmark results I don't understand.

In summary, I think the rados bench values are okay for my limited setup: ~80 MiB/s for writes and ~300 MiB/s for reads. But with rbd bench the values drop to ~46 MiB/s for writes and ~71 MiB/s for reads. Have I missed something with rbd bench? I don't understand why rbd is so much slower than rados.
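
One detail that may matter for the comparison: rados bench above writes 4 MiB objects with 16 concurrent ops, while rbd bench defaults to io_size 4096 (4 KiB), so the two runs exercise quite different workloads. If a larger I/O size is the fairer comparison, I could rerun something like the following (just a sketch, assuming the rbd bench options behave as I expect, with parameters chosen to roughly match the rados bench run):

# rbd bench --io-type write --io-size 4M --io-threads 16 --io-total 1G --io-pattern seq --pool=testbench image01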

Hint: the production cluster will get 3 servers in total, each with 6x 12 TB SAS3 disks, and maybe 2 additional servers for up to 5 monitors. I know Ceph is more fun with more than 3 nodes, but the budget is limited :(

Here are the results of all benchmarks:

# rados bench -p testbench 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_virt-master4_1711455
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s) avg lat(s)
    0       0         0         0         0         0 -           0
    1      16        31        15   59.9956        60 0.430283    0.622336
    2      16        51        35   69.9923        80 0.565152    0.635417
    3      16        74        58   77.3241        92 0.746055    0.663046
    4      16        97        81     80.99        92 0.558241    0.646831
    5      16       122       106   84.7893       100 0.705829    0.686034
    6      16       149       133   88.6553       108 0.496865    0.674404
    7      16       170       154   87.9886        84 0.916755    0.674053
    8      16       189       173   86.4887        76 4.24615    0.700781
    9      16       215       199   88.4327       104 0.60869    0.702643
   10      16       235       219    87.588        80 0.882353    0.696558
   11       1       235       234   85.0792        60 0.666018    0.692935
Total time run:         11.8131
Total writes made:      235
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     79.5726
Stddev Bandwidth:       16.1087
Max bandwidth (MB/sec): 108
Min bandwidth (MB/sec): 60
Average IOPS:           19
Stddev IOPS:            4.02718
Max IOPS:               27
Min IOPS:               15
Average Latency(s):     0.708564
Stddev Latency(s):      0.440743
Max latency(s):         4.36566
Min latency(s):         0.318937

# rados bench -p testbench 10 seq
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s) avg lat(s)
    0       0         0         0         0         0 -           0
    1      16        87        71   283.925       284 0.354899    0.187907
    2      16       161       145   289.941       296 0.122009    0.194255
    3       5       235       230   306.611       340 0.231543    0.198982
Total time run:       3.08835
Total reads made:     235
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   304.37
Average IOPS:         76
Stddev IOPS:          7.37111
Max IOPS:             85
Min IOPS:             71
Average Latency(s):   0.200212
Max latency(s):       1.87814
Min latency(s):       0.017529

# rbd bench --io-type write image01 --pool=testbench
bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern sequential
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     11520   11536.2    45 MiB/s
    2     23984   12000.2    47 MiB/s
    3     38304   12739.6    50 MiB/s
    4     52672   13028.9    51 MiB/s
    5     63616   12575.7    49 MiB/s
    6     71456   11987.4    47 MiB/s
    7     82464   11677.5    46 MiB/s
    8     96080   11573.9    45 MiB/s
    9    110080   11436.1    45 MiB/s
   10    124512   12327.4    48 MiB/s
   11    136496   13008.2    51 MiB/s
   12    148160   13118.5    51 MiB/s
   13    163216     13374    52 MiB/s
   14    176848     13527    53 MiB/s
   15    191120   13132.7    51 MiB/s
   16    203376   13376.3    52 MiB/s
   17    212576   12924.8    50 MiB/s
   18    221120   11534.9    45 MiB/s
   19    228032   10196.2    40 MiB/s
   20    241584   10240.5    40 MiB/s
   21    254176   10095.6    39 MiB/s
elapsed: 22   ops: 262144   ops/sec: 11896.4   bytes/sec: 46 MiB/s

~# rbd bench --io-type read image01 --pool=testbench
bench  type read io_size 4096 io_threads 16 bytes 1073741824 pattern sequential
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     18368   18384.4    72 MiB/s
    2     36032   18024.3    70 MiB/s
    3     56768   18928.4    74 MiB/s
    4     75472   18872.4    74 MiB/s
    5     93264   18656.4    73 MiB/s
    6    109232   18173.1    71 MiB/s
    7    128320     18458    72 MiB/s
    8    146336   17913.9    70 MiB/s
    9    164176   17741.1    69 MiB/s
   10    184720   18291.6    71 MiB/s
   11    202416   18637.2    73 MiB/s
   12    216688   17673.9    69 MiB/s
   13    236464   18025.9    70 MiB/s
   14    255776   18320.4    72 MiB/s
elapsed: 14   ops: 262144   ops/sec: 18255.5   bytes/sec: 71 MiB/s


Some cluster details (per server):

 * SAS3 SEAGATE ST12000NM004J
 * Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
 * Dual NIC Intel E810 (LACP, 2x10GBit Link)
 * LSI SAS3004 PCI-Express Fusion-MPT SAS-3
 * 192 GB ECC RAM

Ceph Squid deployed via cephadm.

~# ceph -s
  cluster:
    id:     7644057a-00f6-11f0-9a0c-eac00fed9338
    health: HEALTH_WARN
            1 pool(s) do not have an application enabled

  services:
    mon: 2 daemons, quorum virt-master4,virt-master3 (age 24h)
    mgr: virt-master4.lddrxr(active, since 24h), standbys: virt-master3.akkopo
    osd: 2 osds: 2 up (since 3h), 2 in (since 3h)

  data:
    pools:   3 pools, 65 pgs
    objects: 3.43k objects, 13 GiB
    usage:   27 GiB used, 22 TiB / 22 TiB avail
    pgs:     65 active+clean
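
Side note: I assume the "1 pool(s) do not have an application enabled" warning refers to the testbench pool created for rados bench; if so, tagging the pool should clear it (just a guess, probably unrelated to the performance question):

# ceph osd pool application enable testbench rbd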


~# ceph osd tree
ID  CLASS  WEIGHT    TYPE NAME              STATUS  REWEIGHT PRI-AFF
-1         21.82808  root default
-5         10.91399      host virt-master3
 1    hdd  10.91399          osd.1              up   1.00000 1.00000
-3         10.91409      host virt-master4
 0    hdd  10.91409          osd.0              up   1.00000 1.00000


# ceph osd pool ls detail
pool 1 '.mgr' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 17 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 2.00
pool 2 'kvm' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 33 lfor 0/0/28 flags hashpspool,selfmanaged_snaps stripe_width 0 compression_algorithm snappy compression_mode aggressive application rbd read_balance_score 1.06
pool 3 'testbench' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 446 lfor 0/443/441 flags hashpspool,selfmanaged_snaps stripe_width 0 read_balance_score 1.06

osd.0 direct benchmark with 4M and 4K blocksize:

root@virt-master4:~# fio --ioengine=sync --filename=/dev/disk/by-id/scsi-35000c500f8bfda83 --direct=1 --sync=1 --rw=write --bs=4M --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=fio
fio: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=sync, iodepth=1
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=172MiB/s][w=43 IOPS][eta 00m:00s]
fio: (groupid=0, jobs=1): err= 0: pid=1585455: Sat Mar 15 14:29:21 2025
  write: IOPS=42, BW=170MiB/s (178MB/s)(9.97GiB/60013msec); 0 zone resets
    clat (usec): min=21993, max=40219, avg=23274.37, stdev=963.46
     lat (usec): min=22341, max=40453, avg=23511.77, stdev=961.26
    clat percentiles (usec):
     |  1.00th=[22152],  5.00th=[22152], 10.00th=[22414], 20.00th=[22676],
     | 30.00th=[22938], 40.00th=[22938], 50.00th=[23200], 60.00th=[23200],
     | 70.00th=[23462], 80.00th=[23462], 90.00th=[24249], 95.00th=[24511],
     | 99.00th=[24773], 99.50th=[31327], 99.90th=[32637], 99.95th=[32637],
     | 99.99th=[40109]
   bw (  KiB/s): min=163840, max=180224, per=99.99%, avg=174166.05, stdev=3764.97, samples=119
   iops        : min=   40, max=   44, avg=42.52, stdev= 0.92, samples=119
  lat (msec)   : 50=100.00%
  cpu          : usr=1.00%, sys=2.05%, ctx=7648, majf=0, minf=11
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,2552,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=170MiB/s (178MB/s), 170MiB/s-170MiB/s (178MB/s-178MB/s), io=9.97GiB (10.7GB), run=60013-60013msec


root@virt-master4:~# fio --ioengine=sync --filename=/dev/disk/by-id/scsi-35000c500f8bfda83 --direct=1 --sync=1 --rw=randwrite --bs=4M --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=fio
fio: (g=0): rw=randwrite, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=sync, iodepth=1
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][w=124MiB/s][w=31 IOPS][eta 00m:00s]
fio: (groupid=0, jobs=1): err= 0: pid=1588408: Sat Mar 15 14:37:13 2025
  write: IOPS=30, BW=123MiB/s (129MB/s)(7360MiB/60028msec); 0 zone resets
    clat (usec): min=19173, max=77481, avg=32327.62, stdev=6604.03
     lat (usec): min=19453, max=77684, avg=32615.39, stdev=6605.82
    clat percentiles (usec):
     |  1.00th=[21103],  5.00th=[23462], 10.00th=[25035], 20.00th=[26870],
     | 30.00th=[28443], 40.00th=[30016], 50.00th=[31327], 60.00th=[32637],
     | 70.00th=[34341], 80.00th=[36439], 90.00th=[41157], 95.00th=[45876],
     | 99.00th=[51119], 99.50th=[53216], 99.90th=[69731], 99.95th=[77071],
     | 99.99th=[77071]
   bw (  KiB/s): min=65536, max=147456, per=99.99%, avg=125542.40, stdev=9567.74, samples=120
   iops        : min=   16, max=   36, avg=30.65, stdev= 2.34, samples=120
  lat (msec)   : 20=0.22%, 50=98.32%, 100=1.47%
  cpu          : usr=0.90%, sys=1.53%, ctx=5513, majf=0, minf=13
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1840,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=123MiB/s (129MB/s), 123MiB/s-123MiB/s (129MB/s-129MB/s), io=7360MiB (7718MB), run=60028-60028msec


root@virt-master4:~# fio --ioengine=io_uring --filename=/dev/disk/by-id/scsi-35000c500f8bfda83 --direct=1 --sync=1 --rw=write --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=fio
fio: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=1
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=37.6MiB/s][w=9628 IOPS][eta 00m:00s]
fio: (groupid=0, jobs=1): err= 0: pid=1591836: Sat Mar 15 14:48:18 2025
  write: IOPS=9560, BW=37.3MiB/s (39.2MB/s)(2241MiB/60001msec); 0 zone resets
    slat (usec): min=3, max=242, avg= 5.75, stdev= 3.14
    clat (usec): min=3, max=37068, avg=97.67, stdev=81.74
     lat (usec): min=84, max=37086, avg=103.42, stdev=81.88
    clat percentiles (usec):
     |  1.00th=[   87],  5.00th=[   89], 10.00th=[   90], 20.00th=[   92],
     | 30.00th=[   93], 40.00th=[   94], 50.00th=[   95], 60.00th=[   97],
     | 70.00th=[   98], 80.00th=[  100], 90.00th=[  104], 95.00th=[  109],
     | 99.00th=[  125], 99.50th=[  159], 99.90th=[  416], 99.95th=[  676],
     | 99.99th=[  889]
   bw (  KiB/s): min=13600, max=39344, per=100.00%, avg=38312.13, stdev=2316.38, samples=119
   iops        : min= 3400, max= 9836, avg=9578.03, stdev=579.09, samples=119
  lat (usec)   : 4=0.01%, 20=0.01%, 50=0.01%, 100=80.54%, 250=19.17%
  lat (usec)   : 500=0.21%, 750=0.04%, 1000=0.03%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  cpu          : usr=3.54%, sys=8.67%, ctx=573655, majf=0, minf=52
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,573639,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=37.3MiB/s (39.2MB/s), 37.3MiB/s-37.3MiB/s (39.2MB/s-39.2MB/s), io=2241MiB (2350MB), run=60001-60001msec


--
Best regards / Mit freundlichen Grüßen
Daniel Vogelbacher
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



