Ceph RBD, MySQL write IOPS - what is possible?


 



I've been using MySQL on Ceph forever, and have been down this road
before but it's been a couple of years so I wanted to see if there is
anything new here.

So the TL;DR version of this email: is there a good way to improve
16K write IOPS with a small number of threads?  The OSDs themselves
are idle, so is this just a weakness in the algorithms, or do the Ceph
clients need some profiling?  Or "other"?

Basically, this is one of the worst possible Ceph workloads, so it is
fun to try to push the limits.  I also happen to have a MySQL instance
that is reaching its write IOPS limit, so this is also a last-ditch
effort to keep it on Ceph.

This cluster is as straightforward as it gets... 6 servers with 10
SSDs each and 100 Gb networking.  I'm using size=3.  During operations,
the OSDs are more or less idle, so I don't suspect any hardware
limitations.

MySQL has very little I/O parallelism, so the number of threads and
effective queue depth stay pretty low.  Therefore, as a proxy for
MySQL, I use rados bench with 16K writes and 8 threads.  The RBD
actually gets about 2x this level - still not so great.
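For reference, a closer approximation of the MySQL pattern than rados bench would be fio against the actual image via the librbd engine. A sketch of such a job file is below; the pool, image, and client names are placeholders, and it assumes fio was built with librbd support:

```ini
; Hypothetical fio job: low-queue-depth 16K random writes via librbd,
; roughly mimicking a single-threaded MySQL redo/data write pattern.
; Pool/image/client names are placeholders - adjust for your cluster.
[global]
ioengine=rbd
clientname=admin
pool=volumes
rbdname=mysql-test
rw=randwrite
bs=16k
time_based=1
runtime=60

[low-qd-writes]
iodepth=8
numjobs=1
```

Unlike rados bench, this exercises the full librbd client path (object map, striping, cache settings), so it should track the RBD numbers more closely.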

I get about 2300 IOPS with this test:

# rados bench -p volumes 10 write -t 8 -b 16K
hints = 1
Maintaining 8 concurrent writes of 16384 bytes to objects of size
16384 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_fstosinfra-5_3652583
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1       8      2050      2042   31.9004   31.9062  0.00247633  0.00390848
    2       8      4306      4298   33.5728     35.25  0.00278488  0.00371784
    3       8      6607      6599   34.3645   35.9531  0.00277546  0.00363139
    4       7      8951      8944   34.9323   36.6406  0.00414908  0.00357249
    5       8     11292     11284    35.257   36.5625  0.00291434  0.00353997
    6       8     13588     13580   35.3588    35.875  0.00306094  0.00353084
    7       7     15933     15926   35.5432   36.6562  0.00308388   0.0035123
    8       8     18361     18353   35.8399   37.9219  0.00314996  0.00348327
    9       8     20629     20621   35.7947   35.4375  0.00352998   0.0034877
   10       5     23010     23005   35.9397     37.25  0.00395566  0.00347376
Total time run:         10.003
Total writes made:      23010
Write size:             16384
Object size:            16384
Bandwidth (MB/sec):     35.9423
Stddev Bandwidth:       1.63433
Max bandwidth (MB/sec): 37.9219
Min bandwidth (MB/sec): 31.9062
Average IOPS:           2300
Stddev IOPS:            104.597
Max IOPS:               2427
Min IOPS:               2042
Average Latency(s):     0.0034737
Stddev Latency(s):      0.00163661
Max latency(s):         0.115932
Min latency(s):         0.00179735
Cleaning up (deleting benchmark objects)
Removed 23010 objects
Clean up completed and total clean up time :7.44664


Are there any good options to improve this?  It seems like the client
side is the bottleneck, since the OSD servers are at something like 15%
utilization.
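For what it's worth, the bench numbers above are consistent with a purely latency-bound client rather than a saturated cluster: with a fixed queue depth, achievable IOPS is roughly queue depth divided by per-op latency. Plugging in the reported averages:

```python
# Back-of-the-envelope check using the rados bench figures above:
# at a fixed queue depth QD, sustained IOPS ~= QD / avg_latency.
qd = 8                # -t 8 in the rados bench run
avg_lat = 0.0034737   # average latency reported by the bench, in seconds

iops_ceiling = qd / avg_lat
print(f"latency-bound ceiling: {iops_ceiling:.0f} IOPS")  # ~2300, matching the bench

# To double IOPS at the same queue depth, per-op latency has to halve:
target_lat = qd / (2 * iops_ceiling)
print(f"latency needed for 2x: {target_lat * 1000:.2f} ms")
```

So at ~3.5 ms per 16K replicated write, 8 in-flight ops can never exceed roughly 2300 IOPS no matter how idle the OSDs are, which is why adding client concurrency (or cutting round-trip latency) is the only real lever here.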

Thanks,
Mark
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


