> * server RAM and CPU
> * osd_memory_target
> * OSD drive model

Thanks for the reply.  The servers have dual Xeon Gold 6154 CPUs with
384 GB of RAM.  The drives are older, first-gen NVMe - WDC SN620.
osd_memory_target is at the default.  Networking is Mellanox
ConnectX-5 NICs and SN2700 switches.  The test client is a similar
machine with no drives.

The CPUs are 80% idle during the test.  The OSD drives (according to
iostat) hover around 50% util during the test and are close to 0 at
other times.

I did find it interesting that the wareq-sz column in iostat is
around 5 during the test - I was expecting 16.  Is there a way to
tweak this in bluestore?  These drives are terrible at I/O below 8K.
Not that it really matters, since we're not I/O bound at all.

If I increase threads from 8 to 32, the IOPS roughly quadruple, so
that's good at least.  Single-thread writes are about 250 IOPS and
around 3.7 MB/sec - that works out to roughly 4 ms per 16K write.  So
sad.

The rados bench process is also under 50% CPU utilization of a single
core.  This seems like a thread/semaphore kind of issue if I had to
guess.  It's tricky to debug when there is no obvious bottleneck.
Here's what I'm planning to try next.
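For the wareq-sz question, my working theory is that the small writes
are taking bluestore's deferred path, so what iostat sees is
WAL/RocksDB traffic rather than the 16K data writes themselves.
First step is to check what the OSDs are actually running with -
something like this (option names from memory; note that
min_alloc_size is baked in when the OSD is created, so changing it
means redeploying OSDs):

    # centralized config view
    ceph config get osd bluestore_min_alloc_size_ssd
    ceph config get osd bluestore_prefer_deferred_size_ssd

    # what a running OSD actually has (run on the OSD host)
    ceph daemon osd.0 config show | grep -E 'min_alloc|prefer_deferred'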
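To test the per-client bottleneck theory, I'll run several rados
bench processes against the pool at the same time.  If four processes
each get ~2000 IOPS, the limit is in the client path rather than in
the cluster.  Untested sketch, run names are made up:

    for i in 1 2 3 4; do
        rados bench -p volumes 10 write -t 8 -b 16K \
            --run-name bench-$i --no-cleanup &
    done
    wait

    # remove the benchmark objects afterwards
    for i in 1 2 3 4; do
        rados -p volumes cleanup --run-name bench-$i
    done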
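And to see where the ~4 ms per write is going, I'll compare the
latency the client observes against what the OSDs report, and try
fio's rbd engine at iodepth=1 to rule out rados bench itself.
Counter and option names are from memory, so double-check against
your release; the image name is made up:

    # cluster-side commit latency per OSD
    ceph osd perf

    # write latency as one OSD sees it (run on the OSD host)
    ceph daemon osd.0 perf dump | grep -A 3 '"op_w_latency"'

    # client-side qd=1 check via librbd
    rbd create volumes/fio-test --size 10G
    fio --name=qd1-16k --ioengine=rbd --clientname=admin \
        --pool=volumes --rbdname=fio-test --rw=randwrite --bs=16k \
        --iodepth=1 --runtime=30 --time_based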
Thanks,
Mark

On Fri, Jun 7, 2024 at 9:47 AM Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:
>
> Please describe:
>
> * server RAM and CPU
> * osd_memory_target
> * OSD drive model
>
> > On Jun 7, 2024, at 11:32, Mark Lehrer <lehrer@xxxxxxxxx> wrote:
> >
> > I've been using MySQL on Ceph forever, and have been down this road
> > before, but it's been a couple of years so I wanted to see if there
> > is anything new here.
> >
> > So the TL;DR version of this email - is there a good way to improve
> > 16K write IOPS with a small number of threads?  The OSDs themselves
> > are idle, so is this just a weakness in the algorithms, or do Ceph
> > clients need some profiling?  Or "other"?
> >
> > Basically, this is one of the worst possible Ceph workloads, so it
> > is fun to try to push the limits.  I also happen to have a MySQL
> > instance that is reaching its write IOPS limit, so this is also a
> > last-ditch effort to keep it on Ceph.
> >
> > This cluster is as straightforward as it gets... 6 servers with 10
> > SSDs each, 100 Gb networking.  I'm using size=3.  During operations,
> > the OSDs are more or less idle, so I don't suspect any hardware
> > limitations.
> >
> > MySQL has little parallelism on the write path, so the number of
> > threads and effective queue depth stay pretty low.  Therefore, as a
> > proxy for MySQL, I use rados bench with 16K writes and 8 threads.
> > The RBD actually gets about 2x this level - still not so great.
> >
> > I get about 2000 IOPS with this test:
> >
> > # rados bench -p volumes 10 write -t 8 -b 16K
> > hints = 1
> > Maintaining 8 concurrent writes of 16384 bytes to objects of size
> > 16384 for up to 10 seconds or 0 objects
> > Object prefix: benchmark_data_fstosinfra-5_3652583
> >   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)   avg lat(s)
> >     0       0         0         0         0         0            -            0
> >     1       8      2050      2042   31.9004   31.9062   0.00247633   0.00390848
> >     2       8      4306      4298   33.5728     35.25   0.00278488   0.00371784
> >     3       8      6607      6599   34.3645   35.9531   0.00277546   0.00363139
> >     4       7      8951      8944   34.9323   36.6406   0.00414908   0.00357249
> >     5       8     11292     11284    35.257   36.5625   0.00291434   0.00353997
> >     6       8     13588     13580   35.3588    35.875   0.00306094   0.00353084
> >     7       7     15933     15926   35.5432   36.6562   0.00308388    0.0035123
> >     8       8     18361     18353   35.8399   37.9219   0.00314996   0.00348327
> >     9       8     20629     20621   35.7947   35.4375   0.00352998    0.0034877
> >    10       5     23010     23005   35.9397     37.25   0.00395566   0.00347376
> > Total time run:         10.003
> > Total writes made:      23010
> > Write size:             16384
> > Object size:            16384
> > Bandwidth (MB/sec):     35.9423
> > Stddev Bandwidth:       1.63433
> > Max bandwidth (MB/sec): 37.9219
> > Min bandwidth (MB/sec): 31.9062
> > Average IOPS:           2300
> > Stddev IOPS:            104.597
> > Max IOPS:               2427
> > Min IOPS:               2042
> > Average Latency(s):     0.0034737
> > Stddev Latency(s):      0.00163661
> > Max latency(s):         0.115932
> > Min latency(s):         0.00179735
> > Cleaning up (deleting benchmark objects)
> > Removed 23010 objects
> > Clean up completed and total clean up time: 7.44664
> >
> > Are there any good options to improve this?  It seems like the
> > client side is the bottleneck since the OSD servers are at like 15%
> > utilization.
> >
> > Thanks,
> > Mark
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx