> * server RAM and CPU
> * osd_memory_target
> * OSD drive model

Thanks for the reply.  The servers have dual Xeon Gold 6154 CPUs with
384 GB of RAM.  The drives are older, first-gen NVMe - WDC SN620.
osd_memory_target is at the default.  Networking is Mellanox
ConnectX-5 NICs and SN2700 switches.  The test client is a similar
machine with no drives.

The CPUs are 80% idle during the test.  The OSD drives (according to
iostat) hover around 50% util during the test and are close to 0 at
other times.

I did find it interesting that the wareq-sz column in iostat is
around 5 during the test - I was expecting 16.  Is there a way to
tweak this in bluestore?  These drives are terrible at I/O below 8K.
Not that it really matters, since we're not I/O bound at all.

If I increase threads from 8 to 32, the IOPS roughly quadruple, so
that's good at least.  Single-thread writes are about 250 IOPS and
around 3.7 MB/sec - that works out to roughly 4 ms per 16K write.  So
sad.

The rados bench process is also under 50% CPU utilization of a single
core.  This seems like a thread/semaphore kind of issue if I had to
guess.  It's tricky to debug when there is no obvious bottleneck.
Here's what I'm planning to try next.
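For the wareq-sz question, my working theory is that the small writes
are taking bluestore's deferred path, so what iostat sees is
WAL/RocksDB traffic rather than the 16K data writes themselves.
First step is to check what the OSDs are actually running with -
something like this (option names from memory; note that
min_alloc_size is baked in when the OSD is created, so changing it
means redeploying OSDs):

    # centralized config view
    ceph config get osd bluestore_min_alloc_size_ssd
    ceph config get osd bluestore_prefer_deferred_size_ssd

    # what a running OSD actually has (run on the OSD host)
    ceph daemon osd.0 config show | grep -E 'min_alloc|prefer_deferred'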
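To test the per-client bottleneck theory, I'll run several rados
bench processes against the pool at the same time.  If four processes
each get ~2000 IOPS, the limit is in the client path rather than in
the cluster.  Untested sketch, run names are made up:

    for i in 1 2 3 4; do
        rados bench -p volumes 10 write -t 8 -b 16K \
            --run-name bench-$i --no-cleanup &
    done
    wait

    # remove the benchmark objects afterwards
    for i in 1 2 3 4; do
        rados -p volumes cleanup --run-name bench-$i
    done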
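And to see where the ~4 ms per write is going, I'll compare the
latency the client observes against what the OSDs report, and try
fio's rbd engine at iodepth=1 to rule out rados bench itself.
Counter and option names are from memory, so double-check against
your release; the image name is made up:

    # cluster-side commit latency per OSD
    ceph osd perf

    # write latency as one OSD sees it (run on the OSD host)
    ceph daemon osd.0 perf dump | grep -A 3 '"op_w_latency"'

    # client-side qd=1 check via librbd
    rbd create volumes/fio-test --size 10G
    fio --name=qd1-16k --ioengine=rbd --clientname=admin \
        --pool=volumes --rbdname=fio-test --rw=randwrite --bs=16k \
        --iodepth=1 --runtime=30 --time_based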
Thanks,
Mark

On Fri, Jun 7, 2024 at 9:47 AM Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:
>
> Please describe:
>
> * server RAM and CPU
> * osd_memory_target
> * OSD drive model
>
> > On Jun 7, 2024, at 11:32, Mark Lehrer <lehrer@xxxxxxxxx> wrote:
> >
> > I've been using MySQL on Ceph forever, and have been down this road
> > before, but it's been a couple of years so I wanted to see if there
> > is anything new here.
> >
> > So the TL;DR version of this email - is there a good way to improve
> > 16K write IOPS with a small number of threads?  The OSDs themselves
> > are idle, so is this just a weakness in the algorithms, or do Ceph
> > clients need some profiling?  Or "other"?
> >
> > Basically, this is one of the worst possible Ceph workloads, so it
> > is fun to try to push the limits.  I also happen to have a MySQL
> > instance that is reaching its write IOPS limit, so this is also a
> > last-ditch effort to keep it on Ceph.
> >
> > This cluster is as straightforward as it gets... 6 servers with 10
> > SSDs each, 100 Gb networking.  I'm using size=3.  During operations,
> > the OSDs are more or less idle, so I don't suspect any hardware
> > limitations.
> >
> > MySQL has little parallelism on the write path, so the number of
> > threads and effective queue depth stay pretty low.  Therefore, as a
> > proxy for MySQL, I use rados bench with 16K writes and 8 threads.
> > The RBD actually gets about 2x this level - still not so great.
> >
> > I get about 2000 IOPS with this test:
> >
> > # rados bench -p volumes 10 write -t 8 -b 16K
> > hints = 1
> > Maintaining 8 concurrent writes of 16384 bytes to objects of size
> > 16384 for up to 10 seconds or 0 objects
> > Object prefix: benchmark_data_fstosinfra-5_3652583
> >   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)   avg lat(s)
> >     0       0         0         0         0         0            -            0
> >     1       8      2050      2042   31.9004   31.9062   0.00247633   0.00390848
> >     2       8      4306      4298   33.5728     35.25   0.00278488   0.00371784
> >     3       8      6607      6599   34.3645   35.9531   0.00277546   0.00363139
> >     4       7      8951      8944   34.9323   36.6406   0.00414908   0.00357249
> >     5       8     11292     11284    35.257   36.5625   0.00291434   0.00353997
> >     6       8     13588     13580   35.3588    35.875   0.00306094   0.00353084
> >     7       7     15933     15926   35.5432   36.6562   0.00308388    0.0035123
> >     8       8     18361     18353   35.8399   37.9219   0.00314996   0.00348327
> >     9       8     20629     20621   35.7947   35.4375   0.00352998    0.0034877
> >    10       5     23010     23005   35.9397     37.25   0.00395566   0.00347376
> > Total time run:         10.003
> > Total writes made:      23010
> > Write size:             16384
> > Object size:            16384
> > Bandwidth (MB/sec):     35.9423
> > Stddev Bandwidth:       1.63433
> > Max bandwidth (MB/sec): 37.9219
> > Min bandwidth (MB/sec): 31.9062
> > Average IOPS:           2300
> > Stddev IOPS:            104.597
> > Max IOPS:               2427
> > Min IOPS:               2042
> > Average Latency(s):     0.0034737
> > Stddev Latency(s):      0.00163661
> > Max latency(s):         0.115932
> > Min latency(s):         0.00179735
> > Cleaning up (deleting benchmark objects)
> > Removed 23010 objects
> > Clean up completed and total clean up time: 7.44664
> >
> > Are there any good options to improve this?  It seems like the
> > client side is the bottleneck since the OSD servers are at like 15%
> > utilization.
> >
> > Thanks,
> > Mark
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx