Hi, don't expect a complete solution from the list, just some direction.

Here is a link to the blog post: https://ceph.io/en/news/blog/2024/ceph-a-journey-to-1tibps/ and the presentation from Ceph Days NYC is on YouTube.

Look at performance from the client's perspective: run the measurement tools from inside the virtual machine. That way you see the performance as experienced by the client. The most commonly used tool for performance measurement is fio, and I strongly recommend it for your evaluation. Also use ioping to measure latency: while fio gives you IOPS and latency metrics under load, ioping shows how latency behaves when the machine is not under heavy load.

Based on my previous experience (not only mine, but also my team's), many performance issues were related to network configuration or to problems somewhere in the network infrastructure. As an example, we hit a situation where a change made by the network team on the spine switches caused disk latency to increase from 3 ms to 80-120 ms. Another example, which almost burned me, was an issue with one of the spine cards that was not fully broken: monitoring did not catch it and tests showed everything was fine, but on the Ceph side we had many, many issues, such as flapping OSDs (at times roughly half of the 500 OSDs went down) and occasional latency spikes. The card misbehaved from time to time, but never during the tests :) And of course the AMD nodes, before I discovered the iommu=pt kernel parameter. Believe me, C-states and power management on the nodes are important.

You have already received very good advice from others, so there is not much to add: look at your network drivers and the rx/tx queue settings.
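To make that a bit more concrete, here is a minimal sketch of the kind of commands I mean. The device name /dev/vdb, the interface name ens1f0 and the run lengths are only placeholders, adjust them to your setup, and point the fio write test at a scratch device or test file because it is destructive.

Inside the VM (load test with fio, idle latency with ioping; ioping only reads by default):

  fio --name=vm-randwrite --filename=/dev/vdb --ioengine=libaio --direct=1 \
      --rw=randwrite --bs=4k --iodepth=32 --numjobs=1 \
      --runtime=60 --time_based --group_reporting
  ioping -c 30 /dev/vdb

On the hosts (NIC ring and queue settings):

  ethtool -g ens1f0     (current and maximum rx/tx ring sizes)
  ethtool -l ens1f0     (number of rx/tx channels)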
For your information, this cluster was not fine-tuned and end-to-end encryption is enabled: 6 nodes, all NVMe, 8x NVMe per node, 512 GB RAM, 4x25GbE LACP for the public network and another 4x25GbE for the cluster network (Mellanox cards).

# rados bench -p test 10 write -t 8 -b 16K

Rados bench results:
Total time run:         10.0003
Total writes made:      113195
Write size:             16384
Object size:            16384
Bandwidth (MB/sec):     176.862
Stddev Bandwidth:       27.047
Max bandwidth (MB/sec): 195.828
Min bandwidth (MB/sec): 107.906
Average IOPS:           11319
Stddev IOPS:            1731.01
Max IOPS:               12533
Min IOPS:               6906
Average Latency(s):     0.000705734
Stddev Latency(s):      0.00224331
Max latency(s):         0.325178
Min latency(s):         0.000413413

This is a test with fio using the librbd engine; it shows more or less the VM performance.

[test]
ioengine=rbd
clientname=admin
pool=test
rbdname=bench
rw=randwrite
bs=4k
iodepth=256
direct=1
numjobs=1
fsync=0
size=10G
runtime=300
time_based
invalidate=0

test: (groupid=0, jobs=1): err= 0: pid=3495143: Tue Jun 11 11:56:04 2024
  write: IOPS=83.6k, BW=326MiB/s (342MB/s)(95.6GiB/300002msec); 0 zone resets
    slat (nsec): min=975, max=2665.0k, avg=3943.68, stdev=2820.21
    clat (usec): min=399, max=225434, avg=3058.67, stdev=1801.25

And for iodepth=1:

test: (groupid=0, jobs=1): err= 0: pid=3503647: Tue Jun 11 11:57:48 2024
  write: IOPS=1845, BW=7382KiB/s (7559kB/s)(159MiB/22033msec); 0 zone resets
    slat (nsec): min=2966, max=41133, avg=4381.81, stdev=1062.40
    clat (usec): min=367, max=202364, avg=537.05, stdev=1009.49

And for iodepth=256 and bs=16k:

test: (groupid=0, jobs=1): err= 0: pid=3505339: Tue Jun 11 12:03:27 2024
  write: IOPS=79.6k, BW=1244MiB/s (1305MB/s)(365GiB/300002msec); 0 zone resets
    slat (nsec): min=1815, max=4497.4k, avg=5671.20, stdev=3540.33
    clat (usec): min=446, max=267567, avg=3208.34, stdev=2038.58
     lat (usec): min=451, max=267571, avg=3214.01, stdev=2038.60

BR,
Sebastian

> On 11 Jun 2024, at 02:23, Mark Lehrer <lehrer@xxxxxxxxx> wrote:
>
> If they can do 1 TB/s with a single 16K write thread, that will be
> quite impressive :D  Otherwise not really applicable.  Ceph scaling
> has always been good.
>
> More seriously, would you mind sending a link to this?
>
> Thanks!
> Mark
>
> On Mon, Jun 10, 2024 at 12:01 PM Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:
>>
>> Eh?  cf. Mark and Dan's 1TB/s presentation.
>>
>> On Jun 10, 2024, at 13:58, Mark Lehrer <lehrer@xxxxxxxxx> wrote:
>>
>> It seems like Ceph still hasn't adjusted to SSD performance.
>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx