Hi Luis,

Thank you for sharing your tricks :) OK, it's clever. You bypass a destructive fio bench of the disk with a test on a single PG pinned to a single OSD, and then run some rados bench against it. This way you should get more realistic Ceph IOPS!

Rafael

On 30/06/2023 at 15:00, Luis Domingues wrote:
Hi Rafael,

We faced the exact same issue, and we did a bunch of tests and asked ourselves the same questions. We started with some fio runs, but the results were quite underwhelming once in production, and ceph bench did not seem very reliable. What we ended up doing, and what seems to hold up quite nicely, is the procedure below. It's probably not the best, but it works for us.

Step 1: Put the disks you want to test in a small dev/test cluster. That way you can mess with the configs, use settings that would not be production-friendly, and make sure no activity other than your test is running.

Step 2: Create a few pools with these particularities:
- Only 1 PG.
- No replication at all: the only PG of the pool lives on only 1 disk.
- Upmap the PG to the OSD you want to test.
- Repeat for a few disks.

Step 2 needs a few Ceph options to be enabled, as Ceph will not allow this by default. I do not remember the exact options, but you can find them in the documentation (a rough command sketch is included at the end of this mail).

Step 3: Set the mclock profile to high_client_ops, so mclock will virtually not limit client ops.

Step 4: Run a few rados bench tests against the different disks:

rados bench <seconds> write -t <number of threads> -b <block size> -p <your pool>

<seconds>: we used 300, so we could perform several tests without taking a week while still letting the cluster write a decent amount of data.
<number of threads>: we used 100; it worked well, and earlier tests for other purposes showed 100 was a good fit for our installation.
<block size>: we used 128k and 4M. Feel free to experiment with other values; in our use case that is what we used.
<your pool>: the pool containing the disk you want to test.

Somewhere in the output of the bench, after it finishes, you will find the average IOPS. This average is more or less what the disk is capable of handling, so we then set the capacity option to a number close to that one. If we have two types of disks that are close to each other, we use the smaller value for all disks, and we set it as a global configuration in Ceph instead of going disk by disk.

It's probably not perfect and it looks a lot like something we tinkered together, but it's the best testing solution we have found so far. And most importantly, results between runs were a lot more consistent than with ceph bench or fio.

Hope this will help you.

Luis Domingues
Proton AG

------- Original Message -------
On Friday, June 30th, 2023 at 12:15, Rafael Diaz Maurin <Rafael.DiazMaurin@xxxxxxxxxxxxxx> wrote:

Hello,

I've just upgraded a Pacific cluster to Quincy, and all my OSDs have the low value osd_mclock_max_capacity_iops_hdd = 315.000000.

The manual does not explain how to benchmark the OSDs with fio or ceph bench, nor which options to use. Does anyone have good ceph bench or fio options for configuring osd_mclock_max_capacity_iops_hdd for each OSD?

I ran this bench several times on the same OSD (class hdd) and I got different results each time:

ceph tell ${osd} cache drop
ceph tell ${osd} bench 12288000 4096 4194304 100

Example, osd.21 (hdd), osd_mclock_max_capacity_iops_hdd = 315.000000:
bench 1 : 3006.2271379745534
bench 2 : 819.503206458996
bench 3 : 946.5406320134085

How can I get good values for the osd_mclock_max_capacity_iops_[hdd|ssd] options?

Thank you for your help,

Rafael
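For reference, a minimal sketch of the pool setup from Step 2 and the mclock profile from Step 3 of Luis's procedure. It is a hedged, untested example: the pool name bench-osd21, the PG id 12.0 and osd.21 are hypothetical placeholders, and the exact guard options may differ slightly between releases.

# Allow unreplicated (size 1) pools; Ceph refuses this by default.
ceph config set global mon_allow_pool_size_one true

# Create a pool with a single PG and keep the autoscaler from changing it.
ceph osd pool create bench-osd21 1 1
ceph osd pool set bench-osd21 pg_autoscale_mode off
ceph osd pool set bench-osd21 size 1 --yes-i-really-mean-it

# Find the pool's single PG, then upmap it onto the OSD under test.
ceph pg ls-by-pool bench-osd21        # note the PG id, e.g. 12.0
ceph osd pg-upmap 12.0 21             # pin PG 12.0 (hypothetical id) to osd.21

# Step 3: let mclock prioritise client ops during the benchmark.
ceph config set osd osd_mclock_profile high_client_ops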
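An example invocation for Step 4, using the hypothetical pool above and the values Luis mentions (300 seconds, 100 threads, 128k and 4M block sizes). The "Average IOPS" line near the end of the output is the number to take.

# 128k blocks (131072 bytes); --no-cleanup keeps the objects between runs.
rados bench 300 write -t 100 -b 131072 -p bench-osd21 --no-cleanup

# 4M blocks (4194304 bytes).
rados bench 300 write -t 100 -b 4194304 -p bench-osd21 --no-cleanup

# Remove the benchmark objects when done.
rados -p bench-osd21 cleanup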
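Finally, a sketch of applying the measured value as Luis suggests, globally for all HDD OSDs rather than disk by disk. The value 450 is purely a hypothetical result of the benchmark above.

# Global value for every HDD OSD.
ceph config set osd osd_mclock_max_capacity_iops_hdd 450

# Or override a single OSD only (osd.21 is an example).
ceph config set osd.21 osd_mclock_max_capacity_iops_hdd 450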