Re: Wide variation in osd_mclock_max_capacity_iops_hdd

Frank Schilder <frans@xxxxxx> · Thu, 8 Sep 2022 12:24:02 +0000

> > My experience with osd bench is not good either
>
> it seems it was recently "fixed" by writing "a"'s instead of zeroes:

Thanks for pointing that out. Writing binary zeros is really bad, as many controllers interpret this as a trim or a similarly cheap operation, and nothing physically happens on the disk (writing zeros is optimised: the sector is just marked as wiped instead of being zeroed out).

Writing "a"'s might be better, but again, a constant sequence of characters will not give realistic baseline results, due to other likely optimisation paths in the controllers. My experiments indicate that only random data, for example collected from /dev/urandom into memory beforehand, will properly benchmark the write path all the way to the disk, because it is incompressible and does not trigger any shortcuts in the system. Another point is the amount of data that has to be written to get realistic estimates.
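To make the compressibility point concrete, here is a minimal Python sketch comparing zero-filled, "a"-filled and random buffers. zlib compression is only a stand-in (an assumption, not what a disk controller actually does internally); it merely gives a rough idea of how much a payload invites shortcuts. os.urandom draws from the same kernel random source as /dev/urandom on Linux, and the buffer is generated once, up front, before any timed IO would start.

```python
import os
import zlib

BUF_SIZE = 4 * 1024 * 1024  # 4 MiB test buffer (arbitrary size for illustration)


def compression_ratio(buf: bytes) -> float:
    """Compressed size divided by original size; close to 1.0 means incompressible."""
    return len(zlib.compress(buf, 6)) / len(buf)


# Candidate write payloads: zeros (old osd bench behaviour), constant 'a's
# (after the linked commit), and random bytes generated once into memory.
payloads = {
    "zeros": bytes(BUF_SIZE),
    "a's": b"a" * BUF_SIZE,
    "random": os.urandom(BUF_SIZE),  # same kernel source that backs /dev/urandom on Linux
}

ratios = {name: compression_ratio(buf) for name, buf in payloads.items()}

for name, ratio in ratios.items():
    print(f"{name:>7}: compressed to {ratio:.4f} of original size")
```

Both constant payloads shrink to a tiny fraction of their size, while the random buffer stays at roughly its original size, which is the property that keeps shortcut paths out of the measurement.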

A 30s burst of benchmark IO will usually not do. I found 5 minutes to be the bare minimum for stable results, and even these change as the disk ages and/or fills up. However, 5 minutes is way too long for OSD startup. I think this might be a point where estimates of an OSD's performance cannot be deduced from a quick-and-dirty benchmark, but should rather come from actual IO stats, for example commit latencies as a function of IO size: the way ioping does it, but using actual user IO as the disk ping. This would be really good: no overhead, and the estimate would adjust at run time to disk ageing and usage effects.
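A minimal sketch of what "using actual user IO for the disk ping" could look like: keep a smoothed commit latency per IO-size bucket and derive an IOPS estimate from it. The class name, the power-of-two bucketing, the EWMA smoothing factor and the 1/latency conversion (which only holds at queue depth 1) are all assumptions for illustration; this is not an existing Ceph interface.

```python
class CommitLatencyTracker:
    """Hypothetical tracker: exponentially weighted moving average (EWMA)
    of observed commit latency, kept separately per IO-size bucket."""

    def __init__(self, alpha: float = 0.05):
        self.alpha = alpha  # smoothing factor; smaller = slower adaptation
        self.ewma = {}      # bucket size in bytes -> smoothed latency in seconds

    @staticmethod
    def bucket(size: int) -> int:
        # Round the IO size up to the next power of two, with a 4 KiB minimum.
        b = 4096
        while b < size:
            b *= 2
        return b

    def record(self, size: int, latency_s: float) -> None:
        """Feed one completed user IO (size in bytes, commit latency in seconds)."""
        key = self.bucket(size)
        prev = self.ewma.get(key)
        if prev is None:
            self.ewma[key] = latency_s
        else:
            self.ewma[key] = prev + self.alpha * (latency_s - prev)

    def iops_estimate(self, size: int):
        """Queue-depth-1 IOPS estimate for this size bucket, or None if unseen."""
        lat = self.ewma.get(self.bucket(size))
        return None if lat is None else 1.0 / lat


# Demo with made-up latencies, standing in for real user IO completions.
tracker = CommitLatencyTracker()
for _ in range(1000):
    tracker.record(4096, 0.005)     # hypothetical: 5 ms commits for 4 KiB writes
    tracker.record(1 << 20, 0.020)  # hypothetical: 20 ms commits for 1 MiB writes
print(tracker.iops_estimate(4096), tracker.iops_estimate(1 << 20))
```

Because the average keeps moving with every recorded IO, the estimate drifts along with disk ageing and fill level instead of being frozen at OSD startup.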

For a number that has such a strong influence on the proper functioning of operations scheduling, I would not go with any test method that cannot be validated with another. If osd bench and fio give largely different results, the safe assumption is that osd bench is not doing it right.
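The cross-check policy described above could be sketched like this in Python. The 25% tolerance and the function names are hypothetical choices; the generated command uses the regular `ceph config set` mechanism to override osd_mclock_max_capacity_iops_hdd for a single OSD.

```python
def cross_check(osd_bench_iops: float, fio_iops: float, rel_tol: float = 0.25):
    """Hypothetical policy: return (iops_to_use, methods_agree)."""
    if osd_bench_iops <= 0 or fio_iops <= 0:
        raise ValueError("IOPS measurements must be positive")
    rel_diff = abs(osd_bench_iops - fio_iops) / max(osd_bench_iops, fio_iops)
    if rel_diff <= rel_tol:
        # The two independent methods agree: take the more conservative value.
        return min(osd_bench_iops, fio_iops), True
    # They disagree: assume osd bench is wrong and keep the fio figure.
    return fio_iops, False


def override_command(osd_id: int, iops: float) -> str:
    """Build the CLI command that pins the capacity value for one OSD."""
    return f"ceph config set osd.{osd_id} osd_mclock_max_capacity_iops_hdd {round(iops)}"


value, agreed = cross_check(2000, 300)  # hypothetical measurements
print(value, agreed, override_command(3, value))
```

With these made-up figures the methods disagree, so the fio value of 300 is kept and the command pins osd.3 to 300.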

Fio and ioping are really good tools that provide comparable numbers using completely different methods. I would probably adopt one of these rather than trying to get a third one right.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Sven Kieske <S.Kieske@xxxxxxxxxxx>
Sent: 08 September 2022 14:04:05
To: ceph-users@xxxxxxx; Frank Schilder
Cc: aad@xxxxxxxxxxxxxx; ormandj@xxxxxxxxxxxx; sseshasa@xxxxxxxxxx
Subject: Re: Re: Wide variation in osd_mclock_max_capacity_iops_hdd

On Thu, 2022-09-08 at 08:22 +0000, Frank Schilder wrote:
> My experience with osd bench is not good either

it seems it was recently "fixed" by writing "a"'s instead of zeroes:

https://github.com/ceph/ceph/commit/db045e005fab218f2bb270b7cb60b62abbbe3619

tongue in cheek:
not sure that this is a good benchmark though, even after the change.

writing good benchmark IO patterns is hard, and good patterns are highly workload dependent.

so I guess the usual answer still applies:

write your own fio based benchmark for your use case.

it would be cool to compile a sort of "standard" Ceph benchmark fio test suite
on GitHub or some other public git host. is anyone interested in this kind
of stuff?
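As a possible starting point for such a suite, a small Python sketch that renders a fio job file for a time-based random-write run. The job name, the defaults (4k blocks, queue depth 1, 300 s to match the 5 minute minimum mentioned in this thread) and the /dev/sdX path are placeholders to adapt to your use case. Pointing filename at a raw device overwrites its contents.

```python
def fio_job(filename: str, bs: str = "4k", iodepth: int = 1, runtime_s: int = 300) -> str:
    """Render a fio job file for a random-write baseline (sketch)."""
    lines = [
        "[global]",
        "ioengine=libaio",
        "direct=1",               # O_DIRECT: bypass the page cache
        "time_based",             # keep running for the full runtime
        f"runtime={runtime_s}",   # seconds
        "refill_buffers",         # regenerate the write payload on every submit
        "randrepeat=0",           # don't reuse the same random offset sequence
        "",
        f"[randwrite-{bs}-qd{iodepth}]",
        f"filename={filename}",
        "rw=randwrite",
        f"bs={bs}",
        f"iodepth={iodepth}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical target device; this job would DESTROY data on it.
print(fio_job("/dev/sdX"))
```

Save the output to a file (e.g. hdd-randwrite-4k.fio) and run it with `fio hdd-randwrite-4k.fio`; varying bs and iodepth gives one job per workload profile.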

--
Mit freundlichen Grüßen / Regards

Sven Kieske
Systementwickler / systems engineer

Mittwald CM Service GmbH & Co. KG
Königsberger Straße 4-6
32339 Espelkamp

Tel.: 05772 / 293-900
Fax: 05772 / 293-333

https://www.mittwald.de

Managing directors: Robert Meyer, Florian Jürgens

Tax no.: 331/5721/1033, VAT ID: DE814773217, HRA 6640, AG Bad Oeynhausen
General partner: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen

Information on data processing in the course of our business activities
pursuant to Art. 13-14 GDPR is available at www.mittwald.de/ds.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx