Re: How many IOPS can be expected on NVMe OSDs

Hi Felix,

Those are pretty good drives and shouldn't have too much trouble with O_DSYNC writes, which can often be a bottleneck for lower-end NVMe drives.  Usually, if the drives are fast enough, it comes down to clock speed and cores.  Clock speed helps the kv sync thread write metadata to the RocksDB WAL faster, and cores help the (default 16 for NVMe) tp_osd_tp threads do more work in parallel.
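If you want to sanity-check a drive's sync write behavior before putting an OSD on it, the usual quick test is a queue-depth-1 sync write with fio; a minimal sketch (the device path is an example, the test overwrites the device, and --sync=1 opens it O_SYNC, which is at least as strict as O_DSYNC):

fio --name=sync-write-test --filename=/dev/nvme0n1 --ioengine=libaio \
    --direct=1 --sync=1 --rw=write --bs=4k --iodepth=1 --numjobs=1 \
    --runtime=60 --time_based --group_reporting

A drive that sustains tens of thousands of IOPS here at queue depth 1 is unlikely to bottleneck the kv sync thread.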

We have a number of Samsung PM983 drives in our community lab (which actually have less write endurance than your drives), and on a 60-OSD test cluster we can do somewhere around 37K IOPS per drive (~740K total) when factoring in 3x replication and loading the cluster with heavy 4K random writes using fio.  Each OSD was using around 10 AMD Rome cores to pull that off, though.  In isolation, a single OSD on those drives can do about 70-80K 4K random writes, but once you introduce synchronous replication on a real cluster it's difficult to retain that level of performance.
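For load generation of that sort, fio's rbd engine is the usual tool; a rough sketch (pool and image names are placeholders, and fio must be built with rbd support):

fio --name=cluster-randwrite --ioengine=rbd --clientname=admin \
    --pool=rbd --rbdname=fio-test --rw=randwrite --bs=4k \
    --iodepth=64 --runtime=300 --time_based --group_reporting

In practice you would run several of these in parallel against separate images to keep all of the OSDs busy.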

My guess is that you will probably see lower than ~43K IOPS per OSD if you fully load your cluster with work.  Some things to check on, though, include making sure the CPUs stay in high-power C/P-states (transitioning to/from low-power states introduces latency), NUMA topology if you are using a multi-CPU system, and, as you stated, possibly looking at multiple OSDs per NVMe drive if you have a lot of CPU to spare (it's not worth doing if you are CPU-limited, though).
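A few commands that are handy for those checks (nvme0 is an example device):

# current frequency driver and governor
cpupower frequency-info
# which C-states the cores are allowed to enter
cpupower idle-info
# force the performance governor (not persistent across reboots)
cpupower frequency-set -g performance
# NUMA layout, and which node a given NVMe device hangs off
numactl --hardware
cat /sys/class/nvme/nvme0/device/numa_node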


Hope this helps!

Mark


On 5/12/22 05:50, Stolte, Felix wrote:
Hi guys,

we recently got new hardware with NVMe disks (Samsung MZPLL3T2HAJQ) and I am trying to figure out how to get the most out of them. The vendor states 180K IOPS for 4K random writes, and my own fio testing showed 160K (fine by me).
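For reference, raw-device testing of that sort is usually done with something along these lines (the device path is an example, and the test overwrites whatever is on the disk):

fio --name=randwrite-test --filename=/dev/nvme0n1 --ioengine=libaio \
    --direct=1 --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 \
    --runtime=60 --time_based --group_reporting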

I built a BlueStore OSD on top of one of them (WAL, DB and data all on the same disk) and ran an OSD benchmark:

ceph tell osd.49 bench 1000000 4096
{
     "bytes_written": 1000000,
     "blocksize": 4096,
     "elapsed_sec": 0.0057061159999999998,
     "bytes_per_sec": 175250555.71951219,
     "iops": 42785.780204959032
}
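(For reference, the reported iops is just bytes_per_sec divided by the blocksize: 175250555.7 / 4096 ≈ 42786.)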

Are these reasonable results, and do I just need to put more than one OSD on the NVMe disks to get the most out of them?
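For what it's worth, ceph-volume can split a drive into multiple OSDs in one step; a sketch, assuming two OSDs per device with WAL/DB colocated (the device path is an example):

ceph-volume lvm batch --osds-per-device 2 /dev/nvme0n1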

We are currently running Nautilus (but will upgrade to Pacific in the near future). Will Pacific give a performance boost with regard to IOPS?

Best regards
Felix



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx




