Hi Felix,
Those are pretty good drives and shouldn't have too much trouble with
O_DSYNC writes, which can often be a bottleneck for lower-end NVMe
drives. Usually if the drives are fast enough, it comes down to clock
speed and cores. Clock speed helps the kv sync thread write metadata to
the RocksDB WAL faster, and cores help the tp_osd_tp threads (16 by
default for NVMe) do more work in parallel.
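To put a number on where that 16 comes from, here's a minimal sketch, assuming the stock Ceph defaults for SSD-class OSDs (osd_op_num_shards_ssd = 8 and osd_op_num_threads_per_shard_ssd = 2):

```python
# Ceph sizes the tp_osd_tp worker pool as shards * threads_per_shard.
# The values below are the SSD-class defaults; rotational media uses
# the osd_op_num_shards_hdd / osd_op_num_threads_per_shard_hdd options.
osd_op_num_shards_ssd = 8
osd_op_num_threads_per_shard_ssd = 2

tp_osd_tp_threads = osd_op_num_shards_ssd * osd_op_num_threads_per_shard_ssd
print(tp_osd_tp_threads)  # 16
```

More cores mostly help by letting those 16 workers (per OSD) actually run in parallel under load.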
We have a number of Samsung PM983 drives in our community lab (which
actually have less write endurance than your drives), and on a 60-OSD
test cluster we can do somewhere around 37K IOPS per drive (~740K total)
when factoring in 3x replication and loading the cluster with heavy 4K
random writes using fio. Each OSD was using around 10 AMD Rome cores to
pull that off, though. In isolation, a single OSD on those drives can do
about 70-80K 4K random writes, but once you introduce synchronous
replication on a real cluster it's difficult to retain that level of
performance.
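As a back-of-the-envelope check on those lab numbers (with 3x replication, each client write turns into three backend writes, so client-visible IOPS is roughly aggregate device IOPS divided by the replica count):

```python
# Rough replication math for the 60-OSD lab cluster described above.
osds = 60
per_osd_backend_iops = 37_000   # measured per drive under load
replication = 3

aggregate_backend_iops = osds * per_osd_backend_iops
client_iops = aggregate_backend_iops // replication
print(client_iops)  # 740000
```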
My guess is that you will probably see lower than ~43K IOPS if you fully
load your cluster with work. Some things to check on, though: make sure
the CPUs stay in high-power C/P-states (transitioning to/from low-power
states introduces latency), look at NUMA topology if you are using a
multi-CPU system, and, as you said, consider running multiple OSDs per
NVMe drive if you have a lot of CPU to spare (it's not worth doing if
you are CPU-limited, though).
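The multiple-OSDs-per-drive trade-off is easy to sanity-check with arithmetic. The ~10 cores/OSD figure is from the lab numbers above; the drive count and core count here are made-up illustrative values, not from your setup:

```python
# Illustrative CPU budget for running 2 OSDs per NVMe drive.
# cores_per_osd comes from the lab observation above; drives and
# available_cores are hypothetical example values.
drives = 6
osds_per_drive = 2
cores_per_osd = 10          # observed under heavy 4K randwrite load
available_cores = 64        # hypothetical host

cores_needed = drives * osds_per_drive * cores_per_osd
print(cores_needed, "needed vs", available_cores, "available")
```

In a case like this (120 cores needed, 64 available) the host is already CPU-limited and splitting the drives won't buy you anything.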
Hope this helps!
Mark
On 5/12/22 05:50, Stolte, Felix wrote:
Hi guys,
we recently got new hardware with NVMe disks (Samsung MZPLL3T2HAJQ) and I am trying to figure out how to get the most out of them. The vendor states 180K IOPS for 4K random writes, and my fio testing showed 160K (fine by me).
I built a BlueStore OSD on top of one of them (WAL, DB, and data all on the same disk) and ran an OSD benchmark:
ceph tell osd.49 bench 1000000 4096
{
    "bytes_written": 1000000,
    "blocksize": 4096,
    "elapsed_sec": 0.0057061159999999998,
    "bytes_per_sec": 175250555.71951219,
    "iops": 42785.780204959032
}
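For what it's worth, the iops figure in that output is just bytes_written / blocksize / elapsed_sec, so you can recompute it from the other fields (a quick sketch using the values above):

```python
import json

# The output of `ceph tell osd.49 bench 1000000 4096` from above.
bench = json.loads("""
{
  "bytes_written": 1000000,
  "blocksize": 4096,
  "elapsed_sec": 0.0057061159999999998,
  "bytes_per_sec": 175250555.71951219,
  "iops": 42785.780204959032
}
""")

# IOPS = (total bytes / block size) writes, divided by elapsed time.
iops = bench["bytes_written"] / bench["blocksize"] / bench["elapsed_sec"]
print(round(iops))  # 42786
```

Note that the first argument to the bench command is total bytes, not an IO count, so this run issued only ~244 writes over ~6 ms; a longer run (a larger byte total) should give steadier numbers.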
Are these reasonable results, and do I just need to put more than one OSD on each NVMe disk to get the most out of it?
We are currently running Nautilus (but will upgrade to Pacific in the near future). Will Pacific give a performance boost in regard to IOPS?
Best regards
Felix
---------------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Registered office: Juelich
Registered in the Commercial Register of the Amtsgericht Dueren, No. HR B 3498
Chairman of the Supervisory Board: MinDir Volker Rieke
Executive Board: Prof. Dr.-Ing. Wolfgang Marquardt (Chairman),
Karsten Beneke (Deputy Chairman), Prof. Dr.-Ing. Harald Bolt,
Dr. Astrid Lambrecht, Prof. Dr. Frauke Melchior
---------------------------------------------------------------------------------------------
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx