Good insights from Alex. Are these clusters all new, or have they been
around a while and previously happier? One idea that comes to mind is an
MTU mismatch between hosts and switches, or some manner of bonding
misalignment. What does `netstat -I` show? `ethtool -S`?

I'm thinking that maybe, just maybe, bonding (if present) is awry in some
fashion such that half of the packets in and out disappear into the
twilight zone. Like if LACP appears up on the host, but a switch issue
dooms all packets on one link, in or out.
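To make that concrete, here's roughly where I'd look first. Just a sketch;
`eth0`, `bond0`, and the peer hostname are placeholders for whatever your
hosts actually use:

    # MTU per interface (compare against the switch port config)
    ip -d link show

    # per-interface RX/TX error and drop counters
    ip -s link show eth0
    ethtool -S eth0 | grep -iE 'err|drop|disc'

    # bonding state as the kernel sees it, incl. per-slave LACP status
    cat /proc/net/bonding/bond0

    # path MTU probe with DF set (8972 = 9000 minus 28 bytes of IP/ICMP
    # headers if you run jumbo frames; use 1472 for a standard 1500 MTU)
    ping -M do -s 8972 <peer-osd-host>

If one bond slave is eating traffic, the drop counters and the per-slave
sections of /proc/net/bonding/bond0 will usually give it away.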
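And seconding Alex's advice below: when you fio a drive locally, a qd=1 /
4k sync write test is the closest analogue to what rados bench at qd=1
asks of the drive. A minimal sketch, assuming /dev/sdX is a disk you can
safely overwrite (this writes to the raw device!):

    fio --name=4k-qd1 --filename=/dev/sdX --ioengine=libaio \
        --direct=1 --sync=1 --rw=randwrite --bs=4k --iodepth=1 \
        --numjobs=1 --runtime=60 --time_based --group_reporting

The --sync=1 bit matters: Ceph's write path flushes constantly, and SSDs
without power-loss protection can drop from tens of thousands of cached
4k IOPS to a few hundred once every write has to hit stable media.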
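Martin, it might also help the list to see the exact invocations. I'm
assuming something along these lines ("testpool" is a placeholder):

    # client side: 4k writes, one op in flight, 60 seconds
    rados bench -p testpool 60 write -b 4096 -t 1 --no-cleanup

    # per OSD: 12 MB of 4k writes (within the default osd bench cap),
    # bypassing the network entirely
    ceph tell osd.0 bench 12288000 4096

With size=1 and one op in flight, 400 IOPS works out to roughly 2.5 ms
per write end to end. Comparing that against the average latency rados
bench reports, the per-OSD bench result, and a plain ping RTT between
client and OSD host should show where the time actually goes.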
> On Nov 25, 2024, at 9:45 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi Martin,
>
> This is a bit of a generic recommendation, but I would go down the path
> of reducing complexity, i.e. first test the drive locally on the OSD
> node and see if there's anything going on with e.g. drive firmware,
> cables, HBA, power.
>
> Then do fio from another host, which would incorporate networking.
>
> If those look fine, I would do something crazy with Ceph, such as a huge
> number of PGs, or a failure domain of OSD, and just deploy a handful of
> OSDs to see if you can bring the problem out into the open. I would use
> a default setup, with no tweaks to the scheduler etc. Hopefully, you'll
> get some error messages in the logs - Ceph logs, syslog, dmesg. Maybe at
> that point it will become more obvious, or at least some messages will
> come through that make sense (to you or someone else on the list).
>
> In other words, it seems you have to break this a bit more to get proper
> diagnostics. I know you guys have played with Ceph before and can do the
> math of what the IOPS values should be - three clusters all seeing the
> same problem would most likely indicate a non-default configuration
> value that is not correct.
> --
> Alex Gorbachev
> ISS
>
>> On Mon, Nov 25, 2024 at 9:34 PM Martin Gerhard Loschwitz <
>> martin.loschwitz@xxxxxxxxxxxxx> wrote:
>>
>> Folks,
>>
>> I am getting somewhat desperate debugging multiple setups here within
>> the same environment. Three clusters, two SSD-only, one HDD-only, and
>> what they all have in common is abysmal 4k IOPS performance when
>> measuring with "rados bench". Abysmal means: in an all-SSD cluster I
>> get roughly 400 IOPS over more than 250 devices. I know SAS SSDs are
>> not ideal, but 400 IOPS across 250 devices looks a bit on the low side
>> of things to me.
>>
>> In the second cluster, also all-SSD, I get roughly 120 4k IOPS. And the
>> HDD-only cluster delivers 60 4k IOPS. The latter two with substantially
>> fewer devices, granted. But even with just 20 HDDs, that seems like a
>> very bad value to me.
>>
>> I've tried to rule out everything I know of: BIOS misconfigurations,
>> HBA problems, networking trouble (I am seeing comparably bad values
>> with a size=1 pool) and so on and so forth. But to no avail. Has
>> anybody dealt with something similar on Dell hardware, or in general?
>> What could cause such extremely bad benchmark results?
>>
>> I measure with rados bench and qd=1 at 4k block size. "ceph tell osd
>> bench" with 4k blocks yields 30k+ IOPS for every single device in the
>> big cluster, and all that leads to is 400 IOPS in total when writing to
>> it? Even with no replication in place? That looks a bit off, doesn't
>> it? Any help will be greatly appreciated. Even a pointer in the right
>> direction would be held in high esteem right now. Thank you very much
>> in advance!
>>
>> Best regards
>> Martin