On Sun, Aug 13, 2023 at 10:43 PM Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:

> Think you meant s/SSD/SAS|SATA/

Yes, these are SATA SSDs. They are indeed pretty old. They have the latest
firmware. They are on LSI 3008 IT SAS controllers.

> If the OP means physical core, granted that the CPUs are probably as old
> as the SSDs, but probably still have HT enabled, so 1c would be 2 threads,
> which for such small drives isn't so awful.

Yes, physical cores.

> The OP implies that the cluster's performance *degraded* with the Quincy
> upgrade. I wonder if there was a kernel change at the same time.

No, it's never been great, but it is definitely getting worse over time.
That is most likely correlated with increased utilization (both in terms of
space used and IOPS demanded) rather than with any specific upgrade.

(below are replies to Tyler's questions)

> You might want to try smaller I/O sizes and
> more clients to assess cluster performance if you are interested in
> high IOPS workloads.

I'm interested in stopping it from completely stalling out for 10-30
seconds at a time on a regular basis. The actual IOPS a properly
functioning cluster can deliver is a *very* secondary concern.

> Public network only, no replication network?

The public network and replication network are VLANs on the bond.

> Core, or hyperthread? What CPU SKU?

Core. These are all Xeons contemporaneous with the SSDs, e.g., E5-2620 v3.
CPU usage averages 95% idle.

> So no VMs on them, right?

Right.

> Has that hardware been in service that long?

Yes.

> May sound like a copout, but I'd look hard at the networking.

The network is also operating at a tiny fraction of its capacity. Each
server moves about 15-30 *megabits* per OSD. None of them push even 1 Gbps,
much less 10.

> Look for dropped

Fewer than one in 10 million, and even those are likely a result of our
iperf tests.

> / retransmitted packets,

None.

> framing / CRC errors on the switch side

None.

> Maybe use fping or gping, iperf3.

We have tested connectivity from each of the twelve Ceph machines to the
other eleven with iperf3. Any two machines can sustain 7-9 Gbps with iperf3,
with no issues, while Ceph is running.

> Check the interfaces to ensure they have the proper netmasks and default
> routes;

They do.

Thanks!
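
For reference, an all-pairs sweep like the one described above could be scripted roughly as follows. This is a minimal sketch, not the exact commands that were run: it assumes an iperf3 server ("iperf3 -s") is already listening on every node, passwordless SSH from the machine running it, and the host names are placeholders.

    #!/usr/bin/env python3
    """All-pairs iperf3 sweep (sketch): run each host as a client
    against every other host's iperf3 server and report throughput
    and TCP retransmits."""
    import json
    import subprocess

    HOSTS = ["ceph01", "ceph02", "ceph03"]  # hypothetical node names


    def iperf3_pair(src: str, dst: str, seconds: int = 5) -> dict:
        """Run an iperf3 client on `src` against the server on `dst` via SSH."""
        out = subprocess.run(
            ["ssh", src, "iperf3", "-c", dst, "-t", str(seconds), "-J"],
            capture_output=True, text=True, check=True,
        ).stdout
        end = json.loads(out)["end"]
        return {
            "gbps_recv": end["sum_received"]["bits_per_second"] / 1e9,
            "retransmits": end["sum_sent"]["retransmits"],
        }


    if __name__ == "__main__":
        for src in HOSTS:
            for dst in HOSTS:
                if src == dst:
                    continue
                r = iperf3_pair(src, dst)
                print(f"{src} -> {dst}: {r['gbps_recv']:.1f} Gbit/s, "
                      f"{r['retransmits']} retransmits")

Anything well below line rate, or a pair with a high retransmit count, would point at a specific link or switch port rather than at Ceph itself.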