> >> The OP implies that the cluster's performance *degraded* with the Quincy upgrade. I wonder if there was a kernel change at the same time.
>
> No, it's never been great. But it's definitely getting worse over
> time. That is most likely correlated with increased utilization (both
> in terms of space used and IOPs demanded) rather than any specific
> upgrades.

Ahhhhh, from the original phrasing I thought that Quincy correlated with a sharp drop.

> They have the latest firmware

As reported by a recent isdct / intelmas / sst release? Or by the web site? (Sketch at the end of this message.)

> I'm interested in stopping it from completely stalling out for 10-30
> seconds at a time on a regular basis. The actual number of IOPs of a
> properly-functioning cluster is a *very* secondary concern.

Just for grins, I might suggest -- once you have a fully healthy cluster -- destroying, secure-erasing, and redeploying a few OSDs at a time within a single failure domain. How old are the OSDs? I suspect at least some might be Filestore and thus would be redeployed with BlueStore. Newer SSD controllers / models are better than older ones at housekeeping over time, so the secure erase might freshen performance. (Sketches at the end of this message.)

> Networking is also operating at a tiny fraction of its capacity. It
> looks like each server runs about 15-30 *megabits* per OSD. None of
> them push even 1 Gbps, much less 10.

Fair enough. Layer 1 issues can still matter, though.

> We have tested connectivity between all twelve Ceph machines, each
> against the other 11, with iperf3. Any two machines can iperf3
> 7-9 Gbps without any issues with Ceph running.

Groovy. These are all things we have to ask ;)

> >> Check the interfaces to ensure they have the proper netmasks and default routes; I found some systems with the main interface configured as a /32. It's top of mind lately.
>
> They do.
>
> Thanks!
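
P.S. For the firmware question above, a minimal sketch with Intel's current CLI tool -- assuming the drives are Intel/Solidigm and intelmas is installed; older isdct releases use the same verbs:

    # List detected drives with model, serial, and firmware revision
    intelmas show -intelssd

    # Check for a newer firmware image for drive index 0 and stage it
    # (prompts for confirmation before flashing)
    intelmas load -intelssd 0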
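
To see whether any OSDs are still Filestore, something like this should work on any recent release (assumes jq is available on the admin node):

    # Tally OSDs by objectstore backend; any "filestore" entries are
    # candidates for redeployment as BlueStore
    ceph osd metadata | jq -r '.[].osd_objectstore' | sort | uniq -c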
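
And a rough sketch of the destroy / secure-erase / redeploy cycle for a single OSD. osd.12 on /dev/nvme0n1 is purely illustrative, and this assumes a plain ceph-volume deployment -- adapt to your orchestration:

    # Only start once the cluster is HEALTH_OK
    ceph osd out 12

    # Wait until the PGs have fully drained off this OSD
    while ! ceph osd safe-to-destroy osd.12 ; do sleep 60 ; done

    systemctl stop ceph-osd@12
    ceph osd purge 12 --yes-i-really-mean-it

    # Tear down the LVM metadata, then secure-erase the device
    ceph-volume lvm zap --destroy /dev/nvme0n1
    nvme format /dev/nvme0n1 --ses=1   # SATA drives: hdparm --security-erase

    # Redeploy as a fresh BlueStore OSD
    ceph-volume lvm create --data /dev/nvme0n1

Do a few OSDs at a time, staying within a single failure domain so you never have more than one copy at risk.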
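
For the /32 check, the quick audit I had in mind on each node -- plain iproute2, nothing exotic:

    # Verify prefix lengths on every interface and the default route
    ip -brief addr show
    ip route show default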