Hey,

I've been running a Ceph cluster of arm64 SoCs on Luminous for the past year or so with no major problems. I recently upgraded to 14.2.7, and the stability of the cluster immediately suffered: any mon activity was subject to long pauses, and the cluster would hang frequently. Watching ceph -s, the mons appeared to be calling elections constantly - a leader rarely lasted longer than a minute or two.

Looking further at the mons, two of the three of which run on the relatively slow SD card storage of these SoCs, I saw them completely saturating the root device with writes, and the logs show rocksdb constantly running compactions. I temporarily moved these mons to devices with better-performing IO (not a good permanent home for them, since those hosts are also CephFS clients) and measured a sustained write rate of ~50 MB/s. That seems excessive - at least an order of magnitude more than anything I saw under Luminous - and it isn't kind to SSD lifespan either.

Since downgrading is not an option here, is there anything I can look at to figure out what exactly the mons are writing, and how to prevent such heavy load? I seem to remember some bug related to telemetry, but I can't find it in the list archives.
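
For reference, here's roughly how I've been watching the election churn (jq is just for readability; quorum_status reports the current leader and election epoch, and a steadily climbing epoch means the mons keep re-electing):

    # Poll quorum status; a rapidly climbing election_epoch means
    # the mons keep calling new elections.
    watch -n 5 'ceph quorum_status -f json-pretty | jq "{leader: .quorum_leader_name, epoch: .election_epoch}"'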
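
To try to quantify the store churn, my plan is to track the on-disk store size and the rocksdb perf counters, something like the following - the path assumes the default mon data layout, and mon.a is a placeholder for one of my mon ids:

    # On-disk size of the mon's rocksdb store
    du -sh /var/lib/ceph/mon/ceph-$(hostname -s)/store.db

    # rocksdb perf counters via the mon's admin socket
    ceph daemon mon.$(hostname -s) perf dump rocksdb

    # One-off manual compaction of a mon's store
    ceph tell mon.a compact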
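
And to see what is actually being written, I figured I'd temporarily crank the mon/paxos debug levels for a few minutes and double-check which mgr modules are enabled, since I half-suspect telemetry - if anyone knows better knobs to turn, I'm all ears:

    # Temporarily raise mon + paxos logging (noisy; revert afterwards)
    ceph config set mon debug_mon 10/10
    ceph config set mon debug_paxos 10/10

    # Revert once done
    ceph config rm mon debug_mon
    ceph config rm mon debug_paxos

    # List enabled mgr modules, and switch telemetry off if it's on
    ceph mgr module ls
    ceph telemetry off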