Hi all,

one of our MONs was down for maintenance for about 45 minutes. When I started it up again, it rejoined the cluster, but things did not go as expected. The MON cluster became unresponsive for a bit more than 10 minutes. Admin commands would hang, even when issued directly to a specific monitor via "ceph tell mon.xxx". In addition, our MDS lost its connection to the MONs and reported a laggy connection. Consequently, all ceph fs access was frozen for a bit more than 10 minutes as well.

From the little I could get out of "ceph daemon mon.xxx mon_status", I could see that the restarted MON was in state "synchronizing" (or similar, this is from memory) while the other MONs were in quorum.

Our cluster is mimic-12.2.8.

This observation does not fit the intended HA of the MON cluster; there should not be any stall at all.

My questions:

Why do the MONs become unresponsive for such a long time?
What are the MONs doing during this time frame?
Are there any config options I should look at?
Are there any log messages I should hunt for?

Any hint is appreciated.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14