MONs unresponsive for excessive amount of time

Hi all,

One of our MONs was down for maintenance for about 45 minutes. Afterwards I started it up again and it rejoined the cluster.

Unfortunately, things did not go as expected. The MON sub-cluster became unresponsive for a bit more than 10 minutes. Admin commands would hang, even when issued directly to a specific monitor via "ceph tell mon.xxx". In addition, our MDS lost its connection to the MONs and reported a laggy connection. Consequently, all CephFS access was frozen for a bit more than 10 minutes as well.

From the little I could get out of "ceph daemon mon.xxx mon_status", I could see that the restarted MON was in state "synchronizing" (or similar, it's from memory) while the other MONs were in quorum.

Our cluster is mimic-12.2.8. This observation does not fit the intended HA of the MON cluster; there should not be any stall at all.

My questions: Why do the MONs become unresponsive for such a long time? What are the MONs doing during this time frame? Are there any config options I should look at? Are there any log messages I should hunt for?

Any hint is appreciated.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14



