Dear Ceph users and developers,

on one of our production clusters we got into a pretty unpleasant situation. After rebooting one of the nodes, the whole cluster seems to hang as soon as I try to start the monitor on it, including IO, ceph -s, etc. When this mon is stopped again, everything seems to continue. Trying to spawn a new monitor leads to the same problem (even on a different node).

I had to give up after minutes of outage, since that is unacceptable. I think we had this problem once in the past on this cluster, but after some (much shorter) time the monitor joined and it has worked fine since then.

All cluster nodes are CentOS 7 machines. I have 3 monitors (so 2 are running now) and I'm using Ceph 13.2.6. The network connection seems to be fine.

Has anyone seen a similar problem? I'd be very grateful for tips on how to debug and solve this.

For those interested, here's the log of one of the running monitors with debug_mon set to 10/10:

https://storage.lbox.cz/public/d258d0

If I can provide more info, please let me know.

With best regards,

Nikola Ciprich

--
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 00 Ostrava

tel.: +420 591 166 214
fax: +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@xxxxxxxxxxx
-------------------------------------