Hello list,

I’m having a serious issue: my Ceph cluster has become unresponsive. I was upgrading my cluster (3 servers, 3 monitors) from 13.2.1 to 13.2.2, which shouldn’t be a problem. However, on reboot my first host reported:

starting mon.ceph01 rank -1 at 192.168.200.197:6789/0 mon_data /var/lib/ceph/mon/ceph-ceph01 fsid 27dd45f1-28b5-4ac6-81ab-c62bc581130c
mon.cephxx@-1(probing) e5 preinit fsid 27dd45f1-28b5-4ac6-81ab-c62bc581130c
mon.cephxx@-1(probing) e5 not in monmap and have been in a quorum before; must have been removed
-1 mon.cephxx@-1(probing) e5 commit suicide!
-1 failed to initialize

I thought that perhaps the monitor didn’t want to accept the monmap of the other two because of the version difference. Sadly, I upgraded and rebooted the second server.
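In case it helps with diagnosing this: would something like the following be a sane way to check whether ceph01 is actually still listed in the monmap that the other hosts have? (The host and paths below are just examples; I haven't touched the monitor stores beyond reading, and I'm not sure this is the right approach.)

    # on one of the other hosts, with its mon daemon stopped
    systemctl stop ceph-mon@ceph02
    ceph-mon -i ceph02 --extract-monmap /tmp/monmap
    monmaptool --print /tmp/monmap     # does mon.ceph01 still show up here?
    systemctl start ceph-mon@ceph02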
Now the cluster is unresponsive, because more than half of the monitors are offline / out of quorum. The log of my second host keeps spamming:

2018-10-04 14:39:06.802 7fed0058f700 -1 mon.ceph02@1(probing) e6 get_health_metrics reporting 14 slow ops, oldest is auth(proto 0 27 bytes epoch 6)

Any help VERY MUCH appreciated, this sucks.

Thanks
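P.S. Since the normal ceph commands just hang without quorum, would querying the mon daemons directly over the admin socket be the right way to see what state they think they are in? Something like (the socket path is just the default on my install, it might differ):

    ceph daemon mon.ceph02 mon_status
    # or equivalently:
    ceph --admin-daemon /var/run/ceph/ceph-mon.ceph02.asok mon_status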