Reliable hardware alone isn't enough to protect against monitor failures. I've had a case where a mistyped command brought down all three monitors with a segfault, and there was no way to bring them back because the command had corrupted the monitor database. I wish the monitor database had a checkpoint mechanism so that we could revert changes. I'm not even sure a regular backup of the monitor database, say every 5 minutes, would have helped, as restoring it could still leave the OSDs and monitors out of sync. I also tried restoring the monitor database via ceph-objectstore-tool, but I just ended up with out-of-sync OSDs and monitors, where the monitors think an OSD is offline while the OSD is actually up, not to mention that the PGs were all out of whack as well.

https://tracker.ceph.com/issues/22847

--
Efficiency is Intelligent Laziness

From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Caspar Smit <casparsmit@xxxxxxxxxxx>

2018-05-22 15:51 GMT+02:00 Wido den Hollander <wido@xxxxxxxx>:
And be sure to have enough space available on them to sustain a long period of PGs not being active+clean.

Kind regards,
Caspar
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com