Today I made a mistake running a playbook and accidentally executed

    ceph osd crush swap-bucket {old_host} {new_host}

where {old_host} = {new_host}. After that command, the first two monitors immediately stopped responding and crashed. The third monitor's service was still running, but of course it could not do anything without quorum, so I proceeded to edit its monmap down to a single-node configuration.
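For reference, the monmap edit followed the usual extract/rm/inject shape; something along these lines, where the surviving mon's id and the temp path are illustrative rather than copied from my shell history:

    # on the surviving monitor, with its ceph-mon service stopped
    ceph-mon -i pistoremon-as-d03-tier1 --extract-monmap /tmp/monmap
    monmaptool --print /tmp/monmap
    monmaptool /tmp/monmap --rm pistoremon-as-d01-tier1
    monmaptool /tmp/monmap --rm pistoremon-as-d02-tier1
    ceph-mon -i pistoremon-as-d03-tier1 --inject-monmap /tmp/monmap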
Upon restarting its service, it came up for less than a minute and then crashed the same way as the others. These errors were found in the logs:

    Mar 8 03:10:44 pistoremon-as-d01-tier1 ceph-mon[3654621]: /build/ceph-14.2.22/src/crush/CrushWrapper.cc: In function 'int CrushWrapper::swap_bucket(CephContext*, int, int)' thread 7f878de42700 time 2022-03-08 03:10:44.945920
    Mar 8 03:10:44 pistoremon-as-d01-tier1 ceph-mon[3654621]: /build/ceph-14.2.22/src/crush/CrushWrapper.cc: 1279: FAILED ceph_assert(b->size == bs)

I have since attempted to follow these steps:
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds in order to rebuild the kv_store and get a monitor working again.
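For clarity, this is the shape of what I have been running per that page; the store path and keyring location here are illustrative, not exact:

    # on each OSD host, with its OSDs stopped, accumulate the cluster maps
    ms=/root/mon-store
    mkdir -p $ms
    for osd in /var/lib/ceph/osd/ceph-*; do
        ceph-objectstore-tool --data-path $osd --no-mon-config \
            --op update-mon-db --mon-store-path $ms
    done
    # after gathering $ms from every OSD host onto one node, rebuild the mon store
    ceph-monstore-tool $ms rebuild -- --keyring /etc/ceph/ceph.client.admin.keyring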
This has so far been unsuccessful, and I don't seem to be getting very far. I am sometimes able to get a single ceph-mon service to start, but only IF there are two other monitors in the monmap, so that a quorum is never formed. When a ceph-mon service won't start, I usually cannot find any log errors beyond:

    Mar 8 09:02:53 pistoremon-as-d02-tier1 systemd[1]: ceph-mon@pistoremon-as-d02-tier1.service: Start request repeated too quickly.
    Mar 8 09:02:53 pistoremon-as-d02-tier1 systemd[1]: ceph-mon@pistoremon-as-d02-tier1.service: Failed with result 'start-limit-hit'.

Here are some important details:

- The cluster is all Nautilus 14.2.22.
- Most of the cluster's OSDs are still FileStore. The point of the bucket swap was BlueStore migrations, of which about 3 have been completed.
- The monitor hosts were all still on LevelDB. However, the export/rebuild process above appears to have generated RocksDB output, and I had to manually change the kv_backend for that dump to do any good (see the P.S. below).

We are in the middle of a major outage, so any expeditious assistance is unfathomably appreciated.

Thanks,
Josh
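P.S. Regarding the kv_backend change above: my understanding is that ceph-mon picks its store backend from the kv_backend marker file in the mon data directory, so after the rebuild produced a RocksDB store I switched that file from leveldb to rocksdb; roughly this, with the mon id illustrative:

    cat /var/lib/ceph/mon/ceph-pistoremon-as-d03-tier1/kv_backend    # previously read: leveldb
    echo rocksdb > /var/lib/ceph/mon/ceph-pistoremon-as-d03-tier1/kv_backend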