ceph osd crush swap-bucket causes catastrophic monitor failure

Today I made a mistake running a playbook and accidentally executed

 

ceph osd crush swap-bucket {old_host} {new_host}

 

where {old_host}={new_host}
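
In hindsight, a simple guard in the playbook task would have caught this.  A minimal sketch, assuming the hostnames arrive as shell variables (the variable names here are placeholders, not the real playbook variables):

# refuse to swap a bucket with itself
if [ "${old_host}" = "${new_host}" ]; then
    echo "old_host and new_host are identical, aborting" >&2
    exit 1
fi
ceph osd crush swap-bucket "${old_host}" "${new_host}"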

 

After that command, the first two monitors immediately stopped responding and crashed.  The third monitor's service was still running, but of course could not do anything without quorum, so I proceeded to edit its monmap down to a single-node configuration.  Upon restarting its service, it came up for less than a minute and then crashed the same way as the others.
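
For reference, the monmap edit followed the standard extract/remove/inject sequence, roughly as below (the mon names and temp path are placeholders):

systemctl stop ceph-mon@<surviving_mon>
# pull the current monmap out of the surviving monitor's store
ceph-mon -i <surviving_mon> --extract-monmap /tmp/monmap
# drop the two crashed monitors from the map
monmaptool /tmp/monmap --rm <dead_mon_1>
monmaptool /tmp/monmap --rm <dead_mon_2>
# inject the single-node map and restart
ceph-mon -i <surviving_mon> --inject-monmap /tmp/monmap
systemctl start ceph-mon@<surviving_mon>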

 

These errors were found in logs:

Mar  8 03:10:44 pistoremon-as-d01-tier1 ceph-mon[3654621]: /build/ceph-14.2.22/src/crush/CrushWrapper.cc: In function 'int CrushWrapper::swap_bucket(CephContext*, int, int)' thread 7f878de42700 time 2022-03-08 03:10:44.945920

Mar  8 03:10:44 pistoremon-as-d01-tier1 ceph-mon[3654621]: /build/ceph-14.2.22/src/crush/CrushWrapper.cc: 1279: FAILED ceph_assert(b->size == bs)

 

I have since attempted to follow these steps: 

 

https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds

 

in order to rebuild the kv_store and get a monitor working again. 
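
For context, the core of that procedure is roughly the single-host sketch below (the documented steps repeat this across every OSD host and rsync the store between them; the paths and keyring location are placeholders):

ms=/root/mon-store
mkdir -p "$ms"
# collect cluster maps from every stopped OSD on this host
for osd in /var/lib/ceph/osd/ceph-*; do
    ceph-objectstore-tool --data-path "$osd" \
        --op update-mon-db --mon-store-path "$ms"
done
# rebuild the monitor store from the collected maps
ceph-monstore-tool "$ms" rebuild -- --keyring /path/to/admin.keyring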

 

This has so far been unsuccessful, and I don’t seem to be getting very far.  I can sometimes get a single ceph-mon service to start, but only if there are two other monitors in the monmap, so that a quorum is never formed.

 

When a ceph-mon service won’t start, I usually cannot find any log errors beyond:

Mar  8 09:02:53 pistoremon-as-d02-tier1 systemd[1]: ceph-mon@pistoremon-as-d02-tier1.service: Start request repeated too quickly.

Mar  8 09:02:53 pistoremon-as-d02-tier1 systemd[1]: ceph-mon@pistoremon-as-d02-tier1.service: Failed with result 'start-limit-hit'.
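
As far as I can tell, 'start-limit-hit' only means systemd gave up retrying, so to capture a real error I assume the thing to do is clear the rate limit and run the monitor in the foreground with debug logging, something like (debug levels are arbitrary):

# let systemd allow the unit to start again
systemctl reset-failed ceph-mon@pistoremon-as-d02-tier1
# run the monitor in the foreground, logging to stderr, with verbose mon debugging
ceph-mon -d -i pistoremon-as-d02-tier1 --debug-mon 20 --debug-ms 1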

 

Here are some important details:

The cluster is all Nautilus 14.2.22

Most of the cluster OSDs are still filestore.  The point of the bucket swap was for bluestore migrations, of which about three have been completed.

The monitor hosts were still all on LevelDB.  However, the rebuild process above appears to have generated RocksDB output, and I had to manually change the kv_backend for that dump to do any good.
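
Concretely, that manual change was roughly the following (the mon ID and rebuilt-store path are placeholders):

# move the old LevelDB store aside and drop in the rebuilt RocksDB store
mv /var/lib/ceph/mon/ceph-<mon_id>/store.db /var/lib/ceph/mon/ceph-<mon_id>/store.db.bak
cp -r /root/mon-store/store.db /var/lib/ceph/mon/ceph-<mon_id>/store.db
# tell the monitor which backend the store now uses
echo rocksdb > /var/lib/ceph/mon/ceph-<mon_id>/kv_backend
chown -R ceph:ceph /var/lib/ceph/mon/ceph-<mon_id>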

 

We are in the middle of a major outage, so any expeditious assistance would be immensely appreciated.

 

Thanks,

Josh

 

_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
