Hello Ceph users,

we recently had an incident with CephFS that left the MDSs for one of the filesystems crashing. It started with a journal corruption, which was easily recoverable with no real damage. Since then, however, the MDSs fail to start because of a crash related to snapshots:

   -10> 2024-02-18T20:03:51.656+0000 7f08725bfb38  1 mds.0.234010 handle_mds_map state change up:rejoin --> up:active
    -9> 2024-02-18T20:03:51.656+0000 7f08725bfb38  1 mds.0.234010 recovery_done -- successful recovery!
    -8> 2024-02-18T20:03:51.656+0000 7f087271fb38 10 monclient: get_auth_request con 0x7f0871836380 auth_method 0
    -7> 2024-02-18T20:03:51.656+0000 7f08726d5b38 10 monclient: get_auth_request con 0x7f08718353c0 auth_method 0
    -6> 2024-02-18T20:03:51.656+0000 7f08726fab38 10 monclient: get_auth_request con 0x7f0871895280 auth_method 0
    -5> 2024-02-18T20:03:51.656+0000 7f08725bfb38  1 mds.0.234010 active_start
    -4> 2024-02-18T20:03:51.656+0000 7f08725bfb38  1 mds.0.cache dump_cache to cachedump.234013.mds0
    -3> 2024-02-18T20:03:51.736+0000 7f08725bfb38  1 mds.0.234010 cluster recovered.
    -2> 2024-02-18T20:03:51.736+0000 7f08725bfb38  4 mds.0.234010 set_osd_epoch_barrier: epoch=75012
    -1> 2024-02-18T20:03:51.736+0000 7f08722f9b38 -1 /home/buildozer/aports/community/ceph18/src/ceph-18.2.1/src/mds/ In function 'void MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)' thread 7f08722f9b38 time 2024-02-18T20:03:51.747600+0000
/home/buildozer/aports/community/ceph18/src/ceph-18.2.1/src/mds/ 1638: FAILED ceph_assert(follows >= realm->get_newest_seq())

 ceph version 18.2.1 (e3fce6809130d78ac0058fc87e537ecd926cd213) reef (stable)

     0> 2024-02-18T20:03:51.736+0000 7f08722f9b38 -1 *** Caught signal (Aborted) **
 in thread 7f08722f9b38 thread_name:MR_Finisher

 ceph version 18.2.1 (e3fce6809130d78ac0058fc87e537ecd926cd213) reef (stable)

Unfortunately, we attempted to restore the root, which made the entire filesystem tree stray.

1. Is there a way to manually re-link a directory as the root (or as a dentry of the root)?
2. Can the snapshots of objects be purged completely by hand?

Thanks in advance.

Alex D.
RedXen System & Infrastructure Administration

