Hello Ceph users,

We recently had an incident with CephFS which resulted in the MDSs crashing for one of the filesystems. The first problem was journal corruption, which was easily recoverable with no real damage. Since then, however, the MDSs fail to start because of a crash related to snapshots:

   -10> 2024-02-18T20:03:51.656+0000 7f08725bfb38  1 mds.0.234010 handle_mds_map state change up:rejoin --> up:active
    -9> 2024-02-18T20:03:51.656+0000 7f08725bfb38  1 mds.0.234010 recovery_done -- successful recovery!
    -8> 2024-02-18T20:03:51.656+0000 7f087271fb38 10 monclient: get_auth_request con 0x7f0871836380 auth_method 0
    -7> 2024-02-18T20:03:51.656+0000 7f08726d5b38 10 monclient: get_auth_request con 0x7f08718353c0 auth_method 0
    -6> 2024-02-18T20:03:51.656+0000 7f08726fab38 10 monclient: get_auth_request con 0x7f0871895280 auth_method 0
    -5> 2024-02-18T20:03:51.656+0000 7f08725bfb38  1 mds.0.234010 active_start
    -4> 2024-02-18T20:03:51.656+0000 7f08725bfb38  1 mds.0.cache dump_cache to cachedump.234013.mds0
    -3> 2024-02-18T20:03:51.736+0000 7f08725bfb38  1 mds.0.234010 cluster recovered.
    -2> 2024-02-18T20:03:51.736+0000 7f08725bfb38  4 mds.0.234010 set_osd_epoch_barrier: epoch=75012
    -1> 2024-02-18T20:03:51.736+0000 7f08722f9b38 -1 /home/buildozer/aports/community/ceph18/src/ceph-18.2.1/src/mds/MDCache.cc: In function 'void MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)' thread 7f08722f9b38 time 2024-02-18T20:03:51.747600+0000
/home/buildozer/aports/community/ceph18/src/ceph-18.2.1/src/mds/MDCache.cc: 1638: FAILED ceph_assert(follows >= realm->get_newest_seq())

 ceph version 18.2.1 (e3fce6809130d78ac0058fc87e537ecd926cd213) reef (stable)

     0> 2024-02-18T20:03:51.736+0000 7f08722f9b38 -1 *** Caught signal (Aborted) **
 in thread 7f08722f9b38 thread_name:MR_Finisher

 ceph version 18.2.1 (e3fce6809130d78ac0058fc87e537ecd926cd213) reef (stable)

Unfortunately, we then attempted to restore the root, which left the entire filesystem tree stray.

1. Is there a way to manually re-link a directory as the root (or as a dentry of the root)?
2. Can snapshots of objects be purged completely by hand?

Thanks in advance.

-- 
Alex D.
RedXen System & Infrastructure Administration
https://redxen.eu/
_______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx