Hi,

First of all, I would suggest upgrading your cluster to one of the supported releases.

I think a full recovery is recommended to get the MDS back:

1. Stop the MDSes and all the clients.

2. Fail the fs:
   # ceph fs fail <fs name>

3. Back up the journal (if the command below fails, make a RADOS-level copy as described in http://tracker.ceph.com/issues/9902; a sketch follows this list). Since the MDS journal is already corrupted, we could skip this too?
   # cephfs-journal-tool --rank <fsname>:0 journal export backup.bin

4. Clean up ancillary data generated during any previous recovery:
   # cephfs-data-scan cleanup [<data pool>]

5. Recover dentries, reset the session table, and reset the journal:
   # cephfs-journal-tool --rank <fsname>:0 event recover_dentries list
   # cephfs-table-tool <fsname>:all reset session
   # cephfs-journal-tool --rank <fsname>:0 journal reset

6. Execute scan_extents on each of the x4 tools pods in parallel:
   # cephfs-data-scan scan_extents --worker_n 0 --worker_m 4 --filesystem <fsname> <data-pool>
   # cephfs-data-scan scan_extents --worker_n 1 --worker_m 4 --filesystem <fsname> <data-pool>
   # cephfs-data-scan scan_extents --worker_n 2 --worker_m 4 --filesystem <fsname> <data-pool>
   # cephfs-data-scan scan_extents --worker_n 3 --worker_m 4 --filesystem <fsname> <data-pool>

7. Execute scan_inodes on each of the x4 tools pods in parallel:
   # cephfs-data-scan scan_inodes --worker_n 0 --worker_m 4 --filesystem <fsname> <data-pool>
   # cephfs-data-scan scan_inodes --worker_n 1 --worker_m 4 --filesystem <fsname> <data-pool>
   # cephfs-data-scan scan_inodes --worker_n 2 --worker_m 4 --filesystem <fsname> <data-pool>
   # cephfs-data-scan scan_inodes --worker_n 3 --worker_m 4 --filesystem <fsname> <data-pool>

8. Run scan_links:
   # cephfs-data-scan scan_links --filesystem <fsname>

9. Mark the filesystem joinable from pod/rook-ceph-tools:
   # ceph fs set <fsname> joinable true

10. Start up the MDSes.

11. Scrub the online fs:
    # ceph tell mds.<fsname>-<active-mds[a|b]> scrub start / recursive repair

12. Check the scrub status:
    # ceph tell mds.<fsname>-<active-mds[a|b]> scrub status
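In case cephfs-journal-tool cannot even read the damaged journal in step 3, a minimal sketch of the RADOS-level copy (along the lines of the tracker ticket referenced above) could look like the following. It assumes you are recovering rank 0, whose journal objects are normally named 200.<seq> in the metadata pool, and uses <metadata-pool> as a placeholder for your metadata pool name; please verify both against your own cluster before running anything.

   # Sketch only: copy every rank-0 journal object out of the metadata pool.
   # '200.' assumes rank 0 (journal inode 0x200); <metadata-pool> is a placeholder.
   pool=<metadata-pool>
   rados -p "$pool" ls | grep '^200\.' > journal_objects.txt
   while read -r obj; do
       rados -p "$pool" get "$obj" "backup.$obj"    # one local file per journal object
   done < journal_objects.txt

This only gives you a raw copy to fall back on; it is not a replacement for the cephfs-journal-tool export.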
For more information please look into
https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/

Thanks,
Kotresh H R

On Wed, Apr 26, 2023 at 3:08 AM <jack@xxxxxxxxxxxxxxxxxxx> wrote:
> Hi All,
>
> We have a CephFS cluster running Octopus with three control nodes each
> running an MDS, Monitor, and Manager on Ubuntu 20.04. The OS drive on one
> of these nodes failed recently and we had to do a fresh install, but made
> the mistake of installing Ubuntu 22.04 where Octopus is not available. We
> tried to force apt to use the Ubuntu 20.04 repo when installing Ceph so
> that it would install Octopus, but for some reason Quincy was still
> installed. We re-integrated this node and it seemed to work fine for about
> a week until our cluster reported damage to an MDS daemon and placed our
> filesystem into a degraded state.
>
>   cluster:
>     id:     692905c0-f271-4cd8-9e43-1c32ef8abd13
>     health: HEALTH_ERR
>             mons are allowing insecure global_id reclaim
>             1 filesystem is degraded
>             1 filesystem is offline
>             1 mds daemon damaged
>             noout flag(s) set
>             161 scrub errors
>             Possible data damage: 24 pgs inconsistent
>             8 pgs not deep-scrubbed in time
>             4 pgs not scrubbed in time
>             6 daemons have recently crashed
>
>   services:
>     mon: 3 daemons, quorum database-0,file-server,webhost (age 12d)
>     mgr: database-0(active, since 4w), standbys: webhost, file-server
>     mds: cephfs:0/1 3 up:standby, 1 damaged
>     osd: 91 osds: 90 up (since 32h), 90 in (since 5M)
>          flags noout
>
>   task status:
>
>   data:
>     pools:   7 pools, 633 pgs
>     objects: 169.18M objects, 640 TiB
>     usage:   883 TiB used, 251 TiB / 1.1 PiB avail
>     pgs:     605 active+clean
>              23  active+clean+inconsistent
>              4   active+clean+scrubbing+deep
>              1   active+clean+scrubbing+deep+inconsistent
>
> We are not sure if the Quincy/Octopus version mismatch is the problem, but
> we are in the process of downgrading this node now to ensure all nodes are
> running Octopus. Before doing that, we ran the following commands to try
> and recover:
>
> $ cephfs-journal-tool --rank=cephfs:all journal export backup.bin
>
> $ sudo cephfs-journal-tool --rank=cephfs:all event recover_dentries summary:
>
>   Events by type:
>     OPEN: 29589
>     PURGED: 1
>     SESSION: 16
>     SESSIONS: 4
>     SUBTREEMAP: 127
>     UPDATE: 70438
>   Errors: 0
>
> $ cephfs-journal-tool --rank=cephfs:0 journal reset:
>
>   old journal was 170234219175~232148677
>   new journal start will be 170469097472 (2729620 bytes past old end)
>   writing journal head
>   writing EResetJournal entry
>   done
>
> $ cephfs-table-tool all reset session
>
> All of our MDS daemons are down and fail to restart with the following
> errors:
>
>     -3> 2023-04-20T10:25:15.072-0700 7f0465069700 -1 log_channel(cluster) log [ERR] : journal replay alloc 0x1000053af79 not in free [0x1000053af7d~0x1e8,0x1000053b35c~0x1f7,0x1000053b555~0x2,0x1000053b559~0x2,0x1000053b55d~0x2,0x1000053b561~0x2,0x1000053b565~0x1de,0x1000053b938~0x1fd,0x1000053bd2a~0x4,0x1000053bf23~0x4,0x1000053c11c~0x4,0x1000053cd7b~0x158,0x1000053ced8~0xffffac3128]
>     -2> 2023-04-20T10:25:15.072-0700 7f0465069700 -1 log_channel(cluster) log [ERR] : journal replay alloc [0x1000053af7a~0x1eb,0x1000053b35c~0x1f7,0x1000053b555~0x2,0x1000053b559~0x2,0x1000053b55d~0x2], only [0x1000053af7d~0x1e8,0x1000053b35c~0x1f7,0x1000053b555~0x2,0x1000053b559~0x2,0x1000053b55d~0x2] is in free [0x1000053af7d~0x1e8,0x1000053b35c~0x1f7,0x1000053b555~0x2,0x1000053b559~0x2,0x1000053b55d~0x2,0x1000053b561~0x2,0x1000053b565~0x1de,0x1000053b938~0x1fd,0x1000053bd2a~0x4,0x1000053bf23~0x4,0x1000053c11c~0x4,0x1000053cd7b~0x158,0x1000053ced8~0xffffac3128]
>     -1> 2023-04-20T10:25:15.072-0700 7f0465069700 -1 /build/ceph-15.2.15/src/mds/journal.cc: In function 'void EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)' thread 7f0465069700 time 2023-04-20T10:25:15.076784-0700
> /build/ceph-15.2.15/src/mds/journal.cc: 1513: FAILED ceph_assert(inotablev == mds->inotable->get_version())
>
>  ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x155) [0x7f04717a3be1]
>  2: (()+0x26ade9) [0x7f04717a3de9]
>  3: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x67e2) [0x560feaca36f2]
>  4: (EUpdate::replay(MDSRank*)+0x42) [0x560feaca5bd2]
>  5: (MDLog::_replay_thread()+0x90c) [0x560feac393ac]
>  6: (MDLog::ReplayThread::entry()+0x11) [0x560fea920821]
>  7: (()+0x8609) [0x7f0471318609]
>  8: (clone()+0x43) [0x7f0470ee9163]
>
>      0> 2023-04-20T10:25:15.076-0700 7f0465069700 -1 *** Caught signal (Aborted) **
>  in thread 7f0465069700 thread_name:md_log_replay
>
>  ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)
>  1: (()+0x143c0) [0x7f04713243c0]
>  2: (gsignal()+0xcb) [0x7f0470e0d03b]
>  3: (abort()+0x12b) [0x7f0470dec859]
>  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x7f04717a3c3c]
>  5: (()+0x26ade9) [0x7f04717a3de9]
>  6: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x67e2) [0x560feaca36f2]
>  7: (EUpdate::replay(MDSRank*)+0x42) [0x560feaca5bd2]
>  8: (MDLog::_replay_thread()+0x90c) [0x560feac393ac]
>  9: (MDLog::ReplayThread::entry()+0x11) [0x560fea920821]
>  10: (()+0x8609) [0x7f0471318609]
>  11: (clone()+0x43) [0x7f0470ee9163]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> At this point, we decided it's best to ask for some guidance before
> issuing any other recovery commands.
>
> Can anyone advise what we should do?
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx