MDS recovery

Hi All,

We have a CephFS cluster running Octopus, with three control nodes each running an MDS, Monitor, and Manager on Ubuntu 20.04. The OS drive on one of these nodes failed recently and we had to do a fresh install, but we made the mistake of installing Ubuntu 22.04, where Octopus packages are not available. We tried to force apt to use the Ubuntu 20.04 repo when installing Ceph so that it would install Octopus, but Quincy was installed anyway. We re-integrated the node and it seemed to work fine for about a week, until the cluster reported damage to an MDS daemon and placed our filesystem into a degraded state.
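
For reference, the pinning we attempted looked roughly like the following (reconstructed from memory, so treat the exact file contents as approximate):

$ cat /etc/apt/sources.list.d/ceph.list
# approximate reconstruction of what we used on the reinstalled node
deb https://download.ceph.com/debian-octopus/ focal main

$ cat /etc/apt/preferences.d/ceph
Explanation: approximate; the intent was to prefer the focal Octopus packages over jammy's Quincy
Package: *
Pin: origin "download.ceph.com"
Pin-Priority: 1001

Our best guess is that the focal Octopus packages were not installable on 22.04 because of dependencies that only exist on focal, so apt fell back to the Quincy packages shipped with Ubuntu 22.04.

The current cluster status is: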

cluster:
    id:     692905c0-f271-4cd8-9e43-1c32ef8abd13
    health: HEALTH_ERR
            mons are allowing insecure global_id reclaim
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged
            noout flag(s) set
            161 scrub errors
            Possible data damage: 24 pgs inconsistent
            8 pgs not deep-scrubbed in time
            4 pgs not scrubbed in time
            6 daemons have recently crashed

  services:
    mon: 3 daemons, quorum database-0,file-server,webhost (age 12d)
    mgr: database-0(active, since 4w), standbys: webhost, file-server
    mds: cephfs:0/1 3 up:standby, 1 damaged
    osd: 91 osds: 90 up (since 32h), 90 in (since 5M)
         flags noout

  task status:

  data:
    pools:   7 pools, 633 pgs
    objects: 169.18M objects, 640 TiB
    usage:   883 TiB used, 251 TiB / 1.1 PiB avail
    pgs:     605 active+clean
             23  active+clean+inconsistent
             4   active+clean+scrubbing+deep
             1   active+clean+scrubbing+deep+inconsistent
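
We have more detail on the health errors if it is useful, e.g. from:

$ ceph health detail    # expands the damaged MDS and inconsistent PG checks
$ ceph crash ls         # the recently crashed daemons
$ ceph versions         # release mix across daemon types

but we have left that output out to keep this message readable.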

We are not sure whether the Quincy/Octopus version mismatch is the cause, but we are in the process of downgrading this node now so that all nodes run Octopus. Before doing that, we ran the following commands to try to recover:

$ cephfs-journal-tool --rank=cephfs:all journal export backup.bin
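(We still have this export; as we understand it, it could be written back later with "cephfs-journal-tool --rank=cephfs:0 journal import backup.bin" if that ever becomes necessary.)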

$ sudo cephfs-journal-tool --rank=cephfs:all event recover_dentries summary

Events by type:
  OPEN: 29589
  PURGED: 1
  SESSION: 16
  SESSIONS: 4
  SUBTREEMAP: 127
  UPDATE: 70438
Errors: 0

$ cephfs-journal-tool --rank=cephfs:0 journal reset

old journal was 170234219175~232148677
new journal start will be 170469097472 (2729620 bytes past old end)
writing journal head
writing EResetJournal entry
done

$ cephfs-table-tool all reset session

All of our MDS daemons are down and fail to restart with the following errors:

-3> 2023-04-20T10:25:15.072-0700 7f0465069700 -1 log_channel(cluster) log [ERR] : journal replay alloc 0x1000053af79 not in free [0x1000053af7d~0x1e8,0x1000053b35c~0x1f7,0x1000053b555~0x2,0x1000053b559~0x2,0x1000053b55d~0x2,0x1000053b561~0x2,0x1000053b565~0x1de,0x1000053b938~0x1fd,0x1000053bd2a~0x4,0x1000053bf23~0x4,0x1000053c11c~0x4,0x1000053cd7b~0x158,0x1000053ced8~0xffffac3128]
    -2> 2023-04-20T10:25:15.072-0700 7f0465069700 -1 log_channel(cluster) log [ERR] : journal replay alloc [0x1000053af7a~0x1eb,0x1000053b35c~0x1f7,0x1000053b555~0x2,0x1000053b559~0x2,0x1000053b55d~0x2], only [0x1000053af7d~0x1e8,0x1000053b35c~0x1f7,0x1000053b555~0x2,0x1000053b559~0x2,0x1000053b55d~0x2] is in free [0x1000053af7d~0x1e8,0x1000053b35c~0x1f7,0x1000053b555~0x2,0x1000053b559~0x2,0x1000053b55d~0x2,0x1000053b561~0x2,0x1000053b565~0x1de,0x1000053b938~0x1fd,0x1000053bd2a~0x4,0x1000053bf23~0x4,0x1000053c11c~0x4,0x1000053cd7b~0x158,0x1000053ced8~0xffffac3128]
    -1> 2023-04-20T10:25:15.072-0700 7f0465069700 -1 /build/ceph-15.2.15/src/mds/journal.cc: In function 'void EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)' thread 7f0465069700 time 2023-04-20T10:25:15.076784-0700
/build/ceph-15.2.15/src/mds/journal.cc: 1513: FAILED ceph_assert(inotablev == mds->inotable->get_version())

 ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x155) [0x7f04717a3be1]
 2: (()+0x26ade9) [0x7f04717a3de9]
 3: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x67e2) [0x560feaca36f2]
 4: (EUpdate::replay(MDSRank*)+0x42) [0x560feaca5bd2]
 5: (MDLog::_replay_thread()+0x90c) [0x560feac393ac]
 6: (MDLog::ReplayThread::entry()+0x11) [0x560fea920821]
 7: (()+0x8609) [0x7f0471318609]
 8: (clone()+0x43) [0x7f0470ee9163]

     0> 2023-04-20T10:25:15.076-0700 7f0465069700 -1 *** Caught signal (Aborted) **
 in thread 7f0465069700 thread_name:md_log_replay

 ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)
 1: (()+0x143c0) [0x7f04713243c0]
 2: (gsignal()+0xcb) [0x7f0470e0d03b]
 3: (abort()+0x12b) [0x7f0470dec859]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x7f04717a3c3c]
 5: (()+0x26ade9) [0x7f04717a3de9]
 6: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x67e2) [0x560feaca36f2]
 7: (EUpdate::replay(MDSRank*)+0x42) [0x560feaca5bd2]
 8: (MDLog::_replay_thread()+0x90c) [0x560feac393ac]
 9: (MDLog::ReplayThread::entry()+0x11) [0x560fea920821]
 10: (()+0x8609) [0x7f0471318609]
 11: (clone()+0x43) [0x7f0470ee9163]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
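
If we are reading this correctly, replay is trying to re-apply inode allocations that the on-disk InoTable no longer considers free, and the final assert fires because the inotable version recorded in the journaled event no longer matches the table's current version, presumably a consequence of the journal reset and session table reset above (or of the week spent on Quincy). We may well be misinterpreting it.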

At this point, we decided it's best to ask for some guidance before issuing any other recovery commands.
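
For context, these are the further recovery commands we are aware of (from the disaster-recovery docs and list archives) but have deliberately not run yet:

$ cephfs-table-tool all reset snap               # not run
$ cephfs-table-tool all reset inode              # not run
$ ceph fs reset cephfs --yes-i-really-mean-it    # not run
$ ceph mds repaired cephfs:0                     # not run

In particular we are hesitant to reset the inode table, given the assert above, without knowing whether that is the right direction or would make things worse.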

Can anyone advise what we should do?


