repairing damaged cephfs_metadata pool

Hi there, newcomer here.

I've been trying to figure out whether it's possible to repair or recover CephFS after some unfortunate issues a couple of months ago; the two nodes involved have been offline most of the time since the incident.

I'm sure the problem is that I lack the Ceph expertise to quite suss out where the broken bits are. This was a 2-node cluster (I know, I know) that had a hypervisor primary disk fail, and the entire OS was lost. I reinstalled the hypervisor, rejoined it to the cluster (Proxmox), rejoined Ceph to the other node, and re-added the OSDs. It came back with quorum problems; some PGs were inconsistent and some were lost. Some of that is due to my own fiddling around, which possibly exacerbated things. Eventually I had to edit the monmap down to a single monitor (roughly the procedure sketched below), which then had all kinds of screwy journal issues... it's been a while since I've tried resuscitating this, so the details in my memory are fuzzy.
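
For what it's worth, the monmap surgery was basically the standard "remove monitors from an unhealthy cluster" procedure; I'm sketching it from memory, so treat the mon names and paths as approximate:
```
# stop the surviving monitor before touching its store
systemctl stop ceph-mon@pve02

# pull the current monmap out of the mon store, drop the dead mon, inject it back
ceph-mon -i pve02 --extract-monmap /tmp/monmap
monmaptool --print /tmp/monmap
monmaptool --rm pve01 /tmp/monmap
ceph-mon -i pve02 --inject-monmap /tmp/monmap

systemctl start ceph-mon@pve02
```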

My cluster health isn't awful. Output is basically this:
```
root@pve02:~# ceph -s
  cluster:
    id:     8b31840b-5706-4c92-8135-0d6e03976af1
    health: HEALTH_ERR
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged
            noout flag(s) set
            16 daemons have recently crashed

  services:
    mon: 1 daemons, quorum pve02 (age 3d)
    mgr: pve01(active, since 4d)
    mds: 0/1 daemons up
    osd: 7 osds: 7 up (since 2d), 7 in (since 7w)
         flags noout

  data:
    volumes: 0/1 healthy, 1 recovering; 1 damaged
    pools:   5 pools, 576 pgs
    objects: 1.51M objects, 4.0 TiB
    usage:   8.2 TiB used, 9.1 TiB / 17 TiB avail
    pgs:     575 active+clean
             1   active+clean+scrubbing+deep

  io:
    client:   241 KiB/s wr, 0 op/s rd, 10 op/s wr
```
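
Happy to paste fuller output if any of it would help; off the top of my head, these are the obvious things I can grab:
```
ceph health detail        # expands the HEALTH_ERR lines, names the damaged MDS rank
ceph fs dump              # FS/MDS map, including which rank is marked damaged
ceph crash ls             # the 16 recent crashes, with IDs usable in `ceph crash info <id>`
ceph osd pool ls detail   # per-pool settings for cephfs_data / cephfs_metadata
```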

I've tried a couple of times working through the steps here (https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/), but I always hit an error at scan_links, where I get a crash dump of sorts (roughly the sequence I run is pasted below the log snippet). If I try to mark the cephfs as repaired/joinable, the MDS daemons will try to replay and then fail. The only occurrence of err/ERR in the MDS logs is a line like this (shown with a bit of surrounding context):
```
2022-05-07T18:31:26.342-0500 7f22b44d8700  1 mds.0.94  waiting for osdmap 301772 (which blocklists prior instance)
2022-05-07T18:31:26.346-0500 7f22adccb700 -1 log_channel(cluster) log [ERR] : failed to read JournalPointer: -1 ((1) Operation not permitted)
2022-05-07T18:31:26.346-0500 7f22af4ce700  0 mds.0.journaler.pq(ro) error getting journal off disk
```
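
For reference, the sequence I've been running is more or less the one from that doc page, along these lines (fs and pool names are from my setup, and I may have a detail wrong since I'm going from notes):
```
# back up the rank 0 journal before doing anything destructive
cephfs-journal-tool --rank=cephfs:0 journal export backup.bin

# salvage what dentries we can, then reset the journal and session table
cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
cephfs-journal-tool --rank=cephfs:0 journal reset
cephfs-table-tool cephfs:all reset session

# rebuild metadata structures by scanning the data pool
cephfs-data-scan init
cephfs-data-scan scan_extents cephfs_data
cephfs-data-scan scan_inodes cephfs_data
cephfs-data-scan scan_links       # <-- this is the step that crashes for me
cephfs-data-scan cleanup cephfs_data

# then try to bring the filesystem back
ceph mds repaired cephfs:0
ceph fs set cephfs joinable true
```

Given that the log above mentions mds.0.journaler.pq, I could also try inspecting the purge queue journal directly (e.g. `cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal inspect`) if that would tell us anything useful.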

I haven't had much luck googling that error; it seems uncommon. My hope is that the cephfs_data pool is fine. I never actually had any inconsistent PG issues on a pool other than the metadata pool, so that's the only one that suffered acute injury during the hardware failure/quorum loss.
If I had more experience with the rados tools, I could probably do more of this diagnosis myself. I have plenty of logs lying around and can run any diagnostics that might help (a few I have in mind are below), but I hate to spam too much here right out of the gate.
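
For example, these are the kinds of checks I had in mind to confirm the data pool really came through unscathed (pool names are just what they're called in my cluster):
```
rados list-inconsistent-pg cephfs_data       # should come back empty if the data pool was never inconsistent
rados list-inconsistent-pg cephfs_metadata   # the metadata pool is where the damage was
ceph pg ls inconsistent                      # any PGs currently flagged inconsistent
rados df                                     # per-pool object counts / usage as a sanity check
```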
