Re: repairing damaged cephfs_metadata pool

On Tue, May 10, 2022 at 2:47 PM Horvath, Dustin Marshall
<dustinmhorvath@xxxxxx> wrote:
>
> Hi there, newcomer here.
>
> I've been trying to figure out if it's possible to repair or recover cephfs after some unfortunate issues a couple of months ago; these couple of nodes have been offline most of the time since the incident.
>
> I'm sure the problem is that I lack the ceph expertise to quite suss out where the broken bits are. This was a 2-node cluster (I know, I know) that had a hypervisor primary disk fail, and the entire OS was lost. I reinstalled the hypervisor, rejoined it to the cluster (proxmox), rejoined ceph to the other node, and re-added the OSDs. It came back with quorum problems; some PGs were inconsistent and some were lost. Some of that is due to my own fiddling around, which possibly exacerbated things. Eventually I had to edit the monmap down to 1 monitor, which had all kinds of screwy journal issues...it's been a while since I've tried resuscitating this, so the details in my memory are fuzzy.
>
> My cluster health isn't awful. Output is basically this:
> ```
> root@pve02:~# ceph -s
>   cluster:
>     id:     8b31840b-5706-4c92-8135-0d6e03976af1
>     health: HEALTH_ERR
>             1 filesystem is degraded
>             1 filesystem is offline
>             1 mds daemon damaged
>             noout flag(s) set
>             16 daemons have recently crashed
>
>   services:
>     mon: 1 daemons, quorum pve02 (age 3d)
>     mgr: pve01(active, since 4d)
>     mds: 0/1 daemons up
>     osd: 7 osds: 7 up (since 2d), 7 in (since 7w)
>          flags noout
>
>   data:
>     volumes: 0/1 healthy, 1 recovering; 1 damaged
>     pools:   5 pools, 576 pgs
>     objects: 1.51M objects, 4.0 TiB
>     usage:   8.2 TiB used, 9.1 TiB / 17 TiB avail
>     pgs:     575 active+clean
>              1   active+clean+scrubbing+deep
>
>   io:
>     client:   241 KiB/s wr, 0 op/s rd, 10 op/s wr
> ```
>
> I've tried a couple of times running through the steps in here (https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/), but I always hit an error at scan_links, where I get a crash dump of sorts. If I try to mark the cephfs as repaired/joinable, the MDS daemons will try to replay and then fail.

Yeah, that generally won't work until the process is fully complete —
otherwise the MDS starts hitting the metadata inconsistencies from
having a halfway-done FS!
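
For reference, the rough order of operations from that page (the
<fs_name>/<data_pool> placeholders and the backup path are mine; exact
flags vary a bit by release, so double-check against the doc for your
version, and export the journal somewhere safe before any reset step):

```shell
# Back up the rank-0 journal before anything destructive
cephfs-journal-tool --rank=<fs_name>:0 journal export backup.bin

# Salvage what the journal holds, then reset it
cephfs-journal-tool --rank=<fs_name>:0 event recover_dentries summary
cephfs-journal-tool --rank=<fs_name>:0 journal reset
cephfs-table-tool <fs_name>:all reset session

# Rebuild metadata from the data pool -- each step has to finish
# cleanly before the next one starts
cephfs-data-scan init
cephfs-data-scan scan_extents <data_pool>
cephfs-data-scan scan_inodes <data_pool>
cephfs-data-scan scan_links
cephfs-data-scan cleanup <data_pool>

# Only once all of the above completes without errors:
ceph mds repaired <fs_name>:0
```

The scan_extents/scan_inodes steps can also be run with multiple
parallel workers, which matters on a 4 TiB pool.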

> The only occurrences of err/ERR in the MDS logs are lines like these:
> ```
> 2022-05-07T18:31:26.342-0500 7f22b44d8700  1 mds.0.94  waiting for osdmap 301772 (which blocklists prior instance)
> 2022-05-07T18:31:26.346-0500 7f22adccb700 -1 log_channel(cluster) log [ERR] : failed to read JournalPointer: -1 ((1) Operation not permitted)
> 2022-05-07T18:31:26.346-0500 7f22af4ce700  0 mds.0.journaler.pq(ro) error getting journal off disk

That pretty much means the mds log/journal doesn't actually exist. I'm
actually surprised that this is the thing that causes a crash, since
you presumably already ran the "cephfs-journal-tool --rank=0 journal
reset" command in that doc.
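
If you want to confirm that directly, the journal objects live in the
metadata pool at fixed names (assuming the standard layout, if I
remember it right: 0x200+rank for the journal, 0x400+rank for the
JournalPointer, 0x500+rank for the purge queue; the pool name here is
a guess from a default setup):

```shell
# Header objects for rank 0 -- ENOENT here means that piece is gone
rados -p cephfs_metadata stat 400.00000000   # JournalPointer
rados -p cephfs_metadata stat 200.00000000   # MDS journal header
rados -p cephfs_metadata stat 500.00000000   # purge queue ("journaler.pq")

# Structural check of both journals
cephfs-journal-tool --rank=<fs_name>:0 journal inspect
cephfs-journal-tool --rank=<fs_name>:0 --journal=purge_queue journal inspect
```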

But as the page says, these are advanced tools which can wreck your
filesystem if you do them wrong, and the details matter. You'll have
to share as much as you can of what's been done to the cluster. Even
if you did some aborted recovery procedures, just running through it
again may work out. We'd need to see the actual scan_links error
output to say anything for certain, though.
-Greg

> ```
>
> I haven't had much luck on the googles with diagnosing that error; seems uncommon. My hope is that the cephfs_data pool is fine. I actually never had any inconsistent PG issues on a pool other than the metadata pool, so that's the only one that suffered actual acute injury during the hardware failure/quorum loss.
> If I had more experience with the rados tools, I'd probably be more helpful. I have plenty of logs lying about and can perform any diagnoses that might help, but I hate to spam too much here right out of the gate.
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
