Re: One mds daemon damaged, filesystem is offline. How to recover?

Also, what is the status of your other MDS daemons?
Are any of them active, and which one was damaged?
You could also consider running an additional MDS in the same cluster; a quick way to check the current state is sketched below.
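
For example, a minimal sketch (assuming the filesystem is named cephfs, as in your output below):

ceph fs status cephfs    # ranks, their states and available standby daemons
ceph mds stat            # short summary of the MDS map

Once the inconsistent metadata PG has been repaired, the damaged rank can usually be marked repaired with something like:

ceph mds repaired cephfs:0

after which one of the standby daemons should take over rank 0.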


On Sat, 22 May 2021, 00:40 Eugen Block, <eblock@xxxxxx> wrote:

> Hi,
>
> I went through similar trouble just this week [1], but the root cause
> seems different, so it probably won't apply to your case.
> Which version of Ceph are you running? There are a couple of reports
> with similar error messages, e.g. [2]; it may already have been resolved.
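> A quick way to check, as a sketch:
>
> ceph versions        # versions reported by all running daemons
> ceph --version       # version of the local client binaries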
>
> Can you share the output of
>
> rados list-inconsistent-obj 2.44
>
> and
>
> ceph tell mds.<MDS> damage ls
>
> The pool size is 3, right?
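> You can confirm that with, for example (a sketch assuming the metadata
> pool is named cephfs_metadata, as in your pool list below):
>
> ceph osd pool get cephfs_metadata size    # replica count of the pool
> ceph osd pool ls detail                   # full pool settings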
>
> Regards,
> Eugen
>
> Quoting Sagara Wijetunga <sagarawmw@xxxxxxxxx>:
>
> > Hi all
> > An accidental power failure occurred.
> > As a result, CephFS is offline and cannot be mounted.
> > I have 3 MDS daemons, but the cluster complains "1 mds daemon damaged".
> >
> > It seems a PG of cephfs_metadata is inconsistent. I tried to repair it,
> > but the repair does not succeed.
> > How do I repair the damaged MDS and bring the CephFS up/online?
> > Details are included below.
> >
> > Many thanks in advance.
> > Sagara
> >
> >
> >
> >
> > # ceph -s
> >
> >   cluster:
> >
> >     id:     abc...
> >
> >     health: HEALTH_ERR
> >
> >             1 filesystem is degraded
> >
> >             1 filesystem is offline
> >
> >             1 mds daemon damaged
> >
> >             4 scrub errors
> >
> >             Possible data damage: 1 pg inconsistent
> >
> >
> >
> >   services:
> >
> >     mon: 3 daemons, quorum a,b,c (age 107s)
> >
> >     mgr: a(active, since 22m), standbys: b, c
> >
> >     mds: cephfs:0/1 3 up:standby, 1 damaged
> >
> >     osd: 3 osds: 3 up (since 96s), 3 in (since 96s)
> >
> >
> >
> >   data:
> >
> >     pools:   3 pools, 192 pgs
> >
> >     objects: 281.05k objects, 327 GiB
> >
> >     usage:   2.4 TiB used, 8.1 TiB / 11 TiB avail
> >
> >     pgs:     191 active+clean
> >
> >              1   active+clean+inconsistent
> >
> >
> >
> > # ceph health detail
> >
> > HEALTH_ERR 1 filesystem is degraded; 1 filesystem is offline; 1 mds
> > daemon damaged; 4 scrub errors; Possible data damage: 1 pg
> > inconsistent
> >
> > FS_DEGRADED 1 filesystem is degraded
> >
> >     fs cephfs is degraded
> >
> > MDS_ALL_DOWN 1 filesystem is offline
> >
> >     fs cephfs is offline because no MDS is active for it.
> >
> > MDS_DAMAGE 1 mds daemon damaged
> >
> >     fs cephfs mds.0 is damaged
> >
> > OSD_SCRUB_ERRORS 4 scrub errors
> >
> > PG_DAMAGED Possible data damage: 1 pg inconsistent
> >
> >     pg 2.44 is active+clean+inconsistent, acting [0,2,1]
> >
> >
> >
> > # ceph osd lspools
> >
> > 2 cephfs_metadata
> >
> > 3 cephfs_data
> >
> > 4 rbd
> >
> >
> >
> >
> > # ceph pg repair 2.44
> >
> >
> > # ceph -w
> >
> > 2021-05-22 01:48:04.775783 osd.0 [ERR] 2.44 shard 0 soid
> > 2:22efaf6a:::200.00006048:head : candidate size 1540096 info size
> > 1555896 mismatch
> >
> >
> > 2021-05-22 01:48:04.775786 osd.0 [ERR] 2.44 shard 1 soid
> > 2:22efaf6a:::200.00006048:head : candidate size 1540096 info size
> > 1555896 mismatch
> >
> >
> > 2021-05-22 01:48:04.775787 osd.0 [ERR] 2.44 shard 2 soid
> > 2:22efaf6a:::200.00006048:head : candidate size 1441792 info size
> > 1555896 mismatch
> >
> >
> > 2021-05-22 01:48:04.775789 osd.0 [ERR] 2.44 soid
> > 2:22efaf6a:::200.00006048:head : failed to pick suitable object info
> >
> > 2021-05-22 01:48:04.775849 osd.0 [ERR] repair 2.44
> > 2:22efaf6a:::200.00006048:head : on disk size (1540096) does not
> > match object info size (1555896) adjusted for ondisk to (1555896)
> >
> > 2021-05-22 01:48:04.787167 osd.0 [ERR] 2.44 repair 4 errors, 0 fixed
> >
> > --- End of detail ---
> >
> >
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


