Re: One mds daemon damaged, filesystem is offline. How to recover?

Hi,

I went through similar trouble just this week [1], but the root cause seems different, so it probably won't apply to your case. Which version of Ceph are you running? There are a couple of reports with similar error messages, e.g. [2]; it may already have been resolved.
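
If you're not sure about the exact version, on a recent enough release

ceph versions

will show what every running daemon reports.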

Can you share the output of

rados list-inconsistent-obj 2.44

and

ceph tell mds.<MDS> damage ls
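
For the rados command, adding --format=json-pretty makes the output easier to read, e.g.:

rados list-inconsistent-obj 2.44 --format=json-pretty

<MDS> is the name of one of your MDS daemons; "ceph fs status" lists them.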

The pool size is 3, right?
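
You can double-check that with:

ceph osd pool get cephfs_metadata size
ceph osd pool get cephfs_metadata min_size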

Regards,
Eugen

Quoting Sagara Wijetunga <sagarawmw@xxxxxxxxx>:

Hi all,
An accidental power failure happened.
That resulted in CephFS going offline, and it cannot be mounted.
I have 3 MDS daemons, but the cluster complains "1 mds daemon damaged".

It seems a PG of cephfs_metadata is inconsistent. I tried to repair it, but it does not get repaired.
How do I repair the damaged MDS and bring CephFS back online?
Details are included below.

Many thanks in advance.
Sagara




# ceph -s
  cluster:
    id:     abc...
    health: HEALTH_ERR
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged
            4 scrub errors
            Possible data damage: 1 pg inconsistent

  services:
    mon: 3 daemons, quorum a,b,c (age 107s)
    mgr: a(active, since 22m), standbys: b, c
    mds: cephfs:0/1 3 up:standby, 1 damaged
    osd: 3 osds: 3 up (since 96s), 3 in (since 96s)

  data:
    pools:   3 pools, 192 pgs
    objects: 281.05k objects, 327 GiB
    usage:   2.4 TiB used, 8.1 TiB / 11 TiB avail
    pgs:     191 active+clean
             1   active+clean+inconsistent



# ceph health detail
HEALTH_ERR 1 filesystem is degraded; 1 filesystem is offline; 1 mds daemon damaged; 4 scrub errors; Possible data damage: 1 pg inconsistent
FS_DEGRADED 1 filesystem is degraded
    fs cephfs is degraded
MDS_ALL_DOWN 1 filesystem is offline
    fs cephfs is offline because no MDS is active for it.
MDS_DAMAGE 1 mds daemon damaged
    fs cephfs mds.0 is damaged
OSD_SCRUB_ERRORS 4 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 2.44 is active+clean+inconsistent, acting [0,2,1]



# ceph osd lspools
2 cephfs_metadata
3 cephfs_data
4 rbd




# ceph pg repair 2.44

# ceph -w
2021-05-22 01:48:04.775783 osd.0 [ERR] 2.44 shard 0 soid 2:22efaf6a:::200.00006048:head : candidate size 1540096 info size 1555896 mismatch
2021-05-22 01:48:04.775786 osd.0 [ERR] 2.44 shard 1 soid 2:22efaf6a:::200.00006048:head : candidate size 1540096 info size 1555896 mismatch
2021-05-22 01:48:04.775787 osd.0 [ERR] 2.44 shard 2 soid 2:22efaf6a:::200.00006048:head : candidate size 1441792 info size 1555896 mismatch
2021-05-22 01:48:04.775789 osd.0 [ERR] 2.44 soid 2:22efaf6a:::200.00006048:head : failed to pick suitable object info
2021-05-22 01:48:04.775849 osd.0 [ERR] repair 2.44 2:22efaf6a:::200.00006048:head : on disk size (1540096) does not match object info size (1555896) adjusted for ondisk to (1555896)
2021-05-22 01:48:04.787167 osd.0 [ERR] 2.44 repair 4 errors, 0 fixed

--- End of detail ---


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

