Re: One mds daemon damaged, filesystem is offline. How to recover?

Hi,

I went through similar trouble just this week [1], but the root cause seems different, so it probably won't apply to your case. Which version of Ceph are you running? There are a couple of reports with similar error messages, e.g. [2]; it may already have been resolved.
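
If you're not sure about the exact version, on a recent enough release

ceph versions

will show what every running daemon reports.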

Can you share the output of

rados list-inconsistent-obj 2.44

and

ceph tell mds.<MDS> damage ls
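
For the rados command, adding --format=json-pretty makes the output easier to read, e.g.:

rados list-inconsistent-obj 2.44 --format=json-pretty

<MDS> is the name of one of your MDS daemons; "ceph fs status" lists them.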

The pool size is 3, right?
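
You can double-check that with:

ceph osd pool get cephfs_metadata size
ceph osd pool get cephfs_metadata min_size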

Regards,
Eugen

Quoting Sagara Wijetunga <sagarawmw@xxxxxxxxx>:

Hi all,
An accidental power failure happened.
That resulted in CephFS going offline, and it cannot be mounted.
I have 3 MDS daemons, but the cluster complains "1 mds daemon damaged".

It seems a PG of cephfs_metadata is inconsistent. I tried to repair it, but it does not get repaired.
How do I repair the damaged MDS and bring CephFS back online?
Details are included below.

Many thanks in advance.
Sagara




# ceph -s
  cluster:
    id:     abc...
    health: HEALTH_ERR
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged
            4 scrub errors
            Possible data damage: 1 pg inconsistent

  services:
    mon: 3 daemons, quorum a,b,c (age 107s)
    mgr: a(active, since 22m), standbys: b, c
    mds: cephfs:0/1 3 up:standby, 1 damaged
    osd: 3 osds: 3 up (since 96s), 3 in (since 96s)

  data:
    pools:   3 pools, 192 pgs
    objects: 281.05k objects, 327 GiB
    usage:   2.4 TiB used, 8.1 TiB / 11 TiB avail
    pgs:     191 active+clean
             1   active+clean+inconsistent



# ceph health detail
HEALTH_ERR 1 filesystem is degraded; 1 filesystem is offline; 1 mds daemon damaged; 4 scrub errors; Possible data damage: 1 pg inconsistent
FS_DEGRADED 1 filesystem is degraded
    fs cephfs is degraded
MDS_ALL_DOWN 1 filesystem is offline
    fs cephfs is offline because no MDS is active for it.
MDS_DAMAGE 1 mds daemon damaged
    fs cephfs mds.0 is damaged
OSD_SCRUB_ERRORS 4 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 2.44 is active+clean+inconsistent, acting [0,2,1]



# ceph osd lspools
2 cephfs_metadata
3 cephfs_data
4 rbd




# ceph pg repair 2.44

# ceph -w
2021-05-22 01:48:04.775783 osd.0 [ERR] 2.44 shard 0 soid 2:22efaf6a:::200.00006048:head : candidate size 1540096 info size 1555896 mismatch
2021-05-22 01:48:04.775786 osd.0 [ERR] 2.44 shard 1 soid 2:22efaf6a:::200.00006048:head : candidate size 1540096 info size 1555896 mismatch
2021-05-22 01:48:04.775787 osd.0 [ERR] 2.44 shard 2 soid 2:22efaf6a:::200.00006048:head : candidate size 1441792 info size 1555896 mismatch
2021-05-22 01:48:04.775789 osd.0 [ERR] 2.44 soid 2:22efaf6a:::200.00006048:head : failed to pick suitable object info
2021-05-22 01:48:04.775849 osd.0 [ERR] repair 2.44 2:22efaf6a:::200.00006048:head : on disk size (1540096) does not match object info size (1555896) adjusted for ondisk to (1555896)
2021-05-22 01:48:04.787167 osd.0 [ERR] 2.44 repair 4 errors, 0 fixed

--- End of detail ---


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

