Re: MDSs report damaged metadata

Robert LeBlanc <robert@xxxxxxxxxxxxx> · Thu, 22 Aug 2019 07:38:05 -0700

We just had metadata damage show up on our Jewel cluster. I tried a few things like renaming directories and scrubbing, but the damage would show up again in less than 24 hours. I finally copied the damaged directories to a temporary location on CephFS, then swapped the copies in for the damaged ones. When I deleted the damaged directories, the active MDS crashed, but the standby-replay MDS took over just fine. I haven't seen the messages for almost a week now.
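
For anyone who wants to try the same thing, here is a minimal sketch of that copy-and-swap. The mount point and directory names are hypothetical, and the `.copy` / `.damaged` suffixes are only an illustration. By default it only prints the commands (DRY_RUN=1) so they can be reviewed first.

```shell
#!/bin/sh
# Sketch of the copy-and-swap workaround described above.
# All paths are hypothetical; adjust them to your own CephFS mount.
# With DRY_RUN=1 (the default) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

swap_damaged_dir() {
    damaged="$1"                 # directory the MDS reports damage in
    copy="${damaged}.copy"       # temporary copy on the same CephFS
    old="${damaged}.damaged"     # where the damaged tree is moved aside

    run cp -a "$damaged" "$copy" # copy the data to fresh inodes
    run mv "$damaged" "$old"     # move the damaged tree out of the way
    run mv "$copy" "$damaged"    # put the clean copy in its place
    # Deleting the damaged tree is the step that crashed the active MDS
    # in the case above, so have a standby MDS available before running it.
    run rm -rf "$old"
}
```

Calling `swap_damaged_dir /mnt/cephfs/mail/user1` prints the four commands in order; setting DRY_RUN=0 in the script runs them for real.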
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Mon, Aug 19, 2019 at 10:30 PM Lars Täuber <taeuber@xxxxxxx> wrote:
Hi there!

Does anyone else have an idea what I could do to get rid of this error?

BTW: this is the third time that pg 20.0 has gone inconsistent.

This is a pg from the metadata pool (cephfs).

Could this be related somehow?

# ceph health detail
HEALTH_ERR 1 MDSs report damaged metadata; 1 scrub errors; Possible data damage: 1 pg inconsistent
MDS_DAMAGE 1 MDSs report damaged metadata
    mdsmds3(mds.0): Metadata damage detected
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 20.0 is active+clean+inconsistent, acting [9,27,15]
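
For reference, a hedged sketch of how an inconsistent PG such as 20.0 is usually inspected and repaired. It only covers the PG side and makes no claim about whether the MDS damage entry is related. The PG id is taken from the output above; the `run` wrapper and the DRY_RUN switch are illustrative helpers, not part of Ceph.

```shell
#!/bin/sh
# Sketch only: inspect and repair one inconsistent placement group.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

inspect_and_repair_pg() {
    pgid="$1"
    # List the objects and shards that the last scrub flagged as inconsistent.
    run rados list-inconsistent-obj "$pgid" --format=json-pretty
    # Ask the cluster to repair the PG.
    run ceph pg repair "$pgid"
    # Check afterwards whether the scrub error has cleared.
    run ceph health detail
}
```

Running `inspect_and_repair_pg 20.0` prints the three commands. Since the same PG has gone inconsistent several times, it may also be worth checking the disks behind the OSDs in the acting set (9, 27, 15) before relying on the repair.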

Best regards,

Lars

Mon, 19 Aug 2019 13:51:59 +0200

Lars Täuber <taeuber@xxxxxxx> ==> Paul Emmerich <paul.emmerich@xxxxxxxx> :

> Hi Paul,

> 

> thanks for the hint.

> 

> I did a recursive scrub from "/". The log says that some inodes with bad backtraces were repaired, but the error remains.

> Could this have something to do with a deleted file? Or a file within a snapshot?

> 

> The path reported by

> 

> # ceph tell mds.mds3 damage ls
> 2019-08-19 13:43:04.608 7f563f7f6700  0 client.894552 ms_handle_reset on v2:192.168.16.23:6800/176704036
> 2019-08-19 13:43:04.624 7f56407f8700  0 client.894558 ms_handle_reset on v2:192.168.16.23:6800/176704036
> [
>     {
>         "damage_type": "backtrace",
>         "id": 3760765989,
>         "ino": 1099518115802,
>         "path": "~mds0/stray7/100005161f7/dovecot.index.backup"
>     }
> ]

> 

> starts in a way that looks a bit strange to me.

> 

> Are the snapshots also repaired with a recursive repair operation?

> 

> Thanks

> Lars

> 

> 

> Mon, 19 Aug 2019 13:30:53 +0200

> Paul Emmerich <paul.emmerich@xxxxxxxx> ==> Lars Täuber <taeuber@xxxxxxx> :

> > Hi,

> > 

> > that error just says that the path is wrong. I unfortunately don't

> > know the correct way to instruct it to scrub a stray path off the top

> > of my head; you can always run a recursive scrub on / to go over

> > everything, though

> > 

> > 

> > Paul

> >   

_______________________________________________

ceph-users mailing list

ceph-users@xxxxxxxxxxxxxx

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
