We just had metadata damage show up on our Jewel cluster. I tried a few things, such as renaming directories and scanning, but the damage would show up again in less than 24 hours. In the end I copied the damaged directories to a tmp location on CephFS, then swapped the copies in for the damaged ones. When I deleted the damaged directories, the active MDS crashed, but the standby-replay MDS took over just fine. I haven't seen the messages for almost a week now.
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
On Mon, Aug 19, 2019 at 10:30 PM Lars Täuber <taeuber@xxxxxxx> wrote:
Hi there!
Does anyone else have an idea what I could do to get rid of this error?
BTW: this is the third time that pg 20.0 has gone inconsistent.
This is a pg from the metadata pool (cephfs).
Could this be related somehow?
# ceph health detail
HEALTH_ERR 1 MDSs report damaged metadata; 1 scrub errors; Possible data damage: 1 pg inconsistent
MDS_DAMAGE 1 MDSs report damaged metadata
mdsmds3(mds.0): Metadata damage detected
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 20.0 is active+clean+inconsistent, acting [9,27,15]
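
To track how often the same PG turns up, the health output can be picked apart with a few lines of Python. This is only a minimal sketch: it assumes the plain-text format shown above, and the names `HEALTH_DETAIL`, `PG_LINE` and `inconsistent_pgs` are made up for illustration. The pool ID is taken from the part of the PG ID before the dot.

```python
import re

# Sample text copied from the `ceph health detail` output above.
HEALTH_DETAIL = """\
HEALTH_ERR 1 MDSs report damaged metadata; 1 scrub errors; Possible data damage: 1 pg inconsistent
MDS_DAMAGE 1 MDSs report damaged metadata
mdsmds3(mds.0): Metadata damage detected
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 20.0 is active+clean+inconsistent, acting [9,27,15]
"""

# Matches lines such as: pg 20.0 is active+clean+inconsistent, acting [9,27,15]
PG_LINE = re.compile(r"^\s*pg (\S+) is (\S+), acting \[([\d,]*)\]", re.MULTILINE)


def inconsistent_pgs(text):
    """Return (pgid, pool_id, acting_osds) for each PG whose state includes 'inconsistent'."""
    found = []
    for pgid, state, acting in PG_LINE.findall(text):
        if "inconsistent" in state.split("+"):
            pool_id = int(pgid.split(".")[0])  # PG IDs have the form <pool>.<pg>
            osds = [int(osd) for osd in acting.split(",") if osd]
            found.append((pgid, pool_id, osds))
    return found


for pgid, pool_id, osds in inconsistent_pgs(HEALTH_DETAIL):
    print(f"pg {pgid} (pool {pool_id}) is inconsistent, acting OSDs {osds}")
```

Logging the result after each scrub would show whether it is always the same PG and the same acting set that is affected.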
Best regards,
Lars
Mon, 19 Aug 2019 13:51:59 +0200
Lars Täuber <taeuber@xxxxxxx> ==> Paul Emmerich <paul.emmerich@xxxxxxxx> :
> Hi Paul,
>
> thanks for the hint.
>
> I did a recursive scrub from "/". The log says that some inodes with bad backtraces were repaired, but the error remains.
> Could this have something to do with a deleted file? Or a file within a snapshot?
>
> The path reported by
>
> # ceph tell mds.mds3 damage ls
> 2019-08-19 13:43:04.608 7f563f7f6700 0 client.894552 ms_handle_reset on v2:192.168.16.23:6800/176704036
> 2019-08-19 13:43:04.624 7f56407f8700 0 client.894558 ms_handle_reset on v2:192.168.16.23:6800/176704036
> [
> {
> "damage_type": "backtrace",
> "id": 3760765989,
> "ino": 1099518115802,
> "path": "~mds0/stray7/100005161f7/dovecot.index.backup"
> }
> ]
>
> starts in a way that looks a bit strange to me.
>
> Are the snapshots also repaired with a recursive repair operation?
>
> Thanks
> Lars
>
>
> Mon, 19 Aug 2019 13:30:53 +0200
> Paul Emmerich <paul.emmerich@xxxxxxxx> ==> Lars Täuber <taeuber@xxxxxxx> :
> > Hi,
> >
> > that error just says that the path is wrong. I unfortunately don't
> > know the correct way to instruct it to scrub a stray path off the top
> > of my head; you can always run a recursive scrub on / to go over
> > everything, though.
> >
> >
> > Paul
> >
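
The `damage ls` output quoted above mixes two `ms_handle_reset` log lines with the JSON array, so it cannot be passed to a JSON parser as-is. The following is a minimal sketch, not an official tool, that skips the log lines and flags entries whose path lies under an MDS stray directory (`~mdsN/strayM`), which is where unlinked inodes are kept. The helper names `parse_damage_ls` and `is_stray`, and the assumption that the JSON starts at the first line beginning with `[`, are illustrative only.

```python
import json

# Raw text copied from the `ceph tell mds.mds3 damage ls` output quoted above.
RAW = """\
2019-08-19 13:43:04.608 7f563f7f6700 0 client.894552 ms_handle_reset on v2:192.168.16.23:6800/176704036
2019-08-19 13:43:04.624 7f56407f8700 0 client.894558 ms_handle_reset on v2:192.168.16.23:6800/176704036
[
{
"damage_type": "backtrace",
"id": 3760765989,
"ino": 1099518115802,
"path": "~mds0/stray7/100005161f7/dovecot.index.backup"
}
]
"""


def parse_damage_ls(raw):
    """Drop leading log lines and parse the JSON array that follows them."""
    lines = raw.splitlines()
    # Assumption: the JSON array begins at the first line that starts with '['.
    start = next((i for i, line in enumerate(lines) if line.lstrip().startswith("[")), None)
    if start is None:
        return []
    return json.loads("\n".join(lines[start:]))


def is_stray(entry):
    """True if the damaged path sits under an MDS stray directory (~mdsN/strayM/...)."""
    parts = entry.get("path", "").split("/")
    return len(parts) > 1 and parts[0].startswith("~mds") and parts[1].startswith("stray")


for entry in parse_damage_ls(RAW):
    kind = "stray (unlinked)" if is_stray(entry) else "regular"
    print(entry["id"], entry["damage_type"], kind, entry["path"])
```

An entry flagged as stray is consistent with the file having been deleted while something, possibly a snapshot or an open handle, still references it. The sketch only classifies the path and cannot tell which of these applies.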
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com