Re: MDSs report damaged metadata

We just had metadata damage show up on our Jewel cluster. I tried a few things like renaming directories and scanning, but the damage would just show up again in less than 24 hours. I finally copied the contents of the damaged directories to a tmp location on CephFS and then swapped the copies in for the damaged originals. When I deleted the directories with the damage, the active MDS crashed, but the standby-replay MDS took over just fine. I haven't seen the messages now for almost a week.
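A minimal sketch of that copy-and-swap workaround, with /cephfs/mail standing in as a hypothetical damaged directory (adjust the paths to your own tree):

# cp -a /cephfs/mail /cephfs/mail.tmp        (copy the tree; the new files get fresh inodes and backtraces)
# mv /cephfs/mail /cephfs/mail.damaged       (set the damaged tree aside)
# mv /cephfs/mail.tmp /cephfs/mail           (swap the clean copy into place)
# rm -rf /cephfs/mail.damaged                (this delete is the step that crashed our active MDS)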
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Mon, Aug 19, 2019 at 10:30 PM Lars Täuber <taeuber@xxxxxxx> wrote:
Hi there!

Does anyone else have an idea what I could do to get rid of this error?

BTW: this is the third time that pg 20.0 has gone inconsistent.
It is a PG from the metadata pool (cephfs).
Could the two problems be related?

# ceph health detail
HEALTH_ERR 1 MDSs report damaged metadata; 1 scrub errors; Possible data damage: 1 pg inconsistent
MDS_DAMAGE 1 MDSs report damaged metadata
    mdsmds3(mds.0): Metadata damage detected
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 20.0 is active+clean+inconsistent, acting [9,27,15]
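
For reference, the usual first steps for an inconsistent PG are to list the inconsistent objects and, if the errors look repairable, ask the primary to repair from the authoritative copy (a sketch, run against pg 20.0 from the output above; check the inconsistency details before repairing):

# rados list-inconsistent-obj 20.0 --format=json-pretty
# ceph pg repair 20.0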


Best regards,
Lars


Mon, 19 Aug 2019 13:51:59 +0200
Lars Täuber <taeuber@xxxxxxx> ==> Paul Emmerich <paul.emmerich@xxxxxxxx> :
> Hi Paul,
>
> thanks for the hint.
>
> I did a recursive scrub from "/". The log says some inodes with bad backtraces were repaired, but the error remains.
> Might this have something to do with a deleted file? Or a file within a snapshot?
>
> The path reported by
>
> # ceph tell mds.mds3 damage ls
> 2019-08-19 13:43:04.608 7f563f7f6700  0 client.894552 ms_handle_reset on v2:192.168.16.23:6800/176704036
> 2019-08-19 13:43:04.624 7f56407f8700  0 client.894558 ms_handle_reset on v2:192.168.16.23:6800/176704036
> [
>     {
>         "damage_type": "backtrace",
>         "id": 3760765989,
>         "ino": 1099518115802,
>         "path": "~mds0/stray7/100005161f7/dovecot.index.backup"
>     }
> ]
>
> starts a bit strangely to me.
>
> Are the snapshots also repaired with a recursive repair operation?
>
> Thanks
> Lars
>
>
> Mon, 19 Aug 2019 13:30:53 +0200
> Paul Emmerich <paul.emmerich@xxxxxxxx> ==> Lars Täuber <taeuber@xxxxxxx> :
> > Hi,
> >
> > That error just says that the path is wrong. Unfortunately, I don't
> > know off the top of my head the correct way to instruct it to scrub
> > a stray path; you can always run a recursive scrub on / to go over
> > everything, though.
> >
> >
> > Paul
> >   
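
For reference, the ~mds0/stray7 prefix is not a normal filesystem path: it is mds.0's stray directory, where unlinked files are kept while something (a snapshot or an open file handle) still references them, which fits the deleted dovecot.index.backup above. On Nautilus-era releases, the recursive scrub Paul suggests, plus removal of the damage-table entry once it is repaired, look roughly like this (a sketch; the scrub syntax varies between releases):

# ceph tell mds.mds3 scrub start / recursive,repair
# ceph tell mds.mds3 damage rm 3760765989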
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
