Re: MDS_DAMAGE dir_frag

On Mon, Dec 12, 2022 at 12:10 PM Sascha Lucas <ceph-users@xxxxxxxxx> wrote:

> Hi Dhairya,
>
> On Mon, 12 Dec 2022, Dhairya Parmar wrote:
>
> > You might want to look at [1] for this; also, I found a relevant
> > thread [2] that could be helpful.
> >
>
> Thanks a lot. I already found [1,2], too. But I did not consider them,
> because I felt I was not having a "disaster": nothing seems broken or
> crashed, and all servers/services have been up for weeks. No disk
> failures, no modifications to the cluster, etc.
>
> Also, the warning box in [1] tells me (as a newbie) not to run any of
> this. Or in other words: not to forcefully start a disaster ;-).
>
> A follow-up in [2] also mentions random metadata corruption: "We
> have 4 clusters (all running same version) and have experienced meta-data
> corruption on the majority of them at some time or the other"


Jewel (and upgrading from that version) was much less stable than Luminous
(when we declared the filesystem “awesome” and said the Ceph upstream
considered it production-ready), and things have generally gotten better
with every release since then.


>
> [3] tells me that metadata damage can happen either from data loss (which
> I'm convinced I don't have) or from software bugs. The latter would be
> worth fixing. Is there a way to find the root cause?


Yes, we’d very much like to understand this. What versions of the server
and kernel client are you using? What platform stack? It looks like you
are using CephFS through the volumes interface. The simplest possibility
I can think of here is that you are running with a bad kernel that
handled async ops poorly, maybe? But I don’t remember other spontaneous
corruptions of this type happening recently.
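
If it helps, that information can usually be gathered with something like
the following (the first two run from a node with an admin keyring, the
last one on each client host that mounts the filesystem):

    ceph versions    # daemon versions across the cluster
    ceph fs status   # filesystem / MDS overview
    uname -r         # kernel version on each client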

Have you run a normal forward scrub (which is non-disruptive) to check if
there are other issues?
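For reference, the recorded damage entries and a forward scrub can be
driven with something along these lines (the filesystem name "cephfs" and
rank 0 are just placeholders):

    ceph tell mds.cephfs:0 damage ls                 # list recorded damage entries
    ceph tell mds.cephfs:0 scrub start / recursive   # scrub from the root down
    ceph tell mds.cephfs:0 scrub status              # check scrub progress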
-Greg



>
> And is going through [1] really the only option? It sounds like being
> offline for days...
>
> At least I now know what dirfrags [4] are.
>
> Thanks, Sascha.
>
> [1]
> https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#disaster-recovery-experts
> [2] https://www.spinics.net/lists/ceph-users/msg53202.html
> [3]
> https://docs.ceph.com/en/quincy/cephfs/disaster-recovery/#metadata-damage-and-repair
> [4] https://docs.ceph.com/en/quincy/cephfs/dirfrags/
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



