Re: MDS_DAMAGE dir_frag

> On 12 Dec 2022, at 22:47, Sascha Lucas <ceph-users@xxxxxxxxx> wrote:
> 
> Hi Greg,
> 
> On Mon, 12 Dec 2022, Gregory Farnum wrote:
>> 
>> On Mon, Dec 12, 2022 at 12:10 PM Sascha Lucas <ceph-users@xxxxxxxxx> wrote:
> 
>>> A follow-up of [2] also mentioned having random meta-data corruption: "We
>>> have 4 clusters (all running same version) and have experienced meta-data
>>> corruption on the majority of them at some time or the other"
>> 
>> 
>> Jewel (and upgrading from that version) was much less stable than Luminous
>> (when we declared the filesystem “awesome” and said the Ceph upstream
>> considered it production-ready), and things have generally gotten better
>> with every release since then.
> 
> I see. The cited corruption belongs to older releases...
> 
>>> [3] tells me that metadata damage can happen either from data loss (which
>>> I'm convinced not to have), or from software bugs. The latter would be
>>> worth fixing. Is there a way to find the root cause?
>> 
>> 
>> Yes, we'd very much like to understand this. What versions of the server
>> and kernel client are you using? What platform stack are you on? It looks
>> like you are using CephFS through the volumes interface. The simplest
>> possibility I can think of here is that you are running a buggy kernel
>> whose async ops handling misbehaved, maybe? But I don't remember any other
>> spontaneous corruptions of this type recently.
> 
> Ceph "servers" like MONs, OSDs, MDSs etc. are all 17.2.5/cephadm/podman. The filesystem kernel clients are co-located on the same hosts running the "servers".

Isn't co-locating the kernel clients on the same hosts as the OSDs discouraged?

> For unrelated reasons the OS is still RHEL 8.5 (yes, with community Ceph). The kernel is 4.18.0-348.el8.x86_64 from the release media. Just one filesystem kernel client is on 4.18.0-348.23.1.el8_5.x86_64, the last update before 8.5 went EOL.
> 
> Are there known issues with these kernel versions?
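
Not sure about those kernels specifically, but it may be worth confirming what the MDS actually sees from its clients; kernel mounts report their kernel version in the session metadata. Something along these lines should show it (mds.<fs_name>:0 is just a placeholder for your MDS):

    ceph tell mds.<fs_name>:0 session ls

Each session entry carries a client_metadata section, which includes kernel_version for kernel clients.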
> 
>> Have you run a normal forward scrub (which is non-disruptive) to check if
>> there are other issues?
> 
> So far I haven't dared, but will do so tomorrow.
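
In case it helps, the forward scrub can be started and watched with something like the following (<fs_name> is a placeholder for the actual filesystem name; without the "repair" scrub option it only reports issues, it does not change anything):

    ceph tell mds.<fs_name>:0 scrub start / recursive
    ceph tell mds.<fs_name>:0 scrub status

Entries already flagged by MDS_DAMAGE can be listed with:

    ceph tell mds.<fs_name>:0 damage ls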
> 
> Thanks, Sascha.
> 
> [2] https://www.spinics.net/lists/ceph-users/msg53202.html
> [3] https://docs.ceph.com/en/quincy/cephfs/disaster-recovery/#metadata-damage-and-repair

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



