Re: MDS "newly corrupt dentry" after patch version upgrade

On Tue, May 2, 2023 at 10:31 AM Janek Bevendorff
<janek.bevendorff@xxxxxxxxxxxxx> wrote:
>
> Hi,
>
> After a patch version upgrade from 16.2.10 to 16.2.12, our rank 0 MDS
> fails to start. After replaying the journal, it just crashes with
>
> [ERR] : MDS abort because newly corrupt dentry to be committed: [dentry
> #0x1/storage [2,head] auth (dversion lock)
>
> Immediately after the upgrade, I had it running briefly, but then it
> decided to crash for unknown reasons and I cannot get it back up.
>
> We have five ranks in total, the other four seem to be fine. I backed up
> the journal and tried to run cephfs-journal-tool --rank=cephfs.storage:0
> event recover_dentries summary, but it never finishes and only eats up a lot
> of RAM. I stopped it after an hour and 50GB RAM.
>
> Resetting the journal makes the MDS crash with a missing inode error on
> another top-level directory, so I re-imported the backed-up journal. Is
> there any way to recover from this without rebuilding the whole file system?

Please be careful resetting the journal. It was not necessary. You can
try to recover the missing inode using cephfs-data-scan [2].

Thanks for the report. Unfortunately this looks like a false positive.
You're not using snapshots, right?

In any case, if you can reproduce it again with:

ceph config set mds debug_mds 20
ceph config set mds debug_ms 1

and upload the logs using ceph-post-file [1], that would be helpful to
understand what happened.
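
As a rough sketch, the whole capture sequence could look like the
following. The log path and the description string are placeholders I
made up for illustration; adjust them to your deployment (containerized
clusters usually keep logs under /var/log/ceph/<fsid>/). The run helper
is hypothetical too: it only prints the commands unless RUN=1 is set, so
you can review them first.

```shell
# Sketch only. LOG is an assumed path, not taken from the report.
LOG="${LOG:-/var/log/ceph/ceph-mds.example.log}"

# Dry run by default: print each command. Set RUN=1 to execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Raise MDS and messenger debug levels for all MDS daemons.
run ceph config set mds debug_mds 20
run ceph config set mds debug_ms 1

# ... restart the failing MDS here and wait for it to crash again ...

# Upload the resulting log with a short description.
run ceph-post-file -d "MDS newly corrupt dentry after 16.2.12 upgrade" "$LOG"

# Drop the overrides afterwards so the logs do not keep growing.
run ceph config rm mds debug_mds
run ceph config rm mds debug_ms
```

ceph-post-file prints an ID when the upload finishes; please include it
in your reply.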

After that you can disable the check as Dan pointed out:

ceph config set mds mds_abort_on_newly_corrupt_dentry false
ceph config set mds mds_go_bad_corrupt_dentry false

NOTE FOR OTHER READERS OF THIS MAIL: it is not recommended to blindly
set these configs as the MDS is trying to catch legitimate metadata
corruption.
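
If you do set them, a minimal sketch for undoing the override once the
MDS is stable again might look like this. It assumes the options were
set at the "mds" level as shown above. The run helper is a hypothetical
dry-run wrapper that prints the commands unless RUN=1 is set.

```shell
# Dry run by default: print each command. Set RUN=1 to execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

for opt in mds_abort_on_newly_corrupt_dentry mds_go_bad_corrupt_dentry; do
    # Show the value currently in effect for the mds section.
    run ceph config get mds "$opt"
    # Remove the override so the option falls back to its default.
    run ceph config rm mds "$opt"
done
```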

[1] https://docs.ceph.com/en/quincy/man/8/ceph-post-file/
[2] https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx