Re: One mds daemon damaged, filesystem is offline. How to recover?


 



Sorry, the above post has to be corrected as: "From the info that has emerged so far, it seems the Ceph client wanted to write an object of size 1555896 bytes but managed to write only 1540096 bytes to the journal."

Yes, I would think so, too.

I think what we need to do now is:
1. Get MDS.0 to recover, discard part of the object 200.00006048 if necessary, and bring MDS.0 up.

Yes, I agree, I just can't tell what the best way is here. Maybe remove all three objects from the disks (make a backup of them before doing that, just in case) and then try the steps to recover the journal (also make a backup of the journal first):

mds01:~ # systemctl stop ceph-mds@mds01.service
mds01:~ # cephfs-journal-tool --rank=cephfs:0 journal export myjournal.bin
mds01:~ # cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
mds01:~ # cephfs-journal-tool --rank=cephfs:0 journal reset
mds01:~ # cephfs-table-tool all reset session
mds01:~ # systemctl start ceph-mds@mds01.service
mds01:~ # ceph mds repaired 0
mds01:~ # ceph daemon mds.mds01 scrub_path / recursive repair
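
As for removing the objects, something along these lines should work for the one object named so far; the metadata pool name cephfs_metadata is only an assumption on my side (check yours with "ceph osd pool ls"), and you would repeat this for the other two objects:

# keep a copy of the damaged object before touching it
mds01:~ # rados -p cephfs_metadata get 200.00006048 200.00006048.backup
# then remove it from the metadata pool
mds01:~ # rados -p cephfs_metadata rm 200.00006048

You could also run "cephfs-journal-tool --rank=cephfs:0 journal inspect" before and after to see whether the journal is still reported as damaged.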

2. Do the same recovery for MDS.1 as in step 1 and bring MDS.1 up as well.

If step 1 succeeds, the standby daemons will most likely also start successfully.
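
You can watch the ranks and standbys come back with the usual status commands, for example:

# shows the ranks, their state and the available standbys
mds01:~ # ceph fs status
# short one-line summary of the MDS map
mds01:~ # ceph mds stat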

Quoting Sagara Wijetunga <sagarawmw@xxxxxxxxx>:

Sorry, the above post has to be corrected as: "From the info that has emerged so far, it seems the Ceph client wanted to write an object of size 1555896 bytes but managed to write only 1540096 bytes to the journal."

Sagara
On Saturday, May 22, 2021, 08:29:34 PM GMT+8, Sagara Wijetunga <sagarawmw@xxxxxxxxx> wrote:

From the info that has emerged so far, it seems the Ceph client wanted to write an object of size 1555896 bytes but managed to write only 1555896 bytes to the journal. I think what we need to do now is:
1. Get MDS.0 to recover, discard part of the object 200.00006048 if necessary, and bring MDS.0 up.
2. Do the same recovery for MDS.1 as in step 1 and bring MDS.1 up as well.
3. The two steps above will most probably bring CephFS up.
4. Once CephFS is up, scan for corrupted files, remove them and restore them from backup.
5. Get MDS.2 to sync to MDS.0 or 1 and bring the cluster back to a fully synced state.

My question is, what exactly is necessary to carry out step 1 above?
Sagara


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx





