Re: MDS corrupt (also RADOS-level copy?)

Hi Jake,

Very interesting. This sounds very much like what we have been experiencing the last two days. We also had a sudden fill-up of the metadata pool, which repeated last night. See my question here: https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/7U27L27FHHPDYGA6VNNVWGLTXCGP7X23/

I also noticed that I couldn't dump the current journal using the cephfs-journal-tool, as it would eat up all my RAM (probably not surprising with a journal that seems to be filling up a 16TiB pool).

Note: I did NOT need to reset the journal (and you probably don't need to either). I did, however, have to add extra capacity and rebalance the data. After an MDS restart, the pool quickly cleared out again. The first MDS restart took an hour or so, and I had to increase the MDS beacon grace period (mds_beacon_grace), otherwise the MONs kept killing the MDS during the resolve phase. I set it to 1600 to be on the safe side.
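For reference, assuming a recent Ceph release with the centralized config store, the grace can be raised like this (and reverted once recovery finishes):

```shell
# Raise the beacon grace so the MONs don't fail the MDS during a
# long resolve/rejoin phase (the default is 15 seconds).
ceph config set mon mds_beacon_grace 1600
ceph config set mds mds_beacon_grace 1600

# After recovery, remove the overrides to restore the defaults:
# ceph config rm mon mds_beacon_grace
# ceph config rm mds mds_beacon_grace
```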

While your MDSs are recovering, you may want to set debug_mds to 10 on one of them and check the logs. My logs were being spammed with snapshot-related messages, but I cannot really make sense of them. Still hoping for a reply on the ML.
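Debug logging can be turned up on a running daemon like this (wilma-s2 here is just an example name, substitute one of yours):

```shell
# Raise MDS debug verbosity at runtime; output goes to the daemon's
# usual log file on that host. Level 10 is very chatty.
ceph tell mds.wilma-s2 config set debug_mds 10

# Drop back to the default once you have what you need:
ceph tell mds.wilma-s2 config set debug_mds 1/5
```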

In any case, once you have recovered, I recommend adjusting the weights of some of your OSDs to be much lower than the others as a temporary safeguard. That way, should this repeat, only some OSDs would fill up and hit the FULL watermark, leaving the rest with headroom.
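A sketch of that safeguard (OSD IDs 10 and 11 are placeholders for whichever OSDs you pick):

```shell
# Temporarily lower the reweight of a few OSDs so CRUSH sends them
# less data; they keep free space in reserve if the pool fills again.
ceph osd reweight 10 0.5
ceph osd reweight 11 0.5

# Restore full weight once things have stabilised:
# ceph osd reweight 10 1.0
# ceph osd reweight 11 1.0
```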

Janek


On 31/05/2023 16:13, Jake Grimmett wrote:
Dear All,

we are trying to recover from what we suspect is a corrupt MDS :(
and have been following the guide here:

<https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/>

Symptoms: the 2 TB SSD metadata pool filled up completely over the weekend (it normally uses less than 400 GB), resulting in an MDS crash.

We added 4 extra SSDs to increase the pool capacity to 3.5 TB; however, the MDS did not recover:

# ceph fs status
cephfs2 - 0 clients
=======
RANK   STATE     MDS     ACTIVITY   DNS    INOS   DIRS   CAPS
 0     failed
 1    resolve  wilma-s3            8065   8063   8047      0
 2    resolve  wilma-s2             901k   802k  34.4k     0
      POOL         TYPE     USED  AVAIL
    mds_ssd      metadata  2296G  3566G
primary_fs_data    data       0   3566G
    ec82pool       data    2168T  3557T
STANDBY MDS
  wilma-s1
  wilma-s4

Setting "ceph mds repaired 0" causes rank 0 to restart and then immediately fail.

Following the disaster-recovery-experts guide, the first step we did was to export the MDS journals, e.g:

# cephfs-journal-tool --rank=cephfs2:0 journal export /root/backup.bin.0
journal is 9744716714163~658103700
wrote 658103700 bytes at offset 9744716714163 to /root/backup.bin.0

So far so good; however, when we try to back up the final MDS journal, the process consumes all available RAM (470 GB) and has to be killed after 14 minutes.

# cephfs-journal-tool --rank=cephfs2:2 journal export /root/backup.bin.2

Similarly, "recover_dentries summary" consumes all RAM when applied to rank 2:
# cephfs-journal-tool --rank=cephfs2:2 event recover_dentries summary

We successfully ran "cephfs-journal-tool --rank=cephfs2:0 event recover_dentries summary" and "cephfs-journal-tool --rank=cephfs2:1 event recover_dentries summary"

At this point, we tried to follow the instructions and make a RADOS-level copy of the journal data; however, the link in the docs doesn't explain how to do this and just points to <http://tracker.ceph.com/issues/9902>
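For what it's worth, a RADOS-level copy can be approximated by fetching the journal objects directly: the journal inode for rank N is 0x200 + N, so rank 2's journal objects should be named 202.<offset> in the metadata pool. A rough sketch, assuming the pool name mds_ssd from the "ceph fs status" output above (verify the object prefix with a plain "rados ls" first):

```shell
#!/bin/sh
# Copy every journal object for a given rank out of the metadata pool.
# Journal inode for rank N is 0x200+N, e.g. rank 2 -> objects "202.*".
backup_journal_objects() {
    pool="$1"; prefix="$2"; dest="$3"
    mkdir -p "$dest"
    rados -p "$pool" ls | grep "^${prefix}\." | while read -r obj; do
        # Fetch each object into a file of the same name.
        rados -p "$pool" get "$obj" "$dest/$obj"
    done
}

# Example invocation for rank 2:
# backup_journal_objects mds_ssd 202 /root/journal-backup
```

The objects can later be pushed back with "rados put" if the journal reset goes wrong.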

We are now tempted to reset the journal on rank 2, but wanted to get a feeling from others about how dangerous that could be?

We have a backup, but as there is 1.8 PB of data, it's going to take a few weeks to restore...

any ideas gratefully received.

Jake


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



