I've had a similar problem twice (with Mimic), and in both cases I ended up backing up and restoring to a fresh fs. Did you run an MDS scrub after the recovery? In my experience, recovering from duplicate inodes is not a trivial process: in one case my MDS kept crashing on unlink() in some directories, and in the other case newly created fs entries would not pass an MDS scrub due to linkage errors. (A rough sketch of the scrub commands I mean is at the very bottom of this mail, below the quoted thread.)

May 17, 2019 3:40 PM, "Adam Tygart" <mozes@xxxxxxx> wrote:

> I followed the docs from here:
> http://docs.ceph.com/docs/nautilus/cephfs/disaster-recovery-experts/#disaster-recovery-experts
>
> I exported the journals as a backup for both ranks. I was running 2
> active MDS daemons at the time.
>
> cephfs-journal-tool --rank=combined:0 journal export cephfs-journal-0-201905161412.bin
> cephfs-journal-tool --rank=combined:1 journal export cephfs-journal-1-201905161412.bin
>
> I recovered the dentries on both ranks:
> cephfs-journal-tool --rank=combined:0 event recover_dentries summary
> cephfs-journal-tool --rank=combined:1 event recover_dentries summary
>
> I reset the journals of both ranks:
> cephfs-journal-tool --rank=combined:1 journal reset
> cephfs-journal-tool --rank=combined:0 journal reset
>
> Then I reset the session table:
> cephfs-table-tool all reset session
>
> Once that was done, I rebooted all machines that were talking to CephFS
> (or at least unmounted/remounted it).
>
> On Fri, May 17, 2019 at 2:30 AM <wangzhigang@xxxxxxxxxxx> wrote:
>
>> Hi,
>> Can you tell me the detailed recovery commands?
>>
>> I just started learning CephFS, so I would be grateful.
>>
>> From: Adam Tygart <mozes@xxxxxxx>
>> To: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
>> Date: 2019/05/17 09:04
>> Subject: [Sent via lists.ceph.com] Re: MDS Crashing 14.2.1
>> Sender: "ceph-users" <ceph-users-bounces@xxxxxxxxxxxxxx>
>> ________________________________
>>
>> I ended up backing up the journals of the MDS ranks, running recover_dentries for both of them,
>> and resetting the journals and session table. It is back up. The recover_dentries stage didn't
>> show any errors, so I'm not even sure why the MDS was asserting about duplicate inodes.
>>
>> --
>> Adam
>>
>> On Thu, May 16, 2019, 13:52 Adam Tygart <mozes@xxxxxxx> wrote:
>> Hello all,
>>
>> The rank 0 MDS is still asserting. Is this duplicate-inode situation one where I should consider
>> using cephfs-journal-tool to export, recover dentries and reset?
>>
>> Thanks,
>> Adam
>>
>> On Thu, May 16, 2019 at 12:51 AM Adam Tygart <mozes@xxxxxxx> wrote:
>>
>> Hello all,
>>
>> I've got a 30-node cluster serving up lots of CephFS data.
>>
>> We upgraded to Nautilus 14.2.1 from Luminous 12.2.11 on Monday earlier
>> this week.
>>
>> We've been running 2 MDS daemons in an active-active setup.
>> Tonight one of the metadata daemons crashed with the following several times:
>>
>>     -1> 2019-05-16 00:20:56.775 7f9f22405700 -1
>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rpm/el7/BUILD/ceph-14.2.1/src/mds/CInode.h:
>> In function 'void CInode::set_primary_parent(CDentry*)' thread 7f9f22405700 time 2019-05-16 00:20:56.775021
>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rpm/el7/BUILD/ceph-14.2.1/src/mds/CInode.h:
>> 1114: FAILED ceph_assert(parent == 0 || g_conf().get_val<bool>("mds_hack_allow_loading_invalid_metadata"))
>>
>> I made a quick decision to move to a single MDS because I saw set_primary_parent, and I thought
>> it might be related to auto-balancing between the metadata servers.
>>
>> This caused one MDS to fail, the other crashed, and now rank 0 loads, goes active and then
>> crashes with the following:
>>
>>     -1> 2019-05-16 00:29:21.151 7fe315e8d700 -1
>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rpm/el7/BUILD/ceph-14.2.1/src/mds/MDCache.cc:
>> In function 'void MDCache::add_inode(CInode*)' thread 7fe315e8d700 time 2019-05-16 00:29:21.149531
>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rpm/el7/BUILD/ceph-14.2.1/src/mds/MDCache.cc:
>> 258: FAILED ceph_assert(!p)
>>
>> It now looks like we somehow have a duplicate inode in the MDS journal?
>>
>> https://people.cs.ksu.edu/~mozes/ceph-mds.melinoe.log <- was rank 0, then became rank 1 after
>> the crash and the attempted drop to one active MDS
>> https://people.cs.ksu.edu/~mozes/ceph-mds.mormo.log <- current rank 0, and crashed
>>
>> Anyone have any thoughts on this?
>>
>> Thanks,
>> Adam
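
P.S. Since the scrub question came up above: here is a minimal sketch of the post-recovery check I had in mind, based on the Nautilus scrub and disaster-recovery docs. Treat it as an assumption to verify against the docs for your exact release, since the command spelling has changed between versions; "mds.<name>" is a placeholder for your daemon name (or <fsname>:<rank>), and you would run it against each active rank.

    # See what damage the MDS has already recorded for this rank
    ceph tell mds.<name> damage ls

    # Forward scrub of the whole tree; add "repair" only after reviewing the damage list
    ceph tell mds.<name> scrub start / recursive
    ceph tell mds.<name> scrub status

    # Pre-Nautilus releases drive the same scrub through the admin socket on the MDS host
    ceph daemon mds.<name> scrub_path / recursive

In my case it was this kind of scrub (and "damage ls") repeatedly flagging linkage errors on newly created entries that convinced me to give up and migrate to a fresh filesystem.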