Re: [sent via lists.ceph.com] Re: MDS Crashing 14.2.1

I've had a similar problem twice (with Mimic), and in both cases I ended up backing up and restoring to a fresh fs. Did you do an MDS scrub after recovery? In my experience, recovering duplicate inodes is not a trivial process: in one case my MDS kept crashing on unlink() in some directories, and in the other, newly created fs entries would not pass MDS scrub due to linkage errors.
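
For reference, a forward scrub can be requested from a running MDS; a minimal sketch, assuming a Nautilus-era cluster and a filesystem named "combined" as in the commands quoted below (the rank spec, path, and scrub options are illustrative, and exact flags may differ on older releases):

# Ask rank 0 to walk the whole tree and repair what it can
ceph tell mds.combined:0 scrub start / recursive,repair
# On older releases (e.g. Mimic) the equivalent is typically exposed via the admin socket
ceph daemon mds.<daemon-name> scrub_path / recursive repair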


May 17, 2019 3:40 PM, "Adam Tygart" <mozes@xxxxxxx> wrote:

> I followed the docs from here:
> http://docs.ceph.com/docs/nautilus/cephfs/disaster-recovery-experts/#disaster-recovery-experts
> 
> I exported the journals as a backup for both ranks. I was running 2
> active MDS daemons at the time.
> 
> cephfs-journal-tool --rank=combined:0 journal export cephfs-journal-0-201905161412.bin
> cephfs-journal-tool --rank=combined:1 journal export cephfs-journal-1-201905161412.bin
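> 
> Before relying on an export, the journal's integrity can be sanity-checked with the same tool; a minimal sketch (the rank specs mirror the exports above):
> 
> cephfs-journal-tool --rank=combined:0 journal inspect
> cephfs-journal-tool --rank=combined:1 journal inspect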
> 
> I recovered the dentries on both ranks:
> cephfs-journal-tool --rank=combined:0 event recover_dentries summary
> cephfs-journal-tool --rank=combined:1 event recover_dentries summary
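> 
> If you want a preview of what recover_dentries will replay, the same event subcommand can summarize the journal first; a sketch:
> 
> cephfs-journal-tool --rank=combined:0 event get summary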
> 
> I reset the journals of both ranks:
> cephfs-journal-tool --rank=combined:1 journal reset
> cephfs-journal-tool --rank=combined:0 journal reset
> 
> Then I reset the session table:
> cephfs-table-tool all reset session
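> 
> The disaster-recovery page also covers resetting the snap and inode tables with the same tool, and it's worth confirming the ranks come back up afterwards; a sketch (the extra resets are only needed if the damage calls for them, and "combined" is the filesystem name from the commands above):
> 
> cephfs-table-tool all reset snap
> cephfs-table-tool all reset inode
> ceph fs status combined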
> 
> Once that was done, reboot all machines that were talking to CephFS
> (or at least unmount and remount it).
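> 
> For a kernel client where a remount is enough, something along these lines (the mount point, monitor address, and credentials are illustrative):
> 
> umount -f /mnt/cephfs
> mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret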
> 
> On Fri, May 17, 2019 at 2:30 AM <wangzhigang@xxxxxxxxxxx> wrote:
> 
>> Hi
>> Can you tell me the detailed recovery commands?
>> 
>> I just started learning CephFS; I would be grateful.
>> 
>> 发件人: Adam Tygart <mozes@xxxxxxx>
>> 收件人: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
>> 日期: 2019/05/17 09:04
>> 主题: [lists.ceph.com代发]Re:  MDS Crashing 14.2.1
>> 发件人: "ceph-users" <ceph-users-bounces@xxxxxxxxxxxxxx>
>> ________________________________
>> 
>> I ended up backing up the journals of the MDS ranks, running recover_dentries for both of them, and resetting
>> the journals and the session table. It is back up. The recover_dentries stage didn't show any errors,
>> so I'm not even sure why the MDS was asserting about duplicate inodes.
>> 
>> --
>> Adam
>> 
>> On Thu, May 16, 2019, 13:52 Adam Tygart <mozes@xxxxxxx> wrote:
>> Hello all,
>> 
>> The rank 0 MDS is still asserting. Is this duplicate-inode situation
>> one where I should consider using cephfs-journal-tool to
>> export, recover dentries, and reset?
>> 
>> Thanks,
>> Adam
>> 
>> On Thu, May 16, 2019 at 12:51 AM Adam Tygart <mozes@xxxxxxx> wrote:
>> 
>> Hello all,
>> 
>> I've got a 30-node cluster serving up lots of CephFS data.
>> 
>> We upgraded to Nautilus 14.2.1 from Luminous 12.2.11 on Monday earlier
>> this week.
>> 
>> We've been running 2 MDS daemons in an active-active setup. Tonight
>> one of the metadata daemons crashed with the following several times:
>> 
>> -1> 2019-05-16 00:20:56.775 7f9f22405700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rpm/el7/BUILD/ceph-14.2.1/src/mds/CInode.h: In function 'void CInode::set_primary_parent(CDentry*)' thread 7f9f22405700 time 2019-05-16 00:20:56.775021
>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rpm/el7/BUILD/ceph-14.2.1/src/mds/CInode.h: 1114: FAILED ceph_assert(parent == 0 || g_conf().get_val<bool>("mds_hack_allow_loading_invalid_metadata"))
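>> 
>> The assert itself names an escape hatch, mds_hack_allow_loading_invalid_metadata; presumably it can be flipped like any other MDS option, though it reads like a last-resort hack (sketch only, not something I have verified):
>> 
>> ceph config set mds mds_hack_allow_loading_invalid_metadata true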
>> 
>> I made a quick decision to move to a single MDS because I saw
>> set_primary_parent, and I thought it might be related to auto
>> balancing between the metadata servers.
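>> 
>> Dropping to a single active MDS is normally just a matter of lowering max_mds; a minimal sketch (the filesystem name is taken from the cephfs-journal-tool invocations elsewhere in this thread):
>> 
>> ceph fs set combined max_mds 1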
>> 
>> This caused one MDS to fail, the other crashed, and now rank 0 loads,
>> goes active and then crashes with the following:
>> -1> 2019-05-16 00:29:21.151 7fe315e8d700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rpm/el7/BUILD/ceph-14.2.1/src/mds/MDCache.cc: In function 'void MDCache::add_inode(CInode*)' thread 7fe315e8d700 time 2019-05-16 00:29:21.149531
>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rpm/el7/BUILD/ceph-14.2.1/src/mds/MDCache.cc: 258: FAILED ceph_assert(!p)
>> 
>> It now looks like we somehow have a duplicate inode in the MDS journal?
>> 
>> https://people.cs.ksu.edu/~mozes/ceph-mds.melinoe.log <- was rank 0,
>> then became rank 1 after the crash and the attempted drop to one active
>> MDS
>> https://people.cs.ksu.edu/~mozes/ceph-mds.mormo.log <- current rank 0
>> and crashed
>> 
>> Anyone have any thoughts on this?
>> 
>> Thanks,
>> Adam
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



