Re: ceph mds crashing constantly : ceph_assert fail … prepare_new_inode

Sorry, a step-by-step guide through something like that
is beyond the scope of what we can do on a mailing list.

But what I would do here is carefully assess the situation
and the damage. My wild guess would be to reset and rebuild
the inode table, but that might be incorrect and unsafe
without looking into it further.
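
For reference only, and very much not a recommendation before the
damage has been assessed: resetting the inode table is done with
cephfs-table-tool. The filesystem name and rank below are
placeholders, you would want a journal backup first, and the exact
flags should be checked against the mimic docs.

    # back up the journal before touching any tables
    cephfs-journal-tool --rank=cephfs:0 journal export backup.bin

    # wipe the inode table for all ranks so it can be rebuilt
    cephfs-table-tool all reset inode

    # alternatively, if inodes were recreated behind the table's back
    # (e.g. by cephfs-data-scan), take_inos marks a range of inode
    # numbers as used instead of wiping the whole table
    # (if available in your release)
    cephfs-table-tool all take_inos <max observed ino>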

I don't want to solicit our services here, but we do Ceph
recoveries regularly; reach out to us if you are looking
for a consultant.


Paul


2018-08-10 18:05 GMT+02:00 Amit Handa <amit.handa@xxxxxxxxx>:
Thanks a lot, Paul.
We did (hopefully) follow through with the disaster recovery.
However, please guide me on how to get the cluster back up!

Thanks,


On Fri, Aug 10, 2018 at 9:32 PM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
Looks like you got some duplicate inodes due to corrupted metadata. You
likely attempted a disaster recovery and didn't follow it through completely,
or you hit some bug in Ceph.

The solution here is probably to do a full recovery of the metadata, i.e. a
full backwards scan after resetting the inodes. I've recovered a cluster from
something similar just a few weeks ago. Annoying, but recoverable.
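
Roughly, the backwards scan is the cephfs-data-scan part of the
disaster recovery procedure. This is only a sketch; <data pool> is
the name of your CephFS data pool, and the scans can take a long
time on a large filesystem:

    # rebuild metadata in the metadata pool from the data pool
    cephfs-data-scan init
    cephfs-data-scan scan_extents <data pool>
    cephfs-data-scan scan_inodes <data pool>
    cephfs-data-scan scan_links

The scan_extents and scan_inodes phases can be run with several
workers in parallel (--worker_n / --worker_m) to speed this up.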

Paul

2018-08-10 13:26 GMT+02:00 Amit Handa <amit.handa@xxxxxxxxx>:
We are facing constant crashes from the Ceph MDS. We have installed mimic (v13.2.1).

mds: cephfs-1/1/1 up {0=node2=up:active(laggy or crashed)}
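
(In case it helps, that status line and the overall cluster state
come from something like the commands below; output omitted here.)

    ceph -s
    ceph fs status
    ceph mds stat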

mds logs: https://pastebin.com/AWGMLRm0

We have followed the DR steps listed at

http://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/
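
For anyone following along, the steps on that page that we went
through are roughly the following (the filesystem name and rank are
placeholders; see the page itself for the exact flags on mimic):

    # export the journal as a backup before any destructive step
    cephfs-journal-tool --rank=cephfs:0 journal export backup.bin

    # recover what can be recovered from the journal into the
    # metadata pool
    cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary

    # truncate the damaged journal
    cephfs-journal-tool --rank=cephfs:0 journal reset

    # wipe the session table so clients can reconnect
    cephfs-table-tool all reset session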

Please help in resolving the errors :(

MDS crash stacktrace:

 ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x7f984fc3ee1f]
 2: (()+0x284fe7) [0x7f984fc3efe7]
 3: (()+0x2087fe) [0x5563e88537fe]
 4: (Server::prepare_new_inode(boost::intrusive_ptr<MDRequestImpl>&, CDir*, inodeno_t, unsigned int, file_layout_t*)+0xf37) [0x5563e87ce777]
 5: (Server::handle_client_openc(boost::intrusive_ptr<MDRequestImpl>&)+0xdb0) [0x5563e87d0bd0]
 6: (Server::handle_client_request(MClientRequest*)+0x49e) [0x5563e87d3c0e]
 7: (Server::dispatch(Message*)+0x2db) [0x5563e87d789b]
 8: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x5563e87514b4]
 9: (MDSRank::_dispatch(Message*, bool)+0x63b) [0x5563e875db5b]
 10: (MDSRank::retry_dispatch(Message*)+0x12) [0x5563e875e302]
 11: (MDSInternalContextBase::complete(int)+0x67) [0x5563e89afb57]
 12: (MDSRank::_advance_queues()+0xd1) [0x5563e875cd51]
 13: (MDSRank::ProgressThread::entry()+0x43) [0x5563e875d3e3]
 14: (()+0x7e25) [0x7f984d869e25]
 15: (clone()+0x6d) [0x7f984c949bad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
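
For what it's worth, resolving the frames above to source lines
would look something like this, assuming the MDS binary is at
/usr/bin/ceph-mds and matching debug symbols are installed (both
the path and the debuginfo package name vary by distro):

    # disassemble with relocations and interleaved source
    objdump -rdS /usr/bin/ceph-mds > ceph-mds.objdump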










--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
