As Paul said, the MDS is loading "duplicate inodes" and that's very bad. If you've already gone through some of the disaster recovery steps, those are likely what introduced them. But you'll need to provide a *lot* more information about what you've already done to the cluster before anyone can say for sure.
The backwards scan referred to is the scan_extents/scan_inodes work described in http://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/#recovery-from-missing-metadata-objects
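For reference, the backward scan that page describes boils down to roughly the following, run with the filesystem offline ("cephfs_data" here is a placeholder; substitute your own data pool name, and double-check each step against the doc before running it):

  # rebuild the root and MDS-dir metadata, then scan the data pool backwards
  cephfs-data-scan init
  cephfs-data-scan scan_extents cephfs_data
  cephfs-data-scan scan_inodes cephfs_data
  cephfs-data-scan scan_links

scan_extents and scan_inodes walk every object in the data pool, so they can take a long time on a large pool.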
Be advised that there is limited user experience with *any* of these tools and that you have stumbled into some dark corners. I'm rather surprised a newish deployment needed any of this repair functionality; if you are deliberately breaking things to see how the cluster recovers, you should spend more time understanding plausible failure cases first. This functionality generally only comes up in the case of genuine data loss due to multiple simultaneous hardware failures.
-Greg
On Fri, Aug 10, 2018 at 9:05 AM Amit Handa <amit.handa@xxxxxxxxx> wrote:
Thanks a lot, Paul. We did (hopefully) follow through with the disaster recovery. However, please guide me on how to get the cluster back up!

Thanks,

On Fri, Aug 10, 2018 at 9:32 PM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:

Looks like you got some duplicate inodes due to corrupted metadata. You likely tried a disaster recovery and didn't follow through with it completely, or you hit some bug in Ceph.

The solution here is probably to do a full recovery of the metadata (a full backwards scan) after resetting the inodes. I've recovered a cluster from something similar just a few weeks ago. Annoying, but recoverable.
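Roughly, "resetting the inodes" means the table and journal resets from the disaster recovery page, done before the backwards scan. A sketch, assuming a single-rank filesystem (newer releases want an explicit --rank argument, and keep a journal backup either way):

  # keep a copy of the journal before touching anything
  cephfs-journal-tool journal export backup.bin
  # salvage what the journal still holds, then reset it
  cephfs-journal-tool event recover_dentries summary
  cephfs-journal-tool journal reset
  # reset the MDS tables, including the inode table
  cephfs-table-tool all reset session
  cephfs-table-tool all reset snap
  cephfs-table-tool all reset inode

Only after that do you rebuild the metadata from the data pool with the backwards scan.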
Paul

2018-08-10 13:26 GMT+02:00 Amit Handa <amit.handa@xxxxxxxxx>:

We are facing constant crashes from the Ceph MDS. We have installed mimic (v13.2.1).

mds: cephfs-1/1/1 up {0=node2=up:active(laggy or crashed)}
MDS logs: https://pastebin.com/AWGMLRm0
We have followed the disaster recovery steps listed at
http://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/

Please help us resolve these errors :(
MDS crash stack trace:
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x7f984fc3ee1f]
2: (()+0x284fe7) [0x7f984fc3efe7]
3: (()+0x2087fe) [0x5563e88537fe]
4: (Server::prepare_new_inode(boost::intrusive_ptr<MDRequestImpl>&, CDir*, inodeno_t, unsigned int, file_layout_t*)+0xf37) [0x5563e87ce777]
5: (Server::handle_client_openc(boost::intrusive_ptr<MDRequestImpl>&)+0xdb0) [0x5563e87d0bd0]
6: (Server::handle_client_request(MClientRequest*)+0x49e) [0x5563e87d3c0e]
7: (Server::dispatch(Message*)+0x2db) [0x5563e87d789b]
8: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x5563e87514b4]
9: (MDSRank::_dispatch(Message*, bool)+0x63b) [0x5563e875db5b]
10: (MDSRank::retry_dispatch(Message*)+0x12) [0x5563e875e302]
11: (MDSInternalContextBase::complete(int)+0x67) [0x5563e89afb57]
12: (MDSRank::_advance_queues()+0xd1) [0x5563e875cd51]
13: (MDSRank::ProgressThread::entry()+0x43) [0x5563e875d3e3]
14: (()+0x7e25) [0x7f984d869e25]
15: (clone()+0x6d) [0x7f984c949bad]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
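(For what it's worth, producing that disassembly on the node that ran the crashed MDS would look something like the line below; /usr/bin/ceph-mds is the usual packaged path but may differ on your install, and the ceph debuginfo package needs to be present for the source interleaving to be useful.)

  objdump -rdS /usr/bin/ceph-mds > ceph-mds.dump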
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com