Dear Zheng Yan,

I will send you the log if the error occurs again. I am running 3 MDSs, with 1 active and 2 standby. How can I back up and restore the metadata?

On Wed, Aug 27, 2014 at 3:09 PM, Yan, Zheng <ukernel at gmail.com> wrote:

> Please first delete the old mds log, then run the mds with "debug_mds = 15".
> Send the whole mds log to us after the mds crashes.
>
> Yan, Zheng
>
>
> On Wed, Aug 27, 2014 at 12:12 PM, MinhTien MinhTien
> <tientienminh080590 at gmail.com> wrote:
>
>> Hi Gregory Farnum,
>>
>> Thank you for your reply!
>> This is the log:
>>
>> 2014-08-26 16:22:39.103461 7f083752f700 -1 mds/CDir.cc: In function 'void
>> CDir::_committed(version_t)' thread 7f083752f700 time 2014-08-26 16:22:39.075809
>> mds/CDir.cc: 2071: FAILED assert(in->is_dirty() || in->last < ((__u64)(-2)))
>>
>> ceph version 0.67.10 (9d446bd416c52cd785ccf048ca67737ceafcdd7f)
>> 1: (CDir::_committed(unsigned long)+0xc4e) [0x74d9ee]
>> 2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe8d) [0x7d09bd]
>> 3: (MDS::handle_core_message(Message*)+0x987) [0x57c457]
>> 4: (MDS::_dispatch(Message*)+0x2f) [0x57c50f]
>> 5: (MDS::ms_dispatch(Message*)+0x19b) [0x57dfbb]
>> 6: (DispatchQueue::entry()+0x5a2) [0x904732]
>> 7: (DispatchQueue::DispatchThread::entry()+0xd) [0x8afdbd]
>> 8: (()+0x79d1) [0x7f083c2979d1]
>> 9: (clone()+0x6d) [0x7f083afb6b5d]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
>> to interpret this.
>>
>> --- logging levels ---
>>    0/ 5 none
>>    0/ 1 lockdep
>>    0/ 1 context
>>    1/ 1 crush
>>    1/ 5 mds
>>    1/ 5 mds_balancer
>>    1/ 5 mds_locker
>>    1/ 5 mds_log
>>    1/ 5 mds_log_expire
>>    1/ 5 mds_migrator
>>    0/ 1 buffer
>>    0/ 1 timer
>>    0/ 1 filer
>>    0/ 1 striper
>>    0/ 1 objecter
>>    0/ 5 rados
>>    0/ 5 rbd
>>    0/ 5 journaler
>>    0/ 5 objectcacher
>>    0/ 5 client
>>    0/ 5 osd
>>    0/ 5 optracker
>>    0/ 5 objclass
>>    1/ 3 filestore
>>    1/ 3 journal
>>    0/ 5 ms
>>    1/ 5 mon
>>    0/10 monc
>>    1/ 5 paxos
>>    0/ 5 tp
>>    1/ 5 auth
>>    1/ 5 crypto
>>    1/ 1 finisher
>>    1/ 5 heartbeatmap
>>    1/ 5 perfcounter
>>    1/ 5 rgw
>>    1/ 5 hadoop
>>    1/ 5 javaclient
>>    1/ 5 asok
>>    1/ 1 throttle
>>   -2/-2 (syslog threshold)
>>   -1/-1 (stderr threshold)
>>   max_recent 10000
>>   max_new 1000
>>   log_file /var/log/ceph/ceph-mds.Ceph01-dc5k3u0104.log
>> --- end dump of recent events ---
>> 2014-08-26 16:22:39.134173 7f083752f700 -1 *** Caught signal (Aborted) **
>>  in thread 7f083752f700
>>
>>
>> On Wed, Aug 27, 2014 at 3:09 AM, Gregory Farnum <greg at inktank.com> wrote:
>>
>>> I don't think the log messages you're showing are the actual cause of
>>> the failure. The log file should have a proper stack trace (with
>>> specific function references and probably a listed assert failure);
>>> can you find that?
>>> -Greg
>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>
>>>
>>> On Tue, Aug 26, 2014 at 9:11 AM, MinhTien MinhTien
>>> <tientienminh080590 at gmail.com> wrote:
>>> > Hi all,
>>> >
>>> > I have a cluster of 2 nodes on CentOS 6.5 with Ceph 0.67.10 (replica size 2).
>>> >
>>> > When I added the 3rd node to the Ceph cluster, Ceph started rebalancing data.
>>> >
>>> > I have 3 MDSs on the 3 nodes; the MDS process dies after a while with
>>> > this stack trace:
>>> >
>>> > ---------------------------------------------------------------------
>>> >
>>> > 2014-08-26 17:08:34.362901 7f1c2c704700  1 -- 10.20.0.21:6800/22154 <==
>>> > osd.10 10.20.0.21:6802/15917 1 ==== osd_op_reply(230 100000003f6.00000000
>>> > [tmapup 0~0] ondisk = 0) v4 ==== 119+0+0 (1770421071 0 0) 0x2aece00 con
>>> > 0x2aa4200
>>> >   -54> 2014-08-26 17:08:34.362942 7f1c2c704700  1 -- 10.20.0.21:6800/22154
>>> > <== osd.55 10.20.0.23:6800/2407 10 ==== osd_op_reply(263
>>> > 1000000048a.00000000 [getxattr] ack = -2 (No such file or directory)) v4
>>> > ==== 119+0+0 (3908997833 0 0) 0x1e63000 con 0x1e7aaa0
>>> >   -53> 2014-08-26 17:08:34.363001 7f1c2c704700  5 mds.0.log submit_entry
>>> > 427629603~1541 : EUpdate purge_stray truncate [metablob 100, 2 dirs]
>>> >   -52> 2014-08-26 17:08:34.363022 7f1c2c704700  1 -- 10.20.0.21:6800/22154
>>> > <== osd.37 10.20.0.22:6898/11994 6 ==== osd_op_reply(226 1.00000000 [tmapput
>>> > 0~7664] ondisk = 0) v4 ==== 109+0+0 (1007110430 0 0) 0x1e64800 con 0x1e7a7e0
>>> >   -51> 2014-08-26 17:08:34.363092 7f1c2c704700  5 mds.0.log _expired
>>> > segment 293601899 2548 events
>>> >   -50> 2014-08-26 17:08:34.363117 7f1c2c704700  1 -- 10.20.0.21:6800/22154
>>> > <== osd.17 10.20.0.21:6941/17572 9 ==== osd_op_reply(264
>>> > 10000000489.00000000 [getxattr] ack = -2 (No such file or directory)) v4
>>> > ==== 119+0+0 (1979034473 0 0) 0x1e62200 con 0x1e7b180
>>> >   -49> 2014-08-26 17:08:34.363177 7f1c2c704700  5 mds.0.log submit_entry
>>> > 427631148~1541 : EUpdate purge_stray truncate [metablob 100, 2 dirs]
>>> >   -48> 2014-08-26 17:08:34.363197 7f1c2c704700  1 -- 10.20.0.21:6800/22154
>>> > <== osd.1 10.20.0.21:6872/13227 6 ==== osd_op_reply(265 10000000491.00000000
>>> > [getxattr] ack = -2 (No such file or directory)) v4 ==== 119+0+0 (1231782695
>>> > 0 0) 0x1e63400 con 0x1e7ac00
>>> >   -47> 2014-08-26 17:08:34.363255 7f1c2c704700  5 mds.0.log submit_entry
>>> > 427632693~1541 : EUpdate purge_stray truncate [metablob 100, 2 dirs]
>>> >   -46> 2014-08-26 17:08:34.363274 7f1c2c704700  1 -- 10.20.0.21:6800/22154
>>> > <== osd.11 10.20.0.21:6884/7018 5 ==== osd_op_reply(266 1000000047d.00000000
>>> > [getxattr] ack = -2 (No such file or directory)) v4 ==== 119+0+0 (2737916920
>>> > 0 0) 0x1e61e00 con 0x1e7bc80
>>> >
>>> > ---------------------------------------------------------------------
>>> > I tried restarting the MDSs, but after a few seconds in the "active"
>>> > state they switch to "laggy or crashed". I have a lot of important data
>>> > on this cluster, and I do not want to use the command:
>>> > ceph mds newfs <metadata pool id> <data pool id> --yes-i-really-mean-it
>>> >
>>> > :(
>>> >
>>> > Tien Bui.
>>> >
>>> >
>>> > --
>>> > Bui Minh Tien
>>> >
>>> > _______________________________________________
>>> > ceph-users mailing list
>>> > ceph-users at lists.ceph.com
>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >
>>
>>
>> --
>> Bui Minh Tien
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>

--
Bui Minh Tien
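
For reference, a minimal sketch of the debugging step Yan Zheng suggests above (delete the old MDS log, raise debug_mds to 15, restart the daemon, and capture the next crash). The MDS id and log path come from the log_file line in the dump above; the sysvinit service invocation is an assumption for a CentOS 6.5 install of Ceph 0.67 and may differ on other setups:

    # /etc/ceph/ceph.conf on the MDS node -- raise MDS debug logging
    # (equivalent to the suggested "debug_mds = 15")
    [mds]
        debug mds = 15

    # Remove the old log so the new one covers only the next crash,
    # then restart the MDS and wait for it to crash again.
    rm /var/log/ceph/ceph-mds.Ceph01-dc5k3u0104.log
    service ceph restart mds.Ceph01-dc5k3u0104   # sysvinit; id assumed from the log path

After the MDS hits the assert again, the resulting log is what should be sent to the list.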