On Thu, May 23, 2013 at 2:43 PM, Giuseppe 'Gippa' Paterno' <gpaterno@xxxxxxxxxxxx> wrote:
> Hi!
>
> I've got a cluster of two nodes on Ubuntu 12.04 with cuttlefish from the
> ceph.com repo.
> ceph version 0.61.2 (fea782543a844bb277ae94d3391788b76c5bee60)
>
> The MDS process is dying after a while with a stack trace, but I can't
> understand why.
> I reproduced the same problem on Debian 7 with the same repository.
>
>     -3> 2013-05-23 23:00:42.957679 7fa39e28e700  1 -- 10.123.200.189:6800/28919 <== osd.0 10.123.200.188:6802/27665 1 ==== osd_op_reply(5 200.00000000 [read 0~0] ack = -2 (No such file or directory)) v4 ==== 111+0+0 (2261481792 0 0) 0x29afe00 con 0x29c4b00
>     -2> 2013-05-23 23:00:42.957780 7fa39e28e700  0 mds.0.journaler(ro) error getting journal off disk
>     -1> 2013-05-23 23:00:42.960974 7fa39e28e700  1 -- 10.123.200.189:6800/28919 <== osd.0 10.123.200.188:6802/27665 2 ==== osd_op_reply(1 mds0_inotable [read 0~0] ack = -2 (No such file or directory)) v4 ==== 112+0+0 (1612134461 0 0) 0x2a1c200 con 0x29c4b00
>      0> 2013-05-23 23:00:42.963326 7fa39e28e700 -1 mds/MDSTable.cc: In function 'void MDSTable::load_2(int, ceph::bufferlist&, Context*)' thread 7fa39e28e700 time 2013-05-23 23:00:42.961076
> mds/MDSTable.cc: 150: FAILED assert(0)
>
> ceph version 0.61.2 (fea782543a844bb277ae94d3391788b76c5bee60)
> 1: (MDSTable::load_2(int, ceph::buffer::list&, Context*)+0x3bb) [0x6dd2db]
> 2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe1b) [0x7275bb]
> 3: (MDS::handle_core_message(Message*)+0xae7) [0x513c57]
> 4: (MDS::_dispatch(Message*)+0x33) [0x513d53]
> 5: (MDS::ms_dispatch(Message*)+0xab) [0x515b3b]
> 6: (DispatchQueue::entry()+0x393) [0x847ca3]
> 7: (DispatchQueue::DispatchThread::entry()+0xd) [0x7caeed]
> 8: (()+0x6b50) [0x7fa3a3376b50]
> 9: (clone()+0x6d) [0x7fa3a1d24a7d]
>
> Full logs here:
> http://pastebin.com/C81g5jFd
>
> I can't understand why, and I'd really appreciate a hint.

This backtrace indicates that the MDS went to load a RADOS object that doesn't exist. We've seen this pop up occasionally, but sadly we haven't been able to diagnose the cause (for developers following along at home: I'm wondering if it's related to http://tracker.ceph.com/issues/4894, but that's pure speculation; I haven't checked the write orders at all).

Do I correctly assume that you don't have any CephFS data in the cluster yet? If so, I'd just delete your current filesystem and metadata pool, then recreate them. It should all be in the docs, and there's a rough sketch of the steps below my signature. :)

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
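
A sketch of what "delete and recreate" would look like, assuming the default "data" and "metadata" pool names and cuttlefish-era command syntax (treat it as an outline only: double-check the syntax against the docs for your version, and only do this if you're sure there's no CephFS data you care about):

    # Confirm the objects the MDS is complaining about really are missing
    rados -p metadata ls
    rados -p metadata stat mds0_inotable

    # With the MDS stopped, drop and recreate the metadata pool
    # (newer releases make you repeat the pool name and pass a confirmation flag)
    ceph osd pool delete metadata
    ceph osd pool create metadata 64    # 64 PGs is just an example value

    # Point the MDS map at the fresh pool; the arguments are pool IDs,
    # which you can look up with "ceph osd dump | grep pool"
    ceph mds newfs <metadata pool id> <data pool id> --yes-i-really-mean-it

Then restart the MDS and it should initialize a brand-new, empty filesystem.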