This is a trace of an MDS crash. I was running a simple setup
(./vstart -d -n), and this is from out/mds.b. This is from the latest
wip-getdir branch. I posted some context preceding the crash. I have
the full trace if more context is helpful.

-Noah

================================

2011-10-28 15:50:00.251876 7f2f3102b700 mds.1.cache.dir(100000003f6) pop_and_dirty_projected_fnode 0x13ab180 v55
2011-10-28 15:50:00.251902 7f2f3102b700 mds.1.cache.dir(100000003f6) mark_dirty (already dirty) [dir 100000003f6 /tmp/hadoop-nwatkins/mapred/staging/nwatkins/.staging/ [2,head] auth{0=1} pv=55 v=55 cv=0/0 ap=1+2+2 state=1610612738|complete f(v0 m2011-10-28 15:50:00.116185 3=0+3)->f(v0 m2011-10-28 15:50:00.116185 3=0+3) n(v5 rc2011-10-28 15:50:00.116185 b284930 5=2+3)->n(v5 rc2011-10-28 15:50:00.116185 b284930 5=2+3) hs=3+1,ss=0+0 dirty=4 | child replicated dirty authpin 0x12b6770] version 55
2011-10-28 15:50:00.251909 7f2f3102b700 mds.1.cache.dir(100000003f5) pop_and_dirty_projected_fnode 0x13abb40 v52
2011-10-28 15:50:00.251936 7f2f3102b700 mds.1.cache.dir(100000003f5) mark_dirty (already dirty) [dir 100000003f5 /tmp/hadoop-nwatkins/mapred/staging/nwatkins/ [2,head] auth{0=1} pv=52 v=52 cv=0/0 ap=1+1+2 state=1610612738|complete f(v0 m2011-10-28 15:39:07.835948 1=0+1)->f(v0 m2011-10-28 15:39:07.835948 1=0+1) n(v9 rc2011-10-28 15:50:00.116185 b284930 6=2+4)/n(v9 rc2011-10-28 15:46:30.070103 b284930 5=2+3)->n(v9 rc2011-10-28 15:50:00.116185 b284930 6=2+4)/n(v9 rc2011-10-28 15:46:30.070103 b284930 5=2+3) hs=1+0,ss=0+0 dirty=1 | child replicated dirty authpin 0x12b6378] version 52
2011-10-28 15:50:00.251957 7f2f3102b700 mds.1.cache send_dentry_link [dentry #1/tmp/hadoop-nwatkins/mapred/staging/nwatkins/.staging/job_201110281545_0003 [2,head] auth (dn xlock x=1 by 0x135bc00) (dversion lock w=1 last_client=4242) v=54 ap=2+0 inode=0x1311b60 | request lock inodepin dirty authpin 0x1345d80]
2011-10-28 15:50:00.251980 7f2f3102b700 mds.1.server reply_request 0 (Success) client_request(client.4242:11 mkdir #100000003f6/job_201110281545_0003) v1
2011-10-28 15:50:00.251990 7f2f3102b700 mds.1.server apply_allocated_inos 20000000004 / [20000000005~3e8] / 0
2011-10-28 15:50:00.252002 7f2f3102b700 mds.1.inotable: apply_alloc_id 20000000004 to [200000003ed~2fffffffc12]/[200000003ec~2fffffffc13]
./include/interval_set.h: In function 'void interval_set<T>::erase(T, T) [with T = inodeno_t]', in thread '7f2f3102b700'
./include/interval_set.h: 385: FAILED assert(p->first <= start)
 ceph version 0.37-192-g1a4eec2 (commit:1a4eec20a345ced993a48012aaaa8d8ca344a1ba)
 1: (InoTable::apply_alloc_id(inodeno_t)+0x441) [0x647041]
 2: (Server::apply_allocated_inos(MDRequest*)+0x4dd) [0x509f3d]
 3: (Server::reply_request(MDRequest*, MClientReply*, CInode*, CDentry*)+0x83) [0x50a283]
 4: (C_MDS_mknod_finish::finish(int)+0xfe) [0x53686e]
 5: (Context::complete(int)+0xa) [0x4a4d7a]
 6: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0xc8) [0x4c3568]
 7: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x18f) [0x69dd9f]
 8: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xc57) [0x686c47]
 9: (MDS::handle_core_message(Message*)+0x987) [0x4bedf7]
 10: (MDS::_dispatch(Message*)+0x2f) [0x4bef8f]
 11: (MDS::ms_dispatch(Message*)+0x70) [0x4c06f0]
 12: (SimpleMessenger::dispatch_entry()+0x833) [0x6edd13]
 13: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x49ed7c]
 14: (()+0x7efc) [0x7f2f348f0efc]
 15: (clone()+0x6d) [0x7f2f3332a89d]
 ceph version 0.37-192-g1a4eec2 (commit:1a4eec20a345ced993a48012aaaa8d8ca344a1ba)
 1: (InoTable::apply_alloc_id(inodeno_t)+0x441) [0x647041]
 2: (Server::apply_allocated_inos(MDRequest*)+0x4dd) [0x509f3d]
 3: (Server::reply_request(MDRequest*, MClientReply*, CInode*, CDentry*)+0x83) [0x50a283]
 4: (C_MDS_mknod_finish::finish(int)+0xfe) [0x53686e]
 5: (Context::complete(int)+0xa) [0x4a4d7a]
 6: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0xc8) [0x4c3568]
 7: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x18f) [0x69dd9f]
 8: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xc57) [0x686c47]
 9: (MDS::handle_core_message(Message*)+0x987) [0x4bedf7]
 10: (MDS::_dispatch(Message*)+0x2f) [0x4bef8f]
 11: (MDS::ms_dispatch(Message*)+0x70) [0x4c06f0]
 12: (SimpleMessenger::dispatch_entry()+0x833) [0x6edd13]
 13: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x49ed7c]
 14: (()+0x7efc) [0x7f2f348f0efc]
 15: (clone()+0x6d) [0x7f2f3332a89d]
*** Caught signal (Aborted) **
 in thread 7f2f3102b700
 ceph version 0.37-192-g1a4eec2 (commit:1a4eec20a345ced993a48012aaaa8d8ca344a1ba)
 1: ./ceph-mds() [0x777fb6]
 2: (()+0x10060) [0x7f2f348f9060]
 3: (gsignal()+0x35) [0x7f2f3327f3a5]
 4: (abort()+0x17b) [0x7f2f33282b0b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f2f33b3dd7d]
 6: (()+0xb9f26) [0x7f2f33b3bf26]
 7: (()+0xb9f53) [0x7f2f33b3bf53]
 8: (()+0xba04e) [0x7f2f33b3c04e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x193) [0x6fedf3]
 10: (InoTable::apply_alloc_id(inodeno_t)+0x441) [0x647041]
 11: (Server::apply_allocated_inos(MDRequest*)+0x4dd) [0x509f3d]
 12: (Server::reply_request(MDRequest*, MClientReply*, CInode*, CDentry*)+0x83) [0x50a283]
 13: (C_MDS_mknod_finish::finish(int)+0xfe) [0x53686e]
 14: (Context::complete(int)+0xa) [0x4a4d7a]
 15: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0xc8) [0x4c3568]
 16: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x18f) [0x69dd9f]
 17: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xc57) [0x686c47]
 18: (MDS::handle_core_message(Message*)+0x987) [0x4bedf7]
 19: (MDS::_dispatch(Message*)+0x2f) [0x4bef8f]
 20: (MDS::ms_dispatch(Message*)+0x70) [0x4c06f0]
 21: (SimpleMessenger::dispatch_entry()+0x833) [0x6edd13]
 22: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x49ed7c]
 23: (()+0x7efc) [0x7f2f348f0efc]
 24: (clone()+0x6d) [0x7f2f3332a89d]
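
For anyone reading the trace, here is a rough sketch of the check that seems to fire. It is a toy Python model, not the code in include/interval_set.h: the IntervalSet class and try_erase helper are made up for illustration, and it assumes the two sets printed by apply_alloc_id are start~length runs in hex, with the second one being the set the ino is erased from. Under those assumptions, ino 20000000004 lies below the only run in that set (which starts at 200000003ec), so a check of the form "run start <= start" cannot hold.

```python
import bisect


class IntervalSet:
    """Toy model: a set of integers stored as disjoint [start, start+len) runs."""

    def __init__(self):
        self.starts = []  # sorted run starts
        self.lens = {}    # run start -> run length

    def insert(self, start, length=1):
        # Sketch only: assumes the new run does not touch an existing one.
        bisect.insort(self.starts, start)
        self.lens[start] = length

    def _find_inc(self, start):
        # Index of the run containing `start`, or of the first run after it.
        i = bisect.bisect_left(self.starts, start)
        if i > 0 and (i == len(self.starts) or self.starts[i] > start):
            prev = self.starts[i - 1]
            if prev + self.lens[prev] > start:
                i -= 1
        return i

    def erase(self, start, length=1):
        i = self._find_inc(start)
        if i == len(self.starts):
            raise AssertionError("no run at or after start")
        first = self.starts[i]
        if first > start:
            # Modeled on the failed check in the trace: p->first <= start
            raise AssertionError("p->first <= start")
        run_len = self.lens[first]
        if first + run_len < start + length:
            raise AssertionError("range extends past the run")
        # Remove the run, then re-add whatever is left on either side.
        del self.starts[i]
        del self.lens[first]
        if start > first:
            self.insert(first, start - first)
        if first + run_len > start + length:
            self.insert(start + length, first + run_len - (start + length))


def try_erase(s, ino):
    """Return True if `ino` could be erased, False if a check failed."""
    try:
        s.erase(ino)
        return True
    except AssertionError:
        return False


# Values copied from the apply_alloc_id log line (interpretation assumed).
free = IntervalSet()
free.insert(0x200000003EC, 0x2FFFFFFFC13)
failed_ino_ok = try_erase(free, 0x20000000004)  # ino below the run
```

In this model the ino being applied is not a member of the set at all, which would be consistent with it having already been removed (or never added) before apply_alloc_id ran. Whether that is what actually happens in the MDS is not something the sketch can show.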