Hi Cephers,

Overnight, our MDS crashed and failed over to the standby, which also crashed! When I tried to restart them this morning, I found that they no longer start and always seem to crash on the same file in the logs. I've pasted part of the log output captured with "ceph mds tell 0 injectargs '--debug-mds 20 --debug-ms 1'" below [1].

Can anyone help me interpret this error?

Thanks for your time,
Lincoln Bryant

[1]
    -7> 2014-11-13 10:52:15.064784 7fc49d8ab700  7 mds.0.locker rdlock_start  on (ifile sync->mix) on [inode 1000258c3c8 [2,head] /stash/sys/etc/grid-mapfile auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) (ifile sync->mix) (iversion lock) cr={374559=0-4194304@1} caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
    -6> 2014-11-13 10:52:15.064794 7fc49d8ab700  7 mds.0.locker rdlock_start waiting on (ifile sync->mix) on [inode 1000258c3c8 [2,head] /stash/sys/etc/grid-mapfile auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) (ifile sync->mix) (iversion lock) cr={374559=0-4194304@1} caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
    -5> 2014-11-13 10:52:15.064805 7fc49d8ab700 10 mds.0.cache.ino(1000258c3c8) add_waiter tag 40000000 0xbf71920 !ambig 1 !frozen 1 !freezing 1
    -4> 2014-11-13 10:52:15.064808 7fc49d8ab700 15 mds.0.cache.ino(1000258c3c8) taking waiter here
    -3> 2014-11-13 10:52:15.064810 7fc49d8ab700 10 mds.0.locker nudge_log (ifile sync->mix) on [inode 1000258c3c8 [2,head] /stash/sys/etc/grid-mapfile auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) (ifile sync->mix) (iversion lock) cr={374559=0-4194304@1} caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
    -2> 2014-11-13 10:52:15.064827 7fc49d8ab700  1 -- 192.170.227.116:6800/6489 <== osd.104 192.170.227.122:6812/1084 911 ==== osd_op_reply(82611 100022a4e3a.00000000 [tmapget 0~0] v0'0 uv78780 ondisk = 0) v6 ==== 187+0+1410 (1370366691 0 1858920835) 0x298ffd00 con 0x5b606e0
    -1> 2014-11-13 10:52:15.064843 7fc49d8ab700 10 mds.0.cache.dir(100022a4e3a) _tmap_fetched 1410 bytes for [dir 100022a4e3a /stash/user/daveminh/data/DUD/ampc/AlGDock/dock/DUDE.decoy.CHB-1l2sA.0-0/ [2,head] auth v=0 cv=0/0 ap=1+0+0 state=1073741952 f() n() hs=0+0,ss=0+0 | waiter=1 authpin=1 0x3b0a040] want_dn=
     0> 2014-11-13 10:52:15.066789 7fc49d8ab700 -1 *** Caught signal (Aborted) **
 in thread 7fc49d8ab700

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: /usr/bin/ceph-mds() [0x82f741]
 2: /lib64/libpthread.so.0() [0x371c40f710]
 3: (gsignal()+0x35) [0x371bc32635]
 4: (abort()+0x175) [0x371bc33e15]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x371e0bea5d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.