mds continuously crashing on Firefly

Hi Cephers,

Overnight, our MDS crashed and failed over to the standby, which also crashed! When I tried to restart them this morning, neither would stay up, and judging by the logs they always seem to crash on the same file. I've pasted part of the log, taken with debugging raised via "ceph mds tell 0 injectargs '--debug-mds 20 --debug-ms 1'", below [1].

Can anyone help me interpret this error? 

Thanks for your time,
Lincoln Bryant

[1]
    -7> 2014-11-13 10:52:15.064784 7fc49d8ab700  7 mds.0.locker rdlock_start  on (ifile sync->mix) on [inode 1000258c3c8 [2,head] /stash/sys/etc/grid-mapfile auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) (ifile sync->mix) (iversion lock) cr={374559=0-4194304@1} caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
    -6> 2014-11-13 10:52:15.064794 7fc49d8ab700  7 mds.0.locker rdlock_start waiting on (ifile sync->mix) on [inode 1000258c3c8 [2,head] /stash/sys/etc/grid-mapfile auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) (ifile sync->mix) (iversion lock) cr={374559=0-4194304@1} caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
    -5> 2014-11-13 10:52:15.064805 7fc49d8ab700 10 mds.0.cache.ino(1000258c3c8) add_waiter tag 40000000 0xbf71920 !ambig 1 !frozen 1 !freezing 1
    -4> 2014-11-13 10:52:15.064808 7fc49d8ab700 15 mds.0.cache.ino(1000258c3c8) taking waiter here
    -3> 2014-11-13 10:52:15.064810 7fc49d8ab700 10 mds.0.locker nudge_log (ifile sync->mix) on [inode 1000258c3c8 [2,head] /stash/sys/etc/grid-mapfile auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) (ifile sync->mix) (iversion lock) cr={374559=0-4194304@1} caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
    -2> 2014-11-13 10:52:15.064827 7fc49d8ab700  1 -- 192.170.227.116:6800/6489 <== osd.104 192.170.227.122:6812/1084 911 ==== osd_op_reply(82611 100022a4e3a.00000000 [tmapget 0~0] v0'0 uv78780 ondisk = 0) v6 ==== 187+0+1410 (1370366691 0 1858920835) 0x298ffd00 con 0x5b606e0
    -1> 2014-11-13 10:52:15.064843 7fc49d8ab700 10 mds.0.cache.dir(100022a4e3a) _tmap_fetched 1410 bytes for [dir 100022a4e3a /stash/user/daveminh/data/DUD/ampc/AlGDock/dock/DUDE.decoy.CHB-1l2sA.0-0/ [2,head] auth v=0 cv=0/0 ap=1+0+0 state=1073741952 f() n() hs=0+0,ss=0+0 | waiter=1 authpin=1 0x3b0a040] want_dn=
     0> 2014-11-13 10:52:15.066789 7fc49d8ab700 -1 *** Caught signal (Aborted) **
 in thread 7fc49d8ab700

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: /usr/bin/ceph-mds() [0x82f741]
 2: /lib64/libpthread.so.0() [0x371c40f710]
 3: (gsignal()+0x35) [0x371bc32635]
 4: (abort()+0x175) [0x371bc33e15]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x371e0bea5d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
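
To check whether each crash really follows the same file or directory, the inode and dir entries in the recent-events dump can be pulled out and compared across crashes. A minimal Python sketch, assuming the line layout shown in the dump above; the function name and regexes are illustrative and not part of Ceph, and paths containing spaces would be cut short:

```python
import re

# Recent-event prefix in the crash dump, e.g. "    -7> " or "     0> ".
EVENT_RE = re.compile(r"^\s*-?(\d+)> ")

# "[inode <ino> [2,head] <path> ..." or "[dir <ino> <path> [2,head] ...".
# The snapshot range appears before the path for inodes and after it for
# dirs, so it is optional here.
OBJECT_RE = re.compile(r"\[(inode|dir) ([0-9a-f]+) (?:\[[^\]]*\] )?(/\S*)")


def last_objects(log_lines):
    """Return (events_before_crash, kind, ino, path) for each dumped event
    that names an inode or a directory, in log order."""
    found = []
    for line in log_lines:
        event = EVENT_RE.match(line)
        obj = OBJECT_RE.search(line)
        if event and obj:
            found.append(
                (int(event.group(1)), obj.group(1), obj.group(2), obj.group(3))
            )
    return found


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python last_objects.py < mds-crash.log
    for offset, kind, ino, path in last_objects(sys.stdin):
        print(f"-{offset}: {kind} {ino} {path}")
```

In the dump above, the last such entry before the abort is the `_tmap_fetched` on dir 100022a4e3a, which makes that directory a candidate to look at, not a confirmed cause.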

