Re: mds continuously crashing on Firefly

Hi all,

Just providing an update to this: I started the mds daemon on a new server and rebooted a box that had a hung CephFS mount (left over from the first crash), and the problem seems to have gone away.

I'm still not sure why the mds was aborting with "Caught signal (Aborted)", though.

Cheers,
Lincoln

On Nov 13, 2014, at 11:01 AM, Lincoln Bryant wrote:

> Hi Cephers,
> 
> Overnight, our MDS crashed and failed over to the standby, which also crashed! When I tried to restart them this morning, I found that they no longer start and always seem to crash on the same file in the logs. I've pasted below [1] part of the log, captured after raising the debug levels with "ceph mds tell 0 injectargs '--debug-mds 20 --debug-ms 1'".
> 
> Can anyone help me interpret this error? 
> 
> Thanks for your time,
> Lincoln Bryant
> 
> [1]
>    -7> 2014-11-13 10:52:15.064784 7fc49d8ab700  7 mds.0.locker rdlock_start  on (ifile sync->mix) on [inode 1000258c3c8 [2,head] /stash/sys/etc/grid-mapfile auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) (ifile sync->mix) (iversion lock) cr={374559=0-4194304@1} caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
>    -6> 2014-11-13 10:52:15.064794 7fc49d8ab700  7 mds.0.locker rdlock_start waiting on (ifile sync->mix) on [inode 1000258c3c8 [2,head] /stash/sys/etc/grid-mapfile auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) (ifile sync->mix) (iversion lock) cr={374559=0-4194304@1} caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
>    -5> 2014-11-13 10:52:15.064805 7fc49d8ab700 10 mds.0.cache.ino(1000258c3c8) add_waiter tag 40000000 0xbf71920 !ambig 1 !frozen 1 !freezing 1
>    -4> 2014-11-13 10:52:15.064808 7fc49d8ab700 15 mds.0.cache.ino(1000258c3c8) taking waiter here
>    -3> 2014-11-13 10:52:15.064810 7fc49d8ab700 10 mds.0.locker nudge_log (ifile sync->mix) on [inode 1000258c3c8 [2,head] /stash/sys/etc/grid-mapfile auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) (ifile sync->mix) (iversion lock) cr={374559=0-4194304@1} caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
>    -2> 2014-11-13 10:52:15.064827 7fc49d8ab700  1 -- 192.170.227.116:6800/6489 <== osd.104 192.170.227.122:6812/1084 911 ==== osd_op_reply(82611 100022a4e3a.00000000 [tmapget 0~0] v0'0 uv78780 ondisk = 0) v6 ==== 187+0+1410 (1370366691 0 1858920835) 0x298ffd00 con 0x5b606e0
>    -1> 2014-11-13 10:52:15.064843 7fc49d8ab700 10 mds.0.cache.dir(100022a4e3a) _tmap_fetched 1410 bytes for [dir 100022a4e3a /stash/user/daveminh/data/DUD/ampc/AlGDock/dock/DUDE.decoy.CHB-1l2sA.0-0/ [2,head] auth v=0 cv=0/0 ap=1+0+0 state=1073741952 f() n() hs=0+0,ss=0+0 | waiter=1 authpin=1 0x3b0a040] want_dn=
>     0> 2014-11-13 10:52:15.066789 7fc49d8ab700 -1 *** Caught signal (Aborted) **
> in thread 7fc49d8ab700
> 
> ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
> 1: /usr/bin/ceph-mds() [0x82f741]
> 2: /lib64/libpthread.so.0() [0x371c40f710]
> 3: (gsignal()+0x35) [0x371bc32635]
> 4: (abort()+0x175) [0x371bc33e15]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x371e0bea5d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> 
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
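
In the excerpt above, the abort (entry 0) comes immediately after the MDS logged `_tmap_fetched` for directory 100022a4e3a (entry -1). When reading a longer dump like this, it can help to pull out the last few entries the crashing thread wrote before "Caught signal". Below is a minimal Python sketch that does this. It assumes the line layout seen in the excerpt (sequence marker, timestamp, thread id, debug level, message), which is inferred from this log rather than taken from a documented format. The names `parse_line` and `last_events_before_crash` are hypothetical helpers, not part of Ceph.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Assumed layout of one debug-log line, inferred from the excerpt above
# (not a documented format): "<N>> <date> <time> <thread-id> <level> <message>".
LOG_LINE = re.compile(
    r"^(?P<seq>-?\d+)>\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<thread>[0-9a-f]+)\s+"
    r"(?P<level>-?\d+)\s+"
    r"(?P<msg>.*)$"
)


@dataclass
class LogEntry:
    seq: int        # position relative to the crash line (0 = the crash itself)
    timestamp: str
    thread: str     # thread id, e.g. "7fc49d8ab700"
    level: int      # debug level of the message (-1 on the crash line)
    message: str    # subsystem prefix plus text, e.g. "mds.0.locker ..."


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line, dropping leading mail-quote markers and spaces."""
    m = LOG_LINE.match(line.lstrip("> "))
    if m is None:
        return None  # backtrace frame, continuation line, etc.
    return LogEntry(
        seq=int(m["seq"]),
        timestamp=f"{m['date']} {m['time']}",
        thread=m["thread"],
        level=int(m["level"]),
        message=m["msg"],
    )


def last_events_before_crash(
    lines: Iterable[str], count: int = 3
) -> Tuple[Optional[LogEntry], List[LogEntry]]:
    """Return the first "Caught signal" entry and up to `count` entries
    logged before it by the same thread (or (None, []) if there is no crash)."""
    seen: List[LogEntry] = []
    for raw in lines:
        entry = parse_line(raw)
        if entry is None:
            continue
        if "Caught signal" in entry.message:
            same_thread = [e for e in seen if e.thread == entry.thread]
            return entry, same_thread[-count:]
        seen.append(entry)
    return None, []
```

For example, passing the lines of a saved MDS log file (any path you choose) to `last_events_before_crash` returns the abort entry together with the preceding entries from thread 7fc49d8ab700; for the excerpt above, the most recent of those is the `_tmap_fetched` message for the `/stash/user/daveminh/...` directory. This only narrows down where to look; it does not explain the abort itself.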
