I left it there for further troubleshooting; however, I guess one of my colleagues restarted the cosd, so I no longer have the stack trace. Let me see if I can reproduce the issue again.

Regards,
Leander Yu.

On Tue, Oct 5, 2010 at 11:44 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Tue, 5 Oct 2010, Leander Yu wrote:
>> I used gdb to attach to the cosd process. Most of the threads are in pthread_mutex_lock; however, there are two threads stuck waiting for pg->lock().
>
> Yeah.  Either a thread holding the lock is deadlocked somewhere, or the lock was leaked.  Do you still have the stack trace from the other threads?
>
> sage
>
>>
>> ------------ Thread 184 -----------------
>> #0  0x00007f7b801ecc44 in __lll_lock_wait () from /lib64/libpthread.so.0
>> #1  0x00007f7b801e7f15 in _L_lock_1056 () from /lib64/libpthread.so.0
>> #2  0x00007f7b801e7de7 in pthread_mutex_lock () from /lib64/libpthread.so.0
>> #3  0x000000000047509a in ?? ()
>> #4  0x00000000004b4181 in C_OSD_Commit::finish(int) ()
>> #5  0x00000000005b57d8 in Finisher::finisher_thread_entry() ()
>> #6  0x000000000046c61a in Thread::_entry_func(void*) ()
>> #7  0x00007f7b801e5a3a in start_thread () from /lib64/libpthread.so.0
>> #8  0x00007f7b7f40377d in clone () from /lib64/libc.so.6
>> #9  0x0000000000000000 in ?? ()
>>
>> -------------- Thread 182 -------------------
>> #0  0x00007f7b801ecc44 in __lll_lock_wait () from /lib64/libpthread.so.0
>> #1  0x00007f7b801e7f15 in _L_lock_1056 () from /lib64/libpthread.so.0
>> #2  0x00007f7b801e7de7 in pthread_mutex_lock () from /lib64/libpthread.so.0
>> #3  0x00000000004eb0fa in Mutex::Lock(bool) ()
>> #4  0x00000000004bac90 in OSD::_lookup_lock_pg(pg_t) ()
>> #5  0x00000000004e6e7f in OSD::handle_sub_op_reply(MOSDSubOpReply*) ()
>> #6  0x00000000004e871d in OSD::_dispatch(Message*) ()
>> #7  0x00000000004e8fa9 in OSD::ms_dispatch(Message*) ()
>> #8  0x000000000045eab9 in SimpleMessenger::dispatch_entry() ()
>> #9  0x00000000004589fc in SimpleMessenger::DispatchThread::entry() ()
>> #10 0x000000000046c61a in Thread::_entry_func(void*) ()
>> #11 0x00007f7b801e5a3a in start_thread () from /lib64/libpthread.so.0
>> #12 0x00007f7b7f40377d in clone () from /lib64/libc.so.6
>> #13 0x0000000000000000 in ?? ()
>>
>> It seems the dispatch thread was blocked, so no messages can be handled?
>>
>> Regards,
>> Leander Yu.
>>
>> On Tue, Oct 5, 2010 at 2:04 PM, Leander Yu <leander.yu@xxxxxxxxx> wrote:
>> > It seems like the cosd (asgc-osd9) is idle now. It didn't send any heartbeat out, but the process is still running.
>> > This is the netstat output
>> > ---------------------------------------------
>> > Proto Recv-Q Send-Q Local Address           Foreign Address         State
>> > tcp        0      0 0.0.0.0:8649            0.0.0.0:*               LISTEN
>> > tcp        0      0 0.0.0.0:8139            0.0.0.0:*               LISTEN
>> > tcp        0      0 0.0.0.0:587             0.0.0.0:*               LISTEN
>> > tcp        0      0 0.0.0.0:8651            0.0.0.0:*               LISTEN
>> > tcp        0      0 0.0.0.0:8652            0.0.0.0:*               LISTEN
>> > tcp        0      0 192.168.1.9:50000       0.0.0.0:*               LISTEN
>> > tcp        0      0 0.0.0.0:6800            0.0.0.0:*               LISTEN
>> > tcp        0      0 192.168.1.9:50001       0.0.0.0:*               LISTEN
>> > tcp        0      0 0.0.0.0:465             0.0.0.0:*               LISTEN
>> > tcp        0      0 0.0.0.0:6801            0.0.0.0:*               LISTEN
>> > tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
>> > tcp        0      0 192.168.1.9:50008       0.0.0.0:*               LISTEN
>> > tcp        0      0 0.0.0.0:25              0.0.0.0:*               LISTEN
>> > tcp        0      0 192.168.1.9:50011       0.0.0.0:*               LISTEN
>> > tcp        0      0 127.0.0.1:8649          127.0.0.1:38054         TIME_WAIT
>> > tcp        0      0 127.0.0.1:8649          127.0.0.1:38055         TIME_WAIT
>> > tcp        0      0 192.168.1.9:22          192.168.1.101:35785     ESTABLISHED
>> > tcp        0      0 127.0.0.1:8649          127.0.0.1:38053         TIME_WAIT
>> > tcp        0      0 192.168.1.9:22          192.168.1.101:56473     ESTABLISHED
>> > tcp        0      0 127.0.0.1:8649          127.0.0.1:38052         TIME_WAIT
>> > tcp        0      0 :::587                  :::*                    LISTEN
>> > tcp        0      0 :::465                  :::*                    LISTEN
>> > tcp        0      0 :::22                   :::*                    LISTEN
>> > tcp        0      0 :::25                   :::*                    LISTEN
>> >
>> > Regards,
>> > Leander Yu.
>> >
>> > On Tue, Oct 5, 2010 at 1:40 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> >> On Tue, 5 Oct 2010, Leander Yu wrote:
>> >>> I have another OSD that was marked down; however, I can still access the machine by ssh, and I see the cosd process is still running.
>> >>> The log shows the same pipe fault error:
>> >>>
>> >>> 192.168.1.9:6801/1537 >> 192.168.1.25:6801/29084 pipe(0x7f7b680e2620 sd=-1 pgs=437 cs=1 l=0).fault with nothing to send, going to standby
>> >>
>> >> That error means there was a socket error (usually a dropped connection, but it could be lots of things), but the connection wasn't in use.
>> >>
>> >> This one looks like the heartbeat channel.  Most likely that connection reconnected shortly after that (the osds send heartbeats every couple of seconds).  They're marked down when peer osds expect a heartbeat and don't get one.  The monitor log ($mon_data/log) normally has information about who reported the failure, but it looks like you've turned it off.
>> >>
>> >> In any case, the error is usually harmless, and probably unrelated to the MDS error (unless perhaps the same network glitch was to blame).
>> >>
>> >> sage
>> >>
>> >>> Are those two cases related?
>> >>>
>> >>> Regards,
>> >>> Leander Yu.
>> >>>
>> >>> On Tue, Oct 5, 2010 at 1:15 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> >>> > On Tue, 5 Oct 2010, Leander Yu wrote:
>> >>> >> Hi Sage,
>> >>> >> Thanks a lot for your prompt answer.
>> >>> >> So is this behavior normal, if we assume there was a network issue?
>> >>> >> In this case, would it be better to restart the mds instead of having it commit suicide, or to leave it there as a standby?
>> >>> >
>> >>> > The mds has lots of internal state that would be tricky to clean up properly, so one way or another the old instance should die.
>> >>> >
>> >>> > But you're right: probably it should just respawn a new instance instead of exiting?  The new instance will come back up in standby mode.  Maybe re-exec with the same set of arguments the original instance was executed with?
>> >>> >
>> >>> > sage
>> >>> >
>> >>> >> Regards,
>> >>> >> Leander Yu.
>> >>> >>
>> >>> >> On Tue, Oct 5, 2010 at 1:00 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> >>> >> > On Mon, 4 Oct 2010, Sage Weil wrote:
>> >>> >> >> On Tue, 5 Oct 2010, Leander Yu wrote:
>> >>> >> >> > Hi,
>> >>> >> >> > I have a 46-machine cluster (44 osd/mon + 2 mds) running ceph now. The MDS is running in active/standby mode.
>> >>> >> >> > This morning one of the MDSs committed suicide; the log shows:
>> >>> >> >> >
>> >>> >> >> > -------------------------------------------
>> >>> >> >> > 2010-10-04 22:24:19.450022 7f2e5a1ee710 mds0.cache.ino(10000002b87) pop_projected_snaprealm 0x7f2e50cd9f70 seq1
>> >>> >> >> > 2010-10-04 22:26:12.180854 7f2debbfb710 -- 192.168.1.103:6800/2081 >> 192.168.1.106:0/2453428678 pipe(0x7f2e380013d0 sd=-1 pgs=2 cs=1 l=0).fault with nothing to send, going to standby
>> >>> >> >> > 2010-10-04 22:26:12.181019 7f2e481dc710 -- 192.168.1.103:6800/2081 >> 192.168.1.111:0/18905730 pipe(0x7f2e38002250 sd=-1 pgs=2 cs=1 l=0).fault with nothing to send, going to standby
>> >>> >> >> > 2010-10-04 22:26:12.181041 7f2dc3fff710 -- 192.168.1.103:6800/2081 >> 192.168.1.114:0/1945631186 pipe(0x7f2e38000f00 sd=-1 pgs=2 cs=1 l=0).fault with nothing to send, going to standby
>> >>> >> >> > 2010-10-04 22:26:12.181149 7f2deaef6710 -- 192.168.1.103:6800/2081 >> 192.168.1.113:0/521184914 pipe(0x7f2e38002f90 sd=-1 pgs=2 cs=1 l=0).fault with nothing to send, going to standby
>> >>> >> >> > 2010-10-04 22:26:12.181563 7f2deb5f5710 -- 192.168.1.103:6800/2081 >> 192.168.1.112:0/4272114728 pipe(0x7f2e38002ac0 sd=-1 pgs=2 cs=1 l=0).fault with nothing to send, going to standby
>> >>> >> >> > 2010-10-04 22:26:13.777624 7f2e5a1ee710 mds-1.3 handle_mds_map i (192.168.1.103:6800/2081) dne in the mdsmap, killing myself
>> >>> >> >> > 2010-10-04 22:26:13.777649 7f2e5a1ee710 mds-1.3 suicide. wanted up:active, now down:dne
>> >>> >> >> > 2010-10-04 22:26:13.777769 7f2e489e4710 -- 192.168.1.103:6800/2081 >> 192.168.1.101:0/15702 pipe(0x7f2e380008c0 sd=-1 pgs=1847 cs=1 l=0).fault with nothing to send, going to standby
>> >>> >> >> > ------------------------------------------------------------------------------
>> >>> >> >> > Would you suggest how I should troubleshoot this issue? Or should I just restart the mds to recover it?
>> >>> >> >>
>> >>> >> >> The MDS killed itself because it was removed from the mdsmap.  The monitor log will tell you why if you had logging turned up.  If not, you might find some clue by looking at each mdsmap iteration.  If you do
>> >>> >> >>
>> >>> >> >> $ ceph mds stat
>> >>> >> >>
>> >>> >> >> it will tell you the map epoch (e###).  You can then dump any map iteration with
>> >>> >> >>
>> >>> >> >> $ ceph mds dump 123 -o -
>> >>> >> >>
>> >>> >> >> Work backward a few iterations until you find which epoch removed that mds instance.  The one prior to that might have some clue (maybe it was laggy?)...
>> >>> >> >
>> >>> >> > Okay, looking at the maps on your cluster, it looks like there was a standby mds, and the live one was marked down.
>> >>> >> > Probably some intermittent network issue prevented it from sending the monitor beacon on time, and the monitor decided it was dead/unresponsive.  The standby cmds took over successfully.  The recovery looks like it took about 20 seconds.
>> >>> >> >
>> >>> >> > sage
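
If the cosd hang reproduces, the most useful thing to capture is a backtrace of every thread, not just the two blocked on pg->lock(); the thread actually holding the lock will usually be parked in some other frame. A minimal, non-interactive sketch of grabbing that with gdb, assuming debug symbols are installed; "$(pidof cosd)" assumes a single cosd process on the node, and the output file name is arbitrary:

  # attach to the running cosd, dump backtraces for all threads, then detach
  gdb -p "$(pidof cosd)" --batch -ex 'set pagination off' -ex 'thread apply all bt' > cosd-threads.txt 2>&1

  # the lock holder is typically the thread NOT sitting in pthread_mutex_lock /
  # __lll_lock_wait; OSD:: and PG:: frames are good candidates to inspect first
  grep -n 'OSD::\|PG::' cosd-threads.txt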
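For the "who reported the failure" question on the osd side, Sage notes the monitor log ($mon_data/log) normally records the failure reports. If that logging is re-enabled, a quick way to pull out the relevant entries; the mon data path and the osd name below are placeholders, not the actual values on this cluster:

  # adjust /data/mon0 to your 'mon data' directory, and osd9 to the osd in question
  grep -i 'fail' /data/mon0/log | grep 'osd9'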
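For the mdsmap archaeology Sage describes (working backward until you find the epoch that removed the mds), a small sketch of a loop that saves each map to its own file so consecutive epochs can be diffed. The epoch range 211-230 and the output directory are only examples; start from whatever epoch 'ceph mds stat' reports:

  # dump a range of mdsmap epochs into separate files
  mkdir -p mdsmaps
  for e in $(seq 211 230); do
      ceph mds dump $e -o mdsmaps/mdsmap.$e
  done

  # then compare consecutive epochs to spot where the mds disappeared, e.g.
  diff mdsmaps/mdsmap.221 mdsmaps/mdsmap.222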