Re: MDS Problems

John Spray <jspray@xxxxxxxxxx> · Fri, 4 Nov 2016 15:26:00 +0100

On Fri, Nov 4, 2016 at 2:54 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> I upgraded to 10.2.3 today and after restarting the MDS, the same or very similar problem occurred. I didn't see any of the symlink
> errors, so I think that was fixed in the upgrade but I was still seeing looping and crashing until I killed all clients and evicted
> sessions.
>
> Just after restarting MDS
>
> 2016-11-04 13:26:30.373343 7f501f085700  0 mds.0.cache creating system inode with ino:100
> 2016-11-04 13:26:30.374377 7f501f085700  0 mds.0.cache creating system inode with ino:1
> 2016-11-04 13:26:31.290498 7f501d275700  1 mds.0.1370673 replay_done
> 2016-11-04 13:26:31.294583 7f501d275700  1 mds.0.1370673 making mds journal writeable
> 2016-11-04 13:26:31.779444 7f5024392700  1 mds.0.1370673 handle_mds_map i am now mds.0.1370673
> 2016-11-04 13:26:31.779470 7f5024392700  1 mds.0.1370673 handle_mds_map state change up:replay --> up:reconnect
> 2016-11-04 13:26:31.779477 7f5024392700  1 mds.0.1370673 reconnect_start
> 2016-11-04 13:26:31.779479 7f5024392700  1 mds.0.1370673 reopen_log
> 2016-11-04 13:26:31.779485 7f5024392700  1 mds.0.server reconnect_clients -- 11 sessions
> 2016-11-04 13:26:31.785040 7f5024392700  0 log_channel(cluster) log [DBG] : reconnect by client.2623964 10.1.103.231:0/2164520513
> after 0.005521
> 2016-11-04 13:26:31.785218 7f5024392700  0 log_channel(cluster) log [DBG] : reconnect by client.2765641 10.1.103.231:0/1622816431
> after 0.005723
> 2016-11-04 13:26:31.786122 7f5024392700  0 log_channel(cluster) log [DBG] : reconnect by client.2623967 10.1.103.232:0/342252210
> after 0.006627
> 2016-11-04 13:26:31.786713 7f5024392700  0 log_channel(cluster) log [DBG] : reconnect by client.2719695 10.1.103.232:0/1211624383
> after 0.007219
> 2016-11-04 13:26:31.798640 7f5024392700  0 log_channel(cluster) log [DBG] : reconnect by client.2719692 10.1.103.233:0/174968170
> after 0.019143
> 2016-11-04 13:26:31.800473 7f5024392700  0 log_channel(cluster) log [DBG] : reconnect by client.2719698 10.1.103.233:0/2257480425
> after 0.020906
> 2016-11-04 13:26:31.801293 7f5024392700  0 log_channel(cluster) log [DBG] : reconnect by client.2765635 10.1.103.231:0/2313865568
> after 0.021787
> 2016-11-04 13:26:31.886733 7f5024392700  0 log_channel(cluster) log [DBG] : reconnect by client.2719689 10.1.103.232:0/3215599620
> after 0.107140
> 2016-11-04 13:26:31.938048 7f5024392700  0 log_channel(cluster) log [DBG] : reconnect by client.2765770 10.1.106.7:0/789497307 after
> 0.157027
> 2016-11-04 13:26:31.962209 7f5024392700  0 log_channel(cluster) log [DBG] : reconnect by client.2624111 10.1.106.3:0/2929007149
> after 0.182597
> 2016-11-04 13:26:31.984629 7f5024392700  0 log_channel(cluster) log [DBG] : reconnect by client.2765638 10.1.103.233:0/2247183319
> after 0.205018
> 2016-11-04 13:26:32.029738 7f5024392700  1 mds.0.1370673 reconnect_done
> 2016-11-04 13:26:32.785680 7f5024392700  1 mds.0.1370673 handle_mds_map i am now mds.0.1370673
> 2016-11-04 13:26:32.785715 7f5024392700  1 mds.0.1370673 handle_mds_map state change up:reconnect --> up:rejoin
> 2016-11-04 13:26:32.785731 7f5024392700  1 mds.0.1370673 rejoin_start
> 2016-11-04 13:26:38.572270 7f5024392700  1 mds.0.1370673 rejoin_joint_start
> 2016-11-04 13:26:43.971915 7f5020289700  1 mds.0.1370673 rejoin_done
> 2016-11-04 13:26:44.375857 7f5024392700  1 mds.0.1370673 handle_mds_map i am now mds.0.1370673
> 2016-11-04 13:26:44.375871 7f5024392700  1 mds.0.1370673 handle_mds_map state change up:rejoin --> up:active
> 2016-11-04 13:26:44.375878 7f5024392700  1 mds.0.1370673 recovery_done -- successful recovery!
> 2016-11-04 13:26:44.375949 7f5024392700  1 mds.0.1370673 active_start
> 2016-11-04 13:26:44.519621 7f5024392700  1 mds.0.1370673 cluster recovered.

The question here is what's sending the SIGTERM?  It could be the OOM
killer, but given that your system made it to active in just a few
seconds, it seems unlikely that it could have eaten that much RAM that
quickly unless there was something else on the system consuming the
available memory?

> 2016-11-04 13:27:03.651175 7f5020a8a700 -1 mds.gp-ceph-mds1 *** got signal Terminated ***
> 2016-11-04 13:27:03.651191 7f5020a8a700  1 mds.gp-ceph-mds1 suicide.  wanted state up:active
>
> A section whilst its looping
>
> 2016-11-04 13:35:11.255310 7f945df1c700  1 mds.gp-ceph-mds1 handle_mds_map i (10.1.2.76:6800/8619) dne in the mdsmap, respawning
> myself
> 2016-11-04 13:35:11.255330 7f945df1c700  1 mds.gp-ceph-mds1 respawn
> 2016-11-04 13:35:11.255339 7f945df1c700  1 mds.gp-ceph-mds1  e: '/usr/bin/ceph-mds'
> 2016-11-04 13:35:11.255346 7f945df1c700  1 mds.gp-ceph-mds1  0: '/usr/bin/ceph-mds'
> 2016-11-04 13:35:11.255353 7f945df1c700  1 mds.gp-ceph-mds1  1: '-f'
> 2016-11-04 13:35:11.255359 7f945df1c700  1 mds.gp-ceph-mds1  2: '--cluster'
> 2016-11-04 13:35:11.255366 7f945df1c700  1 mds.gp-ceph-mds1  3: 'ceph'
> 2016-11-04 13:35:11.255372 7f945df1c700  1 mds.gp-ceph-mds1  4: '--id'
> 2016-11-04 13:35:11.255381 7f945df1c700  1 mds.gp-ceph-mds1  5: 'gp-ceph-mds1'
> 2016-11-04 13:35:11.255388 7f945df1c700  1 mds.gp-ceph-mds1  6: '--setuser'
> 2016-11-04 13:35:11.255395 7f945df1c700  1 mds.gp-ceph-mds1  7: 'ceph'
> 2016-11-04 13:35:11.255401 7f945df1c700  1 mds.gp-ceph-mds1  8: '--setgroup'
> 2016-11-04 13:35:11.255407 7f945df1c700  1 mds.gp-ceph-mds1  9: 'ceph'
> 2016-11-04 13:35:11.255437 7f945df1c700  1 mds.gp-ceph-mds1  exe_path /usr/bin/ceph-mds
> 2016-11-04 13:35:11.259542 7fcf47754700  1 mds.0.1371026 handle_mds_map i am now mds.0.1371026
> 2016-11-04 13:35:11.259549 7fcf47754700  1 mds.0.1371026 handle_mds_map state change up:boot --> up:replay
> 2016-11-04 13:35:11.259563 7fcf47754700  1 mds.0.1371026 replay_start
> 2016-11-04 13:35:11.259565 7fcf47754700  1 mds.0.1371026  recovery set is
> 2016-11-04 13:35:11.259569 7fcf47754700  1 mds.0.1371026  waiting for osdmap 1382368 (which blacklists prior instance)
> 2016-11-04 13:35:11.266088 7fcf42447700  0 mds.0.cache creating system inode with ino:100
> 2016-11-04 13:35:11.266207 7fcf42447700  0 mds.0.cache creating system inode with ino:1
> 2016-11-04 13:35:11.270580 7fe46b86c200  0 ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b), process ceph-mds, pid
> 8619
> 2016-11-04 13:35:11.271530 7fe46b86c200  0 pidfile_write: ignore empty --pid-file
> 2016-11-04 13:35:11.507023 7fcf40637700  1 mds.0.1371026 replay_done
> 2016-11-04 13:35:11.507077 7fcf40637700  1 mds.0.1371026 making mds journal writeable
> 2016-11-04 13:35:12.366251 7fcf47754700  1 mds.0.1371026 handle_mds_map i am now mds.0.1371026
> 2016-11-04 13:35:12.366260 7fcf47754700  1 mds.0.1371026 handle_mds_map state change up:replay --> up:reconnect
> 2016-11-04 13:35:12.366268 7fcf47754700  1 mds.0.1371026 reconnect_start
> 2016-11-04 13:35:12.366269 7fcf47754700  1 mds.0.1371026 reopen_log
> 2016-11-04 13:35:12.366284 7fcf47754700  1 mds.0.server reconnect_clients -- 9 sessions
> 2016-11-04 13:35:12.366317 7fcf47754700  0 log_channel(cluster) log [DBG] : reconnect by client.2623964 10.1.103.231:0/2164520513
> after 0.000010
> 2016-11-04 13:35:12.366476 7fcf47754700  0 log_channel(cluster) log [DBG] : reconnect by client.2719692 10.1.103.233:0/174968170
> after 0.000184
> 2016-11-04 13:35:12.367048 7fcf47754700  0 log_channel(cluster) log [DBG] : reconnect by client.2719695 10.1.103.232:0/1211624383
> after 0.000752
> 2016-11-04 13:35:12.371509 7fcf47754700  0 log_channel(cluster) log [DBG] : reconnect by client.2765641 10.1.103.231:0/1622816431
> after 0.005214
> 2016-11-04 13:35:12.377333 7fcf47754700  0 log_channel(cluster) log [DBG] : reconnect by client.2623967 10.1.103.232:0/342252210
> after 0.011038
> 2016-11-04 13:35:12.377782 7fcf47754700  0 log_channel(cluster) log [DBG] : reconnect by client.2765635 10.1.103.231:0/2313865568
> after 0.011485
> 2016-11-04 13:35:12.396777 7fcf47754700  0 log_channel(cluster) log [DBG] : reconnect by client.2719689 10.1.103.232:0/3215599620
> after 0.030460
> 2016-11-04 13:35:12.418640 7fcf47754700  0 log_channel(cluster) log [DBG] : reconnect by client.2624111 10.1.106.3:0/2929007149
> after 0.052235
> 2016-11-04 13:35:15.407765 7fcf3de16700  0 -- 10.1.2.76:6805/1334 >> 10.1.103.232:0/3215599620 pipe(0x55d529479400 sd=1787 :6805 s=2
> pgs=6645 cs=1 l=0 c=0x55d529954a80).fault with nothing to send, going to standby
> 2016-11-04 13:35:15.409659 7fe465752700  1 mds.gp-ceph-mds1 handle_mds_map standby
> 2016-11-04 13:35:15.412866 7fcf47754700  1 mds.gp-ceph-mds1 handle_mds_map i (10.1.2.76:6805/1334) dne in the mdsmap, respawning
> myself
> 2016-11-04 13:35:15.412886 7fcf47754700  1 mds.gp-ceph-mds1 respawn
>
> Some of the Asserts
>
> 2016-11-04 13:26:30.344284 7f03fd42e700 -1 mds/MDSDaemon.cc: In function 'void MDSDaemon::respawn()' thread 7f03fd42e700 time
> 2016-11-04 13:26:30.329841
> mds/MDSDaemon.cc: 1132: FAILED assert(0)

This is the assert you hit if the MDS is trying to respawn itself but
it can't.  A slightly earlier log message might say why?

>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x557546692e80]
>  2: (MDSDaemon::respawn()+0x73d) [0x5575462789fd]
>  3: (MDSDaemon::handle_mds_map(MMDSMap*)+0x1517) [0x557546281667]
>  4: (MDSDaemon::handle_core_message(Message*)+0x7f3) [0x557546284a03]
>  5: (MDSDaemon::ms_dispatch(Message*)+0x1c3) [0x557546284cf3]
>  6: (DispatchQueue::entry()+0xf2b) [0x557546799f6b]
>  7: (DispatchQueue::DispatchThread::entry()+0xd) [0x55754667911d]
>  8: (()+0x76fa) [0x7f04026b06fa]
>  9: (clone()+0x6d) [0x7f0400b71b5d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> 2016-11-04 13:36:03.805549 7f924ded0700  1 mds.gp-ceph-mds1 asok_command: session evict (starting...)
> 2016-11-04 13:36:03.809513 7f924ded0700 -1 mds/MDLog.cc: In function 'void MDLog::_submit_entry(LogEvent*, MDSInternalContextBase*)'
> thread 7f924ded0700 time 2016-11-04 13:36:03.805586
> mds/MDLog.cc: 275: FAILED assert(!mds->is_any_replay())

This is a bug, but presumably not the root of your problems
(http://tracker.ceph.com/issues/17801)

>  ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x55c691e22ef0]
>  2: (MDLog::_submit_entry(LogEvent*, MDSInternalContextBase*)+0x47f) [0x55c691c8189f]
>  3: (Server::journal_close_session(Session*, int, Context*)+0x6df) [0x55c691a6059f]
>  4: (Server::kill_session(Session*, Context*)+0x1bd) [0x55c691a623bd]
>  5: (MDSRankDispatcher::handle_asok_command(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
> std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
> boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> ><bool, long, double,
> std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
> std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > >,
> std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >,
> std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const,
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> ><bool, long, double,
> std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
> std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > >&, ceph::Formatter*,
> std::ostream&)+0x491) [0x55c691a31df1]
>  6: (MDSDaemon::asok_command(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
> std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
> boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> ><bool, long, double,
> std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
> std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > >,
> std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >,
> std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const,
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> ><bool, long, double,
> std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
> std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > >&,
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::ostream&)+0x222) [0x55c691a00152]
>  7: (MDSSocketHook::call(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
> std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
> boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> ><bool, long, double,
> std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
> std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > >,
> std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >,
> std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const,
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> ><bool, long, double,
> std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
> std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > >&,
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list&)+0xa8) [0x55c691a10db8]
>  8: (AdminSocket::do_accept()+0x1267) [0x55c691e1c987]
>  9: (AdminSocket::entry()+0x298) [0x55c691e1df48]
>  10: (()+0x770a) [0x7f925215070a]
>  11: (clone()+0x6d) [0x7f925061182d]
>
>
>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Nick Fisk
>> Sent: 02 November 2016 21:50
>> To: 'Ceph Users' <ceph-users@xxxxxxxxxxxxxx>
>> Subject: Re:  MDS Problems - Solved but reporting for benefit of others
>>
>> A bit more digging, the original crash appears to be similar (but not exactly the same) as this tracker report
>>
>> http://tracker.ceph.com/issues/16983
>>
>> I can see that this was fixed in 10.2.3, so I will probably look to upgrade.
>>
>> If the logs make sense to anybody with a bit more knowledge I would be interested if that bug is related or if I have stumbled on
>> something new.
>>
>> Nick
>>
>> > -----Original Message-----
>> > From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
>> > Of Nick Fisk
>> > Sent: 02 November 2016 17:58
>> > To: 'Ceph Users' <ceph-users@xxxxxxxxxxxxxx>
>> > Subject:  MDS Problems - Solved but reporting for benefit
>> > of others
>> >
>> > Hi all,
>> >
>> > Just a bit of an outage with CephFS around the MDS's, I managed to get
>> > everything up and running again after a bit of head
>> scratching
>> > and thought I would share here what happened.
>> >
>> > Cause
>> > I believe the MDS's which were running as VM's suffered when the
>> > hypervisor ran out of ram and started swapping due to hypervisor maintenance. I know this is less than ideal and have put steps
> in
>> place to prevent this happening again.
>> >
>> > Symptoms
>> > 1. Noticed that both MDS's were down, log files on both showed that
>> > they had crashed 2. After restarting MDS's, their status kept flipping
>> > between replay and reconnect 3. Now again both MDS's would crash again
>> > 4. Log files showed they seemed to keep restarting after trying to
>> > reconnect clients 5. Clients were all kernel one was 3.19 and the rest
>> > 4.8. I believe the problematic client was
>> one of the
>> > ones running Kernel 4.8 6. Ceph is 10.2.2
>> >
>> > Resolution
>> > After some serious head scratching and a little bit of panicking, the
>> > fact the log files showed the restart always happened after
>> trying
>> > to reconnect the clients gave me the idea to try and kill the sessions
>> > on the MDS.  I first reset all the clients and waited, but
>> this didn't
>> > seem to have any effect and I could still see the MDS trying to
>> > reconnect to the clients. I then decided to try and kill the
>> sessions from
>> > the MDS end, so I shutdown the standby MDS (as they kept flipping
>> > active roles) and ran
>> >
>> > ceph daemon mds.gp-ceph-mds1 session ls
>> >
>> > I then tried to kill the last session in the list
>> >
>> > ceph daemon mds.gp-ceph-mds1 session evict <session id>
>> >
>> > I had to keep hammering this command to get it at the right point, as the MDS was only responding for a fraction of a second.
>> >
>> > Suddenly in my other window, where I had the tail of the MDS log, I
>> > saw a whizz of new information and then stopping with the MDS success
>> > message. So it seems something the MDS was trying to do whilst reconnecting was upsetting it. Ceph -s updated so show MDS was
>> now active. Rebooting other MDS then corrected made it standby as well. Problem solved.
>> >
>> > I have uploaded the 2 MDS logs here if any CephFS dev's are interested in taking a closer look.
>> >
>> >
>> > Nick
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com