Hi John, thanks for your response

> -----Original Message-----
> From: John Spray [mailto:jspray@xxxxxxxxxx]
> Sent: 04 November 2016 14:26
> To: nick@xxxxxxxxxx
> Cc: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
> Subject: Re: MDS Problems
>
> On Fri, Nov 4, 2016 at 2:54 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> > I upgraded to 10.2.3 today and after restarting the MDS, the same or very similar problem occurred. I didn't see any of the symlink errors, so I think that was fixed in the upgrade but I was still seeing looping and crashing until I killed all clients and evicted sessions.
> >
> > Just after restarting MDS
> >
> > 2016-11-04 13:26:30.373343 7f501f085700 0 mds.0.cache creating system inode with ino:100
> > 2016-11-04 13:26:30.374377 7f501f085700 0 mds.0.cache creating system inode with ino:1
> > 2016-11-04 13:26:31.290498 7f501d275700 1 mds.0.1370673 replay_done
> > 2016-11-04 13:26:31.294583 7f501d275700 1 mds.0.1370673 making mds journal writeable
> > 2016-11-04 13:26:31.779444 7f5024392700 1 mds.0.1370673 handle_mds_map i am now mds.0.1370673
> > 2016-11-04 13:26:31.779470 7f5024392700 1 mds.0.1370673 handle_mds_map state change up:replay --> up:reconnect
> > 2016-11-04 13:26:31.779477 7f5024392700 1 mds.0.1370673 reconnect_start
> > 2016-11-04 13:26:31.779479 7f5024392700 1 mds.0.1370673 reopen_log
> > 2016-11-04 13:26:31.779485 7f5024392700 1 mds.0.server reconnect_clients -- 11 sessions
> > 2016-11-04 13:26:31.785040 7f5024392700 0 log_channel(cluster) log [DBG] : reconnect by client.2623964 10.1.103.231:0/2164520513 after 0.005521
> > 2016-11-04 13:26:31.785218 7f5024392700 0 log_channel(cluster) log [DBG] : reconnect by client.2765641 10.1.103.231:0/1622816431 after 0.005723
> > 2016-11-04 13:26:31.786122 7f5024392700 0 log_channel(cluster) log [DBG] : reconnect by client.2623967 10.1.103.232:0/342252210 after 0.006627
> > 2016-11-04 13:26:31.786713 7f5024392700 0 log_channel(cluster) log [DBG] : reconnect by client.2719695 10.1.103.232:0/1211624383 after 0.007219
> > 2016-11-04 13:26:31.798640 7f5024392700 0 log_channel(cluster) log [DBG] : reconnect by client.2719692 10.1.103.233:0/174968170 after 0.019143
> > 2016-11-04 13:26:31.800473 7f5024392700 0 log_channel(cluster) log [DBG] : reconnect by client.2719698 10.1.103.233:0/2257480425 after 0.020906
> > 2016-11-04 13:26:31.801293 7f5024392700 0 log_channel(cluster) log [DBG] : reconnect by client.2765635 10.1.103.231:0/2313865568 after 0.021787
> > 2016-11-04 13:26:31.886733 7f5024392700 0 log_channel(cluster) log [DBG] : reconnect by client.2719689 10.1.103.232:0/3215599620 after 0.107140
> > 2016-11-04 13:26:31.938048 7f5024392700 0 log_channel(cluster) log [DBG] : reconnect by client.2765770 10.1.106.7:0/789497307 after 0.157027
> > 2016-11-04 13:26:31.962209 7f5024392700 0 log_channel(cluster) log [DBG] : reconnect by client.2624111 10.1.106.3:0/2929007149 after 0.182597
> > 2016-11-04 13:26:31.984629 7f5024392700 0 log_channel(cluster) log [DBG] : reconnect by client.2765638 10.1.103.233:0/2247183319 after 0.205018
> > 2016-11-04 13:26:32.029738 7f5024392700 1 mds.0.1370673 reconnect_done
> > 2016-11-04 13:26:32.785680 7f5024392700 1 mds.0.1370673 handle_mds_map i am now mds.0.1370673
> > 2016-11-04 13:26:32.785715 7f5024392700 1 mds.0.1370673 handle_mds_map state change up:reconnect --> up:rejoin
> > 2016-11-04 13:26:32.785731 7f5024392700 1 mds.0.1370673 rejoin_start
> > 2016-11-04 13:26:38.572270 7f5024392700 1 mds.0.1370673 rejoin_joint_start
> > 2016-11-04 13:26:43.971915 7f5020289700 1 mds.0.1370673 rejoin_done
> > 2016-11-04 13:26:44.375857 7f5024392700 1 mds.0.1370673 handle_mds_map i am now mds.0.1370673
> > 2016-11-04 13:26:44.375871 7f5024392700 1 mds.0.1370673 handle_mds_map state change up:rejoin --> up:active
> > 2016-11-04 13:26:44.375878 7f5024392700 1 mds.0.1370673 recovery_done -- successful recovery!
> > 2016-11-04 13:26:44.375949 7f5024392700 1 mds.0.1370673 active_start
> > 2016-11-04 13:26:44.519621 7f5024392700 1 mds.0.1370673 cluster recovered.
>
> The question here is what's sending the SIGTERM? It could be the OOM killer, but given that your system made it to active in just a few seconds, it seems unlikely that it could have eaten that much RAM that quickly unless there was something else on the system consuming the available memory?

I can't see any alerts that OOM was triggered, and I think I only saw this terminated message once, so it might be a red herring. As you mentioned, the asserts are probably due to unhandled exceptions, most likely from me trying to evict clients while the MDS rapidly loops round between replay and reconnect. I think the real problem is the next, looping section of the log I pasted, but I can't see anything obvious about why it gets stuck in this loop. I.e. what does this line mean: "handle_mds_map i (10.1.2.76:6805/1334) dne in the mdsmap, respawning myself"? There is obviously something one of the stale clients is trying to do when it reconnects that upsets it.
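For what it's worth, this is roughly what I ran on the MDS host to rule out the OOM killer (a sketch from memory; the exact kernel message wording varies between kernel versions, so adjust the patterns):

  dmesg -T | grep -i -E 'out of memory|oom'
  journalctl -k | grep -i -E 'killed process|oom'

Neither turned anything up around the time of the crashes, which is why I'm leaning towards the SIGTERM being a red herring. On the "dne in the mdsmap" line, my guess is that "dne" means "does not exist", i.e. the mons have dropped that particular instance (address/nonce) from the map, so I've also been comparing the address in the log against what the monitors report, along these lines (again just a sketch, the grep strings are specific to my setup):

  ceph mds dump | grep 10.1.2.76
  ceph osd blacklist ls | grep 10.1.2.76

since the looping section below also shows it "waiting for osdmap 1382368 (which blacklists prior instance)".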
> > 2016-11-04 13:27:03.651175 7f5020a8a700 -1 mds.gp-ceph-mds1 *** got signal Terminated ***
> > 2016-11-04 13:27:03.651191 7f5020a8a700 1 mds.gp-ceph-mds1 suicide. wanted state up:active
> >
> > A section whilst it's looping
> >
> > 2016-11-04 13:35:11.255310 7f945df1c700 1 mds.gp-ceph-mds1 handle_mds_map i (10.1.2.76:6800/8619) dne in the mdsmap, respawning myself
> > 2016-11-04 13:35:11.255330 7f945df1c700 1 mds.gp-ceph-mds1 respawn
> > 2016-11-04 13:35:11.255339 7f945df1c700 1 mds.gp-ceph-mds1 e: '/usr/bin/ceph-mds'
> > 2016-11-04 13:35:11.255346 7f945df1c700 1 mds.gp-ceph-mds1 0: '/usr/bin/ceph-mds'
> > 2016-11-04 13:35:11.255353 7f945df1c700 1 mds.gp-ceph-mds1 1: '-f'
> > 2016-11-04 13:35:11.255359 7f945df1c700 1 mds.gp-ceph-mds1 2: '--cluster'
> > 2016-11-04 13:35:11.255366 7f945df1c700 1 mds.gp-ceph-mds1 3: 'ceph'
> > 2016-11-04 13:35:11.255372 7f945df1c700 1 mds.gp-ceph-mds1 4: '--id'
> > 2016-11-04 13:35:11.255381 7f945df1c700 1 mds.gp-ceph-mds1 5: 'gp-ceph-mds1'
> > 2016-11-04 13:35:11.255388 7f945df1c700 1 mds.gp-ceph-mds1 6: '--setuser'
> > 2016-11-04 13:35:11.255395 7f945df1c700 1 mds.gp-ceph-mds1 7: 'ceph'
> > 2016-11-04 13:35:11.255401 7f945df1c700 1 mds.gp-ceph-mds1 8: '--setgroup'
> > 2016-11-04 13:35:11.255407 7f945df1c700 1 mds.gp-ceph-mds1 9: 'ceph'
> > 2016-11-04 13:35:11.255437 7f945df1c700 1 mds.gp-ceph-mds1 exe_path /usr/bin/ceph-mds
> > 2016-11-04 13:35:11.259542 7fcf47754700 1 mds.0.1371026 handle_mds_map i am now mds.0.1371026
> > 2016-11-04 13:35:11.259549 7fcf47754700 1 mds.0.1371026 handle_mds_map state change up:boot --> up:replay
> > 2016-11-04 13:35:11.259563 7fcf47754700 1 mds.0.1371026 replay_start
> > 2016-11-04 13:35:11.259565 7fcf47754700 1 mds.0.1371026 recovery set is
> > 2016-11-04 13:35:11.259569 7fcf47754700 1 mds.0.1371026 waiting for osdmap 1382368 (which blacklists prior instance)
> > 2016-11-04 13:35:11.266088 7fcf42447700 0 mds.0.cache creating system inode with ino:100
> > 2016-11-04 13:35:11.266207 7fcf42447700 0 mds.0.cache creating system inode with ino:1
> > 2016-11-04 13:35:11.270580 7fe46b86c200 0 ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b), process ceph-mds, pid 8619
> > 2016-11-04 13:35:11.271530 7fe46b86c200 0 pidfile_write: ignore empty --pid-file
> > 2016-11-04 13:35:11.507023 7fcf40637700 1 mds.0.1371026 replay_done
> > 2016-11-04 13:35:11.507077 7fcf40637700 1 mds.0.1371026 making mds journal writeable
> > 2016-11-04 13:35:12.366251 7fcf47754700 1 mds.0.1371026 handle_mds_map i am now mds.0.1371026
> > 2016-11-04 13:35:12.366260 7fcf47754700 1 mds.0.1371026 handle_mds_map state change up:replay --> up:reconnect
> > 2016-11-04 13:35:12.366268 7fcf47754700 1 mds.0.1371026 reconnect_start
> > 2016-11-04 13:35:12.366269 7fcf47754700 1 mds.0.1371026 reopen_log
> > 2016-11-04 13:35:12.366284 7fcf47754700 1 mds.0.server reconnect_clients -- 9 sessions
> > 2016-11-04 13:35:12.366317 7fcf47754700 0 log_channel(cluster) log [DBG] : reconnect by client.2623964 10.1.103.231:0/2164520513 after 0.000010
> > 2016-11-04 13:35:12.366476 7fcf47754700 0 log_channel(cluster) log [DBG] : reconnect by client.2719692 10.1.103.233:0/174968170 after 0.000184
> > 2016-11-04 13:35:12.367048 7fcf47754700 0 log_channel(cluster) log [DBG] : reconnect by client.2719695 10.1.103.232:0/1211624383 after 0.000752
> > 2016-11-04 13:35:12.371509 7fcf47754700 0 log_channel(cluster) log [DBG] : reconnect by client.2765641 10.1.103.231:0/1622816431 after 0.005214
> > 2016-11-04 13:35:12.377333 7fcf47754700 0 log_channel(cluster) log [DBG] : reconnect by client.2623967 10.1.103.232:0/342252210 after 0.011038
> > 2016-11-04 13:35:12.377782 7fcf47754700 0 log_channel(cluster) log [DBG] : reconnect by client.2765635 10.1.103.231:0/2313865568 after 0.011485
> > 2016-11-04 13:35:12.396777 7fcf47754700 0 log_channel(cluster) log [DBG] : reconnect by client.2719689 10.1.103.232:0/3215599620 after 0.030460
> > 2016-11-04 13:35:12.418640 7fcf47754700 0 log_channel(cluster) log [DBG] : reconnect by client.2624111 10.1.106.3:0/2929007149 after 0.052235
> > 2016-11-04 13:35:15.407765 7fcf3de16700 0 -- 10.1.2.76:6805/1334 >> 10.1.103.232:0/3215599620 pipe(0x55d529479400 sd=1787 :6805 s=2 pgs=6645 cs=1 l=0 c=0x55d529954a80).fault with nothing to send, going to standby
> > 2016-11-04 13:35:15.409659 7fe465752700 1 mds.gp-ceph-mds1 handle_mds_map standby
> > 2016-11-04 13:35:15.412866 7fcf47754700 1 mds.gp-ceph-mds1 handle_mds_map i (10.1.2.76:6805/1334) dne in the mdsmap, respawning myself
> > 2016-11-04 13:35:15.412886 7fcf47754700 1 mds.gp-ceph-mds1 respawn
> >
> > Some of the Asserts
> >
> > 2016-11-04 13:26:30.344284 7f03fd42e700 -1 mds/MDSDaemon.cc: In function 'void MDSDaemon::respawn()' thread 7f03fd42e700 time 2016-11-04 13:26:30.329841
> > mds/MDSDaemon.cc: 1132: FAILED assert(0)
>
> This is the assert you hit if the MDS is trying to respawn itself but it can't. A slightly earlier log message might say why?
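That makes sense. Next time it loops I'll pull the context from just before the assert rather than only the assert itself, something along these lines (log path assumed, adjust for your own naming):

  grep -n -B 40 'FAILED assert' /var/log/ceph/ceph-mds.gp-ceph-mds1.log | less

since presumably the interesting bit is whatever respawn() logged immediately before it gave up.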
> > ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x557546692e80]
> > 2: (MDSDaemon::respawn()+0x73d) [0x5575462789fd]
> > 3: (MDSDaemon::handle_mds_map(MMDSMap*)+0x1517) [0x557546281667]
> > 4: (MDSDaemon::handle_core_message(Message*)+0x7f3) [0x557546284a03]
> > 5: (MDSDaemon::ms_dispatch(Message*)+0x1c3) [0x557546284cf3]
> > 6: (DispatchQueue::entry()+0xf2b) [0x557546799f6b]
> > 7: (DispatchQueue::DispatchThread::entry()+0xd) [0x55754667911d]
> > 8: (()+0x76fa) [0x7f04026b06fa]
> > 9: (clone()+0x6d) [0x7f0400b71b5d]
> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> >
> > 2016-11-04 13:36:03.805549 7f924ded0700 1 mds.gp-ceph-mds1 asok_command: session evict (starting...)
> > 2016-11-04 13:36:03.809513 7f924ded0700 -1 mds/MDLog.cc: In function 'void MDLog::_submit_entry(LogEvent*, MDSInternalContextBase*)' thread 7f924ded0700 time 2016-11-04 13:36:03.805586
> > mds/MDLog.cc: 275: FAILED assert(!mds->is_any_replay())
>
> This is a bug, but presumably not the root of your problems
> (http://tracker.ceph.com/issues/17801)
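Thanks, that fits. The assert(!mds->is_any_replay()) presumably only fires if the evict lands while the MDS is still in replay, and given how I was hammering the command that will have happened plenty of times. In hindsight I should have been checking the daemon's state before each attempt, something like this (a sketch, run against the local admin socket on the MDS host):

  ceph daemon mds.gp-ceph-mds1 status
  ceph mds stat

and only issuing the evict once it reports reconnect or later rather than up:replay.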
> > ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x55c691e22ef0]
> > 2: (MDLog::_submit_entry(LogEvent*, MDSInternalContextBase*)+0x47f) [0x55c691c8189f]
> > 3: (Server::journal_close_session(Session*, int, Context*)+0x6df) [0x55c691a6059f]
> > 4: (Server::kill_session(Session*, Context*)+0x1bd) [0x55c691a623bd]
> > 5: (MDSRankDispatcher::handle_asok_command(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> ><bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> ><bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > >&, ceph::Formatter*, std::ostream&)+0x491) [0x55c691a31df1]
> > 6: (MDSDaemon::asok_command(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> ><bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> ><bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::ostream&)+0x222) [0x55c691a00152]
> > 7: (MDSSocketHook::call(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> ><bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> ><bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list&)+0xa8) [0x55c691a10db8]
> > 8: (AdminSocket::do_accept()+0x1267) [0x55c691e1c987]
> > 9: (AdminSocket::entry()+0x298) [0x55c691e1df48]
> > 10: (()+0x770a) [0x7f925215070a]
> > 11: (clone()+0x6d) [0x7f925061182d]
> >
> >
> >> -----Original Message-----
> >> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Nick Fisk
> >> Sent: 02 November 2016 21:50
> >> To: 'Ceph Users' <ceph-users@xxxxxxxxxxxxxx>
> >> Subject: Re: MDS Problems - Solved but reporting for benefit of others
> >>
> >> A bit more digging, the original crash appears to be similar to (but not exactly the same as) this tracker report
> >>
> >> http://tracker.ceph.com/issues/16983
> >>
> >> I can see that this was fixed in 10.2.3, so I will probably look to upgrade.
> >>
> >> If the logs make sense to anybody with a bit more knowledge I would be interested if that bug is related or if I have stumbled on something new.
> >>
> >> Nick
> >>
> >> > -----Original Message-----
> >> > From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Nick Fisk
> >> > Sent: 02 November 2016 17:58
> >> > To: 'Ceph Users' <ceph-users@xxxxxxxxxxxxxx>
> >> > Subject: MDS Problems - Solved but reporting for benefit of others
> >> >
> >> > Hi all,
> >> >
> >> > Just a bit of an outage with CephFS around the MDS's; I managed to get everything up and running again after a bit of head scratching and thought I would share here what happened.
> >> >
> >> > Cause
> >> > I believe the MDS's, which were running as VM's, suffered when the hypervisor ran out of RAM and started swapping due to hypervisor maintenance. I know this is less than ideal and have put steps in place to prevent this happening again.
> >> >
> >> > Symptoms
> >> > 1. Noticed that both MDS's were down; log files on both showed that they had crashed
> >> > 2. After restarting the MDS's, their status kept flipping between replay and reconnect
> >> > 3. Now and again both MDS's would crash again
> >> > 4. Log files showed they seemed to keep restarting after trying to reconnect clients
> >> > 5. Clients were all kernel clients; one was 3.19 and the rest 4.8. I believe the problematic client was one of the ones running kernel 4.8
> >> > 6. Ceph is 10.2.2
> >> >
> >> > Resolution
> >> > After some serious head scratching and a little bit of panicking, the fact that the log files showed the restart always happened after trying to reconnect the clients gave me the idea to try and kill the sessions on the MDS. I first reset all the clients and waited, but this didn't seem to have any effect and I could still see the MDS trying to reconnect to the clients. I then decided to try and kill the sessions from the MDS end, so I shut down the standby MDS (as they kept flipping active roles) and ran
> >> >
> >> > ceph daemon mds.gp-ceph-mds1 session ls
> >> >
> >> > I then tried to kill the last session in the list
> >> >
> >> > ceph daemon mds.gp-ceph-mds1 session evict <session id>
> >> >
> >> > I had to keep hammering this command to get it at the right point, as the MDS was only responding for a fraction of a second.
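(For reference, "hammering" just meant re-running the evict in a loop until the admin socket actually accepted it during the brief window the daemon was responsive. Roughly the below, treated as a sketch: SESSION_ID comes from the earlier session ls output, and the MDS name will differ on your setup.

  SESSION_ID=<session id from session ls>
  while ! ceph daemon mds.gp-ceph-mds1 session evict $SESSION_ID; do sleep 0.2; done

Not pretty, but it got the command in.)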
> >> > Suddenly in my other window, where I had the tail of the MDS log, I saw a whizz of new information and then it stopped with the MDS success message. So it seems something the MDS was trying to do whilst reconnecting was upsetting it. ceph -s updated to show the MDS was now active. Rebooting the other MDS then brought it back as a standby as well. Problem solved.
> >> >
> >> > I have uploaded the 2 MDS logs here if any CephFS devs are interested in taking a closer look.
> >> >
> >> > Nick
> >> >
> >> > _______________________________________________
> >> > ceph-users mailing list
> >> > ceph-users@xxxxxxxxxxxxxx
> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@xxxxxxxxxxxxxx
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com