On 04/16/2013 12:01 AM, Dan Mick wrote:
Two is a strange choice for the number of monitors; you really want an odd
number. With two, if either one fails (or you have a network fault),
the cluster is dead because there's no majority.
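For what it's worth, a quorum needs floor(n/2)+1 monitors up: with n=2 that
means both of them, so any single failure (or partition) stops the cluster,
while with n=3 you still only need 2, so one failure is survivable. A minimal
three-monitor layout in ceph.conf, with placeholder hostnames and addresses,
would be something like:

[mon.a]
    host = node-a
    mon addr = 10.0.0.1:6789
[mon.b]
    host = node-b
    mon addr = 10.0.0.2:6789
[mon.c]
    host = node-c
    mon addr = 10.0.0.3:6789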
That said, we certainly don't expect monitors to die when the network
fault goes away. Searching the bug database reveals
http://tracker.ceph.com/issues/4175, but the fix for that should already
be in v0.60.
I'll ask Joao to have a look. Joao, can you have a look? :)
Matthew just pointed me to this email thread; I had totally missed it
(bloody Thunderbird).
As Matthew said in his email, that's #3495 and is being worked on.
I'll update the ticket as soon as I have a fix ready.
-Joao
On 04/15/2013 01:56 PM, Craig Lewis wrote:
I'm doing a test of Ceph in two colo facilities. Since it's just a
test, I only have 2 VMs running, one in each colo. Both VMs are running
mon, mds, a single osd, and the RADOS gw. Cephx is disabled. I'm
testing if the latency between the two facilities (~20ms) is low enough
that I can run a single Ceph cluster in both locations. If it doesn't
work out, I'll run two independent Ceph clusters with manual replication.
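For reference, the monitor side of the layout is roughly this (the hostnames
are just labels I'm using here for the two VMs; the addresses are the ones
that appear in the mon logs below, and each VM additionally carries its osd,
mds, and radosgw sections):

[mon.a]
    host = colo1-vm
    mon addr = 192.168.2.62:6789
[mon.b]
    host = colo2-vm
    mon addr = 192.168.22.62:6789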
This weekend, the connection between the two locations was degraded.
The link had 37% packet loss, for less than a minute. When the link
returned to normal, the re-elected mon leader crashed.
Is this a real bug, or did this happen because I'm only running 2
nodes? I'm trying to avoid bringing more nodes into this test. My VM
infrastructure is pretty weak, and I'm afraid that more nodes would
introduce more noise in the test.
I saw this happen once before (the primary colo had a UPS failure,
causing a switch reboot). The same process crashed, with the same stack
trace. When that happened, I ran sudo service ceph restart on the
machine with the crashed mon, and everything started up fine. I haven't
restarted anything this time.
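If it matters, I believe the init script also accepts a single daemon name,
so restarting only the crashed monitor (rather than every daemon on the node)
should be possible with something like:

sudo service ceph restart mon.a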
I tried to recreate the problem by stopping and starting the VPN between
the two locations, but that didn't trigger the crash. I have some more
ideas on how to trigger it; I'll continue trying today.
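One of those ideas: instead of dropping the VPN cleanly, degrade the link the
way it actually failed, with partial packet loss. Something like netem on
whichever interface carries the inter-site traffic (eth0 here is a
placeholder) should approximate the 37% loss:

sudo tc qdisc add dev eth0 root netem loss 37%
(wait through a few election cycles)
sudo tc qdisc del dev eth0 root netem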
arnulf@ceph0:~$ lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 12.04.2 LTS
Release: 12.04
Codename: precise
arnulf@ceph0:~$ uname -a
Linux ceph0 3.5.0-27-generic #46~precise1-Ubuntu SMP Tue Mar 26 19:33:21
UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
arnulf@ceph0:~$ cat /etc/apt/sources.list.d/ceph.list
deb http://ceph.com/debian-testing/ precise main
arnulf@ceph0:~$ ceph -v
ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)
ceph-mon.log from the non-elected master, mon.b:
2013-04-13 07:57:39.445098 7fde958f4700 0 mon.b@1(peon).data_health(20)
update_stats avail 85% total 17295768 used 1679152 avail 14738024
2013-04-13 07:58:35.150603 7fde950f3700 0 log [INF] : mon.b calling new
monitor election
2013-04-13 07:58:35.150876 7fde950f3700 1 mon.b@1(electing).elector(20)
init, last seen epoch 20
2013-04-13 07:58:39.445355 7fde958f4700 0
mon.b@1(electing).data_health(20) update_stats avail 85% total 17295768
used 1679152 avail 14738024
2013-04-13 07:58:40.192514 7fde958f4700 1 mon.b@1(electing).elector(21)
init, last seen epoch 21
2013-04-13 07:58:43.748907 7fde93dee700 0 -- 192.168.22.62:6789/0 >>
192.168.2.62:6789/0 pipe(0x2c56500 sd=25 :6789 s=2 pgs=108 cs=1
l=0).fault, initiating reconnect
2013-04-13 07:58:43.786209 7fde93ff0700 0 -- 192.168.22.62:6789/0 >>
192.168.2.62:6789/0 pipe(0x2c56500 sd=8 :6789 s=1 pgs=108 cs=2 l=0).fault
2013-04-13 07:59:13.050245 7fde958f4700 1 mon.b@1(probing) e1
discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client
elsewhere; we are not in quorum
2013-04-13 07:59:13.050277 7fde958f4700 1 mon.b@1(probing) e1
discarding message auth(proto 0 34 bytes epoch 1) v1 and sending client
elsewhere; we are not in quorum
2013-04-13 07:59:13.050285 7fde958f4700 1 mon.b@1(probing) e1
discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client
elsewhere; we are not in quorum
...
ceph-mon.log from the elected master, mon.a:
2013-04-13 07:57:41.756844 7f162be82700 0
mon.a@0(leader).data_health(20) update_stats avail 84% total 17295768
used 1797312 avail 14619864
2013-04-13 07:58:35.210875 7f162b681700 0 log [INF] : mon.a calling new
monitor election
2013-04-13 07:58:35.211081 7f162b681700 1 mon.a@0(electing).elector(20)
init, last seen epoch 20
2013-04-13 07:58:40.270547 7f162be82700 1 mon.a@0(electing).elector(21)
init, last seen epoch 21
2013-04-13 07:58:41.757032 7f162be82700 0
mon.a@0(electing).data_health(20) update_stats avail 84% total 17295768
used 1797312 avail 14619864
2013-04-13 07:58:43.441306 7f162b681700 0 log [INF] : mon.a@0 won
leader election with quorum 0,1
2013-04-13 07:58:43.560319 7f162b681700 0 log [INF] : pgmap v1684: 632
pgs: 632 active+clean; 9982 bytes data, 2079 MB used, 100266 MB / 102346
MB avail; 0B/s rd, 0B/s wr, 0op/s
2013-04-13 07:58:43.561722 7f162b681700 -1 mon/PaxosService.cc: In
function 'void PaxosService::propose_pending()' thread 7f162b681700 time
2013-04-13 07:58:43.560456
mon/PaxosService.cc: 127: FAILED assert(have_pending)
ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)
1: (PaxosService::propose_pending()+0x46d) [0x4dee3d]
2: (MDSMonitor::tick()+0x1c62) [0x51cdd2]
3: (MDSMonitor::on_active()+0x1a) [0x512ada]
4: (PaxosService::_active()+0x31d) [0x4e067d]
5: (Context::complete(int)+0xa) [0x4b7b4a]
6: (finish_contexts(CephContext*, std::list<Context*,
std::allocator<Context*> >&, int)+0x95) [0x4ba5a5]
7: (Paxos::handle_last(MMonPaxos*)+0xbef) [0x4da92f]
8: (Paxos::dispatch(PaxosServiceMessage*)+0x26b) [0x4dad8b]
9: (Monitor::_ms_dispatch(Message*)+0x149f) [0x4b310f]
10: (Monitor::ms_dispatch(Message*)+0x32) [0x4c9d12]
11: (DispatchQueue::entry()+0x341) [0x698da1]
12: (DispatchQueue::DispatchThread::entry()+0xd) [0x626c5d]
13: (()+0x7e9a) [0x7f1630c76e9a]
14: (clone()+0x6d) [0x7f162f911cbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
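(If the disassembled listing would help, I can generate it; I assume something
along the lines of objdump -rdS /usr/bin/ceph-mon > ceph-mon.dis against the
packaged binary is what that NOTE is asking for.)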
I saved the log files from both events, in case you want the full
"--- begin dump of recent events ---" section.
This looks somewhat similar to http://tracker.ceph.com/issues/3495. The
first six frames of the stack trace are the same functions, but they
diverge at frame 7.
--
*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email clewis@xxxxxxxxxxxxxxxxxx
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com