[Added ceph-devel to Cc:]

On Thu, May 17, 2012 at 1:56 AM, Clint Byrum <clint@xxxxxxxxxx> wrote:
> Hi Tommi.
>
> I got home and just had to finish the refactor. Now I get a crash, which I've pasted below.

Well, I've never seen that one before. Anyone else have a clue? Sage, Greg?

For others: that's using a setup where one of the monitors (mon.cmon-0) gets a monmap with itself as the only member, and the others (such as mon.cmon-1) get a monmap that has the IP of mon.cmon-0.

> It's reproducible with the charm I have, which I've pushed to this bzr branch:
>
> lp:~clint-fewbar/charms/precise/ceph-mon/rewrite
>
> If you are brave enough to try juju:
>
> juju bootstrap
> juju deploy --repository ~/charms local:ceph-mon -n 3
> ... wait
> juju set ceph-mon initializing-unit=ceph-mon/0
>
> The crash is on the "leader" (renamed to the initializing-unit to remove confusion).
>
> I've left this running, if you'd like to do anything as far as debugging.
>
> Anyway, the log from the non-leaders looks like this:
>
> 2012-05-17 08:48:12.077550 7fd3e11d1780 0 store(/mnt/mon.cmon-1) created monfs at /mnt/mon.cmon-1 for cmon-1
> 2012-05-17 08:48:13.714240 7fd5b13ed780 1 mon.cmon-1@-1(probing) e0 init fsid f56c6f22-9ffc-11e1-bae9-22000afc46fb
> 2012-05-17 08:48:13.789339 7fd5ac46a700 0 log [INF] : mon.cmon-1 calling new monitor election
> 2012-05-17 08:48:14.416376 7fd5ab468700 0 -- 10.252.87.112:6800/0 >> 10.252.70.251:6789/0 pipe(0x2346500 sd=17 pgs=2 cs=1 l=0).fault initiating reconnect
>
> And here is the crash log:
>
> root@ip-10-252-70-251:~# cat /var/log/ceph/ceph-mon.cmon-0.log
> 2012-05-17 08:47:42.647248 7fc7c734d780 0 store(/mnt/mon.cmon-0) created monfs at /mnt/mon.cmon-0 for cmon-0
> 2012-05-17 08:47:48.263011 7f1657506780 1 mon.cmon-0@0(probing) e0 init fsid f56c6f22-9ffc-11e1-bae9-22000afc46fb
> 2012-05-17 08:47:48.263654 7f1657506780 1 mon.cmon-0@0(probing) e0 win_standalone_election
> 2012-05-17 08:47:48.263748 7f1657506780 0 log [INF] : mon.cmon-0@0 won leader election with quorum 0
> 2012-05-17 08:47:48.297184 7f1657506780 1 mon.cmon-0@0(leader).osd e1 e1: 0 osds: 0 up, 0 in
> 2012-05-17 08:47:48.528721 7f1657506780 1 mon.cmon-0@0(probing) e1 win_standalone_election
> 2012-05-17 08:47:48.528829 7f1657506780 0 log [INF] : mon.cmon-0@0 won leader election with quorum 0
> 2012-05-17 08:48:13.749820 7f1652583700 0 mon.cmon-0@0(leader).monmap v1 adding cmon-2 at 10.252.33.186:6800/0 to monitor cluster
> 2012-05-17 08:48:13.752916 7f1652583700 0 log [INF] : mon.cmon-0 calling new monitor election
> 2012-05-17 08:48:13.881387 7f1652583700 -1 mon/MonMap.h: In function 'entity_inst_t MonMap::get_inst(unsigned int) const' thread 7f1652583700 time 2012-05-17 08:48:13.880187
> mon/MonMap.h: 167: FAILED assert(m < rank_addr.size())
>
> ceph version 0.46-219-g54bc094 (commit:54bc09417917d8d0ca99d8ed8285498b7d5aa369)
> 1: (Elector::defer(int)+0x6da) [0x5162aa]
> 2: (Elector::handle_propose(MMonElection*)+0x573) [0x516903]
> 3: (Elector::dispatch(Message*)+0xc73) [0x519863]
> 4: (Monitor::_ms_dispatch(Message*)+0x3f3) [0x4865a3]
> 5: (Monitor::ms_dispatch(Message*)+0x32) [0x493752]
> 6: (SimpleMessenger::dispatch_entry()+0x863) [0x5bf653]
> 7: (SimpleMessenger::DispatchThread::entry()+0xd) [0x589e3d]
> 8: (()+0x7e9a) [0x7f16570dce9a]
> 9: (clone()+0x6d) [0x7f16558954bd]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- begin dump of recent events ---
> -10> 2012-05-17 08:47:48.253421 7f1657506780 1 store(/mnt/mon.cmon-0) mount
> -9> 2012-05-17 08:47:48.257204 7f1657506780 0 ceph version 0.46-219-g54bc094 (commit:54bc09417917d8d0ca99d8ed8285498b7d5aa369), process ceph-mon, pid 7640
> -8> 2012-05-17 08:47:48.263011 7f1657506780 1 mon.cmon-0@0(probing) e0 init fsid f56c6f22-9ffc-11e1-bae9-22000afc46fb
> -7> 2012-05-17 08:47:48.263654 7f1657506780 1 mon.cmon-0@0(probing) e0 win_standalone_election
> -6> 2012-05-17 08:47:48.263748 7f1657506780 0 log [INF] : mon.cmon-0@0 won leader election with quorum 0
> -5> 2012-05-17 08:47:48.297184 7f1657506780 1 mon.cmon-0@0(leader).osd e1 e1: 0 osds: 0 up, 0 in
> -4> 2012-05-17 08:47:48.528721 7f1657506780 1 mon.cmon-0@0(probing) e1 win_standalone_election
> -3> 2012-05-17 08:47:48.528829 7f1657506780 0 log [INF] : mon.cmon-0@0 won leader election with quorum 0
> -2> 2012-05-17 08:48:13.749820 7f1652583700 0 mon.cmon-0@0(leader).monmap v1 adding cmon-2 at 10.252.33.186:6800/0 to monitor cluster
> -1> 2012-05-17 08:48:13.752916 7f1652583700 0 log [INF] : mon.cmon-0 calling new monitor election
> 0> 2012-05-17 08:48:13.881387 7f1652583700 -1 mon/MonMap.h: In function 'entity_inst_t MonMap::get_inst(unsigned int) const' thread 7f1652583700 time 2012-05-17 08:48:13.880187
> mon/MonMap.h: 167: FAILED assert(m < rank_addr.size())
>
> ceph version 0.46-219-g54bc094 (commit:54bc09417917d8d0ca99d8ed8285498b7d5aa369)
> 1: (Elector::defer(int)+0x6da) [0x5162aa]
> 2: (Elector::handle_propose(MMonElection*)+0x573) [0x516903]
> 3: (Elector::dispatch(Message*)+0xc73) [0x519863]
> 4: (Monitor::_ms_dispatch(Message*)+0x3f3) [0x4865a3]
> 5: (Monitor::ms_dispatch(Message*)+0x32) [0x493752]
> 6: (SimpleMessenger::dispatch_entry()+0x863) [0x5bf653]
> 7: (SimpleMessenger::DispatchThread::entry()+0xd) [0x589e3d]
> 8: (()+0x7e9a) [0x7f16570dce9a]
> 9: (clone()+0x6d) [0x7f16558954bd]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- end dump of recent events ---
> 2012-05-17 08:48:13.882910 7f1652583700 -1 *** Caught signal (Aborted) **
> in thread 7f1652583700
>
> ceph version 0.46-219-g54bc094 (commit:54bc09417917d8d0ca99d8ed8285498b7d5aa369)
> 1: /usr/bin/ceph-mon() [0x622e9a]
> 2: (()+0xfcb0) [0x7f16570e4cb0]
> 3: (gsignal()+0x35) [0x7f16557d9445]
> 4: (abort()+0x17b) [0x7f16557dcbab]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f165612769d]
> 6: (()+0xb5846) [0x7f1656125846]
> 7: (()+0xb5873) [0x7f1656125873]
> 8: (()+0xb596e) [0x7f165612596e]
> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x282) [0x5dcd52]
> 10: (Elector::defer(int)+0x6da) [0x5162aa]
> 11: (Elector::handle_propose(MMonElection*)+0x573) [0x516903]
> 12: (Elector::dispatch(Message*)+0xc73) [0x519863]
> 13: (Monitor::_ms_dispatch(Message*)+0x3f3) [0x4865a3]
> 14: (Monitor::ms_dispatch(Message*)+0x32) [0x493752]
> 15: (SimpleMessenger::dispatch_entry()+0x863) [0x5bf653]
> 16: (SimpleMessenger::DispatchThread::entry()+0xd) [0x589e3d]
> 17: (()+0x7e9a) [0x7f16570dce9a]
> 18: (clone()+0x6d) [0x7f16558954bd]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- begin dump of recent events ---
> 0> 2012-05-17 08:48:13.882910 7f1652583700 -1 *** Caught signal (Aborted) **
> in thread 7f1652583700
>
> ceph version 0.46-219-g54bc094 (commit:54bc09417917d8d0ca99d8ed8285498b7d5aa369)
> 1: /usr/bin/ceph-mon() [0x622e9a]
> 2: (()+0xfcb0) [0x7f16570e4cb0]
> 3: (gsignal()+0x35) [0x7f16557d9445]
> 4: (abort()+0x17b) [0x7f16557dcbab]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f165612769d]
> 6: (()+0xb5846) [0x7f1656125846]
> 7: (()+0xb5873) [0x7f1656125873]
> 8: (()+0xb596e) [0x7f165612596e]
> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x282) [0x5dcd52]
> 10: (Elector::defer(int)+0x6da) [0x5162aa]
> 11: (Elector::handle_propose(MMonElection*)+0x573) [0x516903]
> 12: (Elector::dispatch(Message*)+0xc73) [0x519863]
> 13: (Monitor::_ms_dispatch(Message*)+0x3f3) [0x4865a3]
> 14: (Monitor::ms_dispatch(Message*)+0x32) [0x493752]
> 15: (SimpleMessenger::dispatch_entry()+0x863) [0x5bf653]
> 16: (SimpleMessenger::DispatchThread::entry()+0xd) [0x589e3d]
> 17: (()+0x7e9a) [0x7f16570dce9a]
> 18: (clone()+0x6d) [0x7f16558954bd]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- end dump of recent events ---
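
For anyone reading along without the source handy, here is a rough Python sketch of what the failed assert in the trace is checking. It is not the Ceph code: `MonMapSketch`, `defer` and the unsigned wrap-around of a -1 rank are my own stand-ins and assumptions, based only on the signature `MonMap::get_inst(unsigned int)` in the trace and the `mon.cmon-1@-1` in the non-leader log. Whether a -1 rank is what actually reaches Elector::defer here is a guess on my part.

```python
class MonMapSketch:
    """Toy stand-in for a monitor map: an ordered list of (name, addr)."""

    def __init__(self, members):
        self.rank_addr = list(members)

    def get_inst(self, rank):
        # Assumption: mimic passing an int to an 'unsigned int' parameter,
        # so a rank of -1 wraps around to 2**32 - 1.
        m = rank % 2**32
        if not m < len(self.rank_addr):
            raise AssertionError("FAILED assert(m < rank_addr.size())")
        return self.rank_addr[m]


def defer(monmap, who):
    """Hypothetical stand-in for the lookup of the peer being deferred to."""
    return monmap.get_inst(who)


# mon.cmon-0 starts with a monmap that has itself as the only member.
leader_map = MonMapSketch([("cmon-0", "10.252.70.251:6789")])

print(defer(leader_map, 0))  # rank 0 is in the map, so this resolves

for who in (-1, 1, 2):
    try:
        defer(leader_map, who)
    except AssertionError as err:
        print("rank", who, "->", err)
```

With a one-member map only rank 0 resolves; ranks -1, 1 and 2 all trip the same bounds check, so any proposal attributed to a rank the leader's map does not contain would end up there.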