Re: another assertion failure in monitor

On Tue, Mar 11, 2014 at 9:15 AM, Joao Eduardo Luis <joao.luis@xxxxxxxxxxx> wrote:
On 03/10/2014 10:30 PM, Pawel Veselov wrote:

Now I'm getting this. Any idea what can be done to straighten
this up?

This is weird.  Can you please share the steps taken until this was triggered, as well as the rest of the log?

At this point, no, sorry.

This whole thing started with migrating from 0.56.7 to 0.72.2. First, we started seeing failed assertions of (version == pg_map.version) in PGMonitor.cc:273, but on one monitor (d) only. I attempted to resync the failing monitor (d) from (c) with --force-sync. (d) started to work, but (c) started to fail with the same (version == pg_map.version) assertion. So I tried resyncing (c) from (d) with --force-sync. That's when (c) started to fail with this particular (ret == 0) assertion. I don't think the resync actually worked at all at that point.

I didn't find a way to fix this quickly enough, so I restored the mon directories from backup and started again. The (version == pg_map.version) assertion came back: my backup had been taken before I tried the forced resync, but not before the migration started (it was stupid of me not to back up before the migration). That's the point when I tried all kinds of crazy stuff for a while.

After some poking around, what I ended up doing was simply removing the 'store.db' directory from each monitor's filesystem and starting the monitors. That just re-initiated the migration, and this time it was done in the absence of client requests, and one monitor at a time.
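
For anyone repeating this, a minimal sketch of the "back up first, then remove store.db" step might look like the following. It assumes the monitor daemon is already stopped. The function name and the example path in the comment are illustrative assumptions, not something taken from this thread, and the script does not talk to Ceph at all; it only copies and deletes directories.

```python
import shutil
import time
from pathlib import Path


def backup_and_remove_store(mon_data: Path) -> Path:
    """Copy the whole monitor data dir aside, then delete its store.db.

    Assumes the monitor process is NOT running. Returns the backup path.
    """
    store = mon_data / "store.db"
    if not store.is_dir():
        raise FileNotFoundError(f"no store.db directory under {mon_data}")

    # Full copy first, so a failed migration can be rolled back.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = mon_data.with_name(f"{mon_data.name}.bak-{stamp}")
    shutil.copytree(mon_data, backup)

    # Only after the copy succeeded, remove the converted store.
    shutil.rmtree(store)
    return backup


# Hypothetical usage, one monitor at a time (path is an assumption):
#   backup_and_remove_store(Path("/var/lib/ceph/mon/ceph-d"))
```

Doing the copy before the delete is the whole point here: the backup I had was taken too late to be useful, and this ordering avoids that.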



      0> 2014-03-10 22:26:23.757166 7fc0397e5700 -1 mon/AuthMonitor.cc: In function 'virtual void AuthMonitor::create_initial()' thread 7fc0397e5700 time 2014-03-10 22:26:23.755442
mon/AuthMonitor.cc: 101: FAILED assert(ret == 0)

  ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
  1: (AuthMonitor::create_initial()+0x4d8) [0x637bb8]
  2: (PaxosService::_active()+0x51b) [0x594fcb]
  3: (Context::complete(int)+0x9) [0x565499]
  4: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x95) [0x5698b5]
  5: (Paxos::handle_accept(MMonPaxos*)+0x885) [0x589595]
  6: (Paxos::dispatch(PaxosServiceMessage*)+0x28b) [0x58d66b]
  7: (Monitor::dispatch(MonSession*, Message*, bool)+0x4f0) [0x563620]
  8: (Monitor::_ms_dispatch(Message*)+0x1fb) [0x5639fb]
  9: (Monitor::ms_dispatch(Message*)+0x32) [0x57f212]
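
When several monitors are failing with different assertions, as with the PGMonitor.cc and AuthMonitor.cc ones in this thread, it can help to pull just the assertion lines out of each log. The sketch below assumes every such line has the same shape as the single "FAILED assert" line quoted above; the names `FailedAssert` and `find_failed_asserts` are made up for illustration.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class FailedAssert(NamedTuple):
    source: str      # e.g. "mon/AuthMonitor.cc"
    line: int        # e.g. 101
    condition: str   # e.g. "ret == 0"


# Shape assumed from the one example in this post:
#   mon/AuthMonitor.cc: 101: FAILED assert(ret == 0)
_ASSERT_RE = re.compile(r"^\s*(\S+): (\d+): FAILED assert\((.*)\)\s*$")


def find_failed_asserts(lines: Iterable[str]) -> Iterator[FailedAssert]:
    """Yield one FailedAssert per matching log line, in order."""
    for text in lines:
        m = _ASSERT_RE.match(text)
        if m:
            yield FailedAssert(m.group(1), int(m.group(2)), m.group(3))
```

Passing an open log file to `find_failed_asserts` and collecting the results into a set gives a quick list of which distinct assertions a given monitor has hit.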
