On 06/19/2013 10:53 AM, James Harper wrote:
> Every time I start one of my mons, it crashes. Two others are running, but there seem to be long delays (several seconds) when doing a mon status (maybe this is the expected behaviour when one mon is down?)
> The tail of /var/log/ceph/ceph-mon.4.log follows this email.
> The version is 0.61.3-1~bpo70+1, from http://ceph.com/debian-cuttlefish wheezy main.
> This also happened in a previous version, and even before that, but I thought I'd fixed it by wiping the errant mon and recreating it.
> Is there anything else I can supply that might help?
> Thanks
> James
> 0> 2013-06-19 19:45:44.018695 7f472d995700 -1 mon/Monitor.cc: In function 'void Monitor::sync_timeout(entity_inst_t&)' thread 7f472d995700 time 2013-06-19 19:45:44.017928
> mon/Monitor.cc: 1101: FAILED assert(sync_state == SYNC_STATE_CHUNKS)
> ceph version 0.61.3 (92b1e398576d55df8e5888dd1a9545ed3fd99532)
> 1: /usr/bin/ceph-mon() [0x4c8eca]
> 2: (Context::complete(int)+0xa) [0x4d70fa]
> 3: (SafeTimer::timer_thread()+0x1af) [0x64ad4f]
> 4: (SafeTimerThread::entry()+0xd) [0x64c3dd]
> 5: (()+0x6b50) [0x7f47c0c3ab50]
> 6: (clone()+0x6d) [0x7f47bf39ba7d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
We have seen issues with sync_timeout() before. Each time, I track them
down for a while, find nothing of value (the logs usually don't help
much), and eventually have to move on.
http://tracker.ceph.com/issues/4845
and
http://tracker.ceph.com/issues/5171
contain two instances of what appears to be the same bug. My guess is
that a lingering Context is not being cancelled somewhere, so the sync
timeout fires after the monitor has already left the chunks state. Or it
might be something else altogether.
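As a rough illustration of the "lingering Context" hypothesis, here is a toy, single-threaded C++ model. It is not Ceph's code: ToyTimer, ToyMonitor, add_event_at(), cancel_event(), finish_sync() and run_scenario() are hypothetical names, and the real SafeTimer and Monitor interfaces differ. The idea it sketches: a timeout callback is armed when sync enters the chunks state; if sync leaves that state without cancelling the callback, the callback later fires and sees a state it does not expect, which is the situation FAILED assert(sync_state == SYNC_STATE_CHUNKS) reports. Instead of aborting, the sketch records a flag so both paths can be compared.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical single-threaded timer: runs callbacks ("Contexts") once their
// tick is due. Illustration only; this is NOT Ceph's SafeTimer interface.
class ToyTimer {
 public:
  using Id = std::uint64_t;

  Id add_event_at(int tick, std::function<void()> cb) {
    Id id = next_id_++;
    events_.emplace(id, Event{tick, std::move(cb)});
    return id;
  }

  // Returns true if a pending event was removed.
  bool cancel_event(Id id) { return events_.erase(id) > 0; }

  // Fire every event due at or before `now`, rescanning after each callback
  // because a callback may add or cancel events.
  void advance_to(int now) {
    bool fired = true;
    while (fired) {
      fired = false;
      for (auto it = events_.begin(); it != events_.end(); ++it) {
        if (it->second.tick <= now) {
          std::function<void()> cb = std::move(it->second.cb);
          events_.erase(it);
          cb();
          fired = true;
          break;
        }
      }
    }
  }

 private:
  struct Event {
    int tick;
    std::function<void()> cb;
  };
  std::map<Id, Event> events_;
  Id next_id_ = 1;
};

enum class SyncState { None, Chunks };

// Hypothetical stand-in for the monitor's sync bookkeeping.
class ToyMonitor {
 public:
  ToyMonitor(ToyTimer& timer, bool cancel_on_transition)
      : timer_(timer), cancel_on_transition_(cancel_on_transition) {}

  // Enter the chunks state and arm a timeout 5 ticks from now.
  void start_chunks(int now) {
    state_ = SyncState::Chunks;
    timeout_id_ = timer_.add_event_at(now + 5, [this] { sync_timeout(); });
  }

  // Leave the chunks state; only the "fixed" variant cancels the timeout.
  void finish_sync() {
    if (cancel_on_transition_ && timeout_id_ != 0) {
      timer_.cancel_event(timeout_id_);
    }
    timeout_id_ = 0;
    state_ = SyncState::None;
  }

  bool saw_stale_timeout() const { return stale_timeout_; }

 private:
  // Where the real code asserts on the state, the sketch records a flag.
  void sync_timeout() {
    if (state_ != SyncState::Chunks) stale_timeout_ = true;
    timeout_id_ = 0;
  }

  ToyTimer& timer_;
  bool cancel_on_transition_;
  SyncState state_ = SyncState::None;
  ToyTimer::Id timeout_id_ = 0;
  bool stale_timeout_ = false;
};

// Start a sync, leave the chunks state before the timeout is due, then let
// the clock run past the deadline. Returns true if a stale timeout fired.
bool run_scenario(bool cancel_on_transition) {
  ToyTimer timer;
  ToyMonitor mon(timer, cancel_on_transition);
  mon.start_chunks(0);   // timeout due at tick 5
  timer.advance_to(2);   // nothing due yet
  mon.finish_sync();     // state leaves Chunks at tick 2
  timer.advance_to(10);  // fires only if the event was never cancelled
  return mon.saw_stale_timeout();
}
```

Calling run_scenario(false) models the suspected bug and reports a stale timeout; run_scenario(true) cancels the event on the state transition and reports none. Whether and where such a missing cancel exists in the real monitor code would still have to be confirmed against a full log.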
James, do you happen to have a full log you can share with us?
-Joao
--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html