Hi There,
We hit a monitor crash bug in one of our production clusters while adding more nodes to it.
The stack trace looks like this:
lc 25431444 0> 2017-11-23 15:41:16.688046 7f93883f2700 -1 error_msg mon/OSDMonitor.cc: In function 'MOSDMap* OSDMonitor::build_incremental(epoch_t, epoch_t)' thread 7f93883f2700 time 2017-11-23 15:41:16.683525
mon/OSDMonitor.cc: 2123: FAILED assert(0)
ceph version 0.94.5.9 (e92a4716ae7404566753964959ddd84411b5dd18)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7b4735]
2: (OSDMonitor::build_incremental(unsigned int, unsigned int)+0x9ab) [0x5e2e5b]
3: (OSDMonitor::send_incremental(unsigned int, MonSession*, bool)+0xb1) [0x5e85b1]
4: (OSDMonitor::check_sub(Subscription*)+0x217) [0x5e8c17]
5: (Monitor::handle_subscribe(MMonSubscribe*)+0x440) [0x571810]
6: (Monitor::dispatch(MonSession*, Message*, bool)+0x3eb) [0x592d5b]
7: (Monitor::_ms_dispatch(Message*)+0x1a6) [0x593716]
8: (Monitor::ms_dispatch(Message*)+0x23) [0x5b2ac3]
9: (DispatchQueue::entry()+0x62a) [0x8a44aa]
10: (DispatchQueue::DispatchThread::entry()+0xd) [0x79c97d]
11: (()+0x7dc5) [0x7f93ad51ddc5]
12: (clone()+0x6d) [0x7f93ac00176d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
The code around the failed assert is:
MOSDMap *OSDMonitor::build_incremental(epoch_t from, epoch_t to)
{
  dout(10) << "build_incremental [" << from << ".." << to << "]" << dendl;
  MOSDMap *m = new MOSDMap(mon->monmap->fsid);
  m->oldest_map = get_first_committed();
  m->newest_map = osdmap.get_epoch();

  for (epoch_t e = to; e >= from && e > 0; e--) {
    bufferlist bl;
    int err = get_version(e, bl);
    if (err == 0) {
      assert(bl.length());
      // if (get_version(e, bl) > 0) {
      dout(20) << "build_incremental inc " << e << " "
               << bl.length() << " bytes" << dendl;
      m->incremental_maps[e] = bl;
    } else {
      assert(err == -ENOENT);
      assert(!bl.length());
      get_version_full(e, bl);
      if (bl.length() > 0) {
      //else if (get_version("full", e, bl) > 0) {
        dout(20) << "build_incremental full " << e << " "
                 << bl.length() << " bytes" << dendl;
        m->maps[e] = bl;
      } else {
        assert(0);  // we should have all maps.  <======= assert failed
      }
    }
  }
  return m;
}
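To make the failing condition concrete, here is a small standalone toy program (our own sketch, not Ceph code; ToyStore and fetch_epoch are made-up names) that mirrors the fallback above: try the incremental map first, then the full map, and hit the assert when neither exists for the requested epoch:

// Standalone toy model of the per-epoch lookup in build_incremental().
#include <cassert>
#include <map>
#include <string>

struct ToyStore {
  std::map<unsigned, std::string> inc;   // incremental maps kept for recent epochs
  std::map<unsigned, std::string> full;  // full maps, here only at the oldest epoch
};

void fetch_epoch(const ToyStore &s, unsigned e) {
  if (s.inc.count(e))       // like get_version(e, bl) == 0
    return;
  if (s.full.count(e))      // like get_version_full(e, bl) with bl.length() > 0
    return;
  assert(0 && "we should have all maps");  // the branch that fired for us
}

int main() {
  ToyStore s;
  for (unsigned e = 100; e <= 110; ++e)
    s.inc[e] = "inc";
  s.full[100] = "full";
  fetch_epoch(s, 105);  // fine: incremental map present
  fetch_epoch(s, 99);   // epoch older than anything kept -> assert fires
}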
We checked the code and found that there could be a race condition between the MonitorDBStore reads and the osdmap trim operation. The crash scenario looks like this: the monitor is trimming old osdmaps while, concurrently, a newly added OSD requests osdmaps, which invokes OSDMonitor::build_incremental(). If a requested epoch has already been trimmed, get_version_full() cannot fetch the full osdmap from the MonitorDBStore either, and the assert fires. Although we ran into this on hammer, we checked the latest master branch and believe the race condition is still there. Can anyone confirm this?
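To illustrate the interleaving we have in mind, here is a rough standalone sketch (again not Ceph code, and deliberately simplified: in the real monitor the interleaving is between subscription handling and osdmap trimming, not two raw threads). One thread snapshots the oldest available epoch and then reads epoch by epoch, while the other keeps trimming, so the reader can easily ask for an epoch that has just been removed:

#include <cassert>
#include <chrono>
#include <map>
#include <mutex>
#include <string>
#include <thread>

struct ToyStore {
  std::mutex mtx;
  std::map<unsigned, std::string> maps;  // epoch -> encoded map

  unsigned oldest() {
    std::lock_guard<std::mutex> l(mtx);
    return maps.begin()->first;
  }
  bool read(unsigned e) {
    std::lock_guard<std::mutex> l(mtx);
    return maps.count(e) != 0;
  }
  void trim_to(unsigned e) {             // drop every epoch below e
    std::lock_guard<std::mutex> l(mtx);
    maps.erase(maps.begin(), maps.lower_bound(e));
  }
};

int main() {
  ToyStore s;
  for (unsigned e = 1; e <= 1000; ++e)
    s.maps[e] = "map";

  // "Subscriber" side: decide the range once, then read each epoch later.
  std::thread reader([&s] {
    unsigned from = s.oldest();          // like clamping to get_first_committed()
    for (unsigned e = from; e <= 1000; ++e) {
      std::this_thread::sleep_for(std::chrono::microseconds(50));
      // By the time we get here, the trimmer may already have removed epoch e.
      assert(s.read(e) && "we should have all maps");
    }
  });

  // "Trimmer" side: keeps advancing the oldest kept epoch concurrently.
  std::thread trimmer([&s] {
    for (unsigned e = 1; e <= 900; e += 50) {
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
      s.trim_to(e);
    }
  });

  reader.join();
  trimmer.join();
}

Whether the toy's assert fires on a given run is timing dependent; it is only meant to show the gap between computing the epoch range and actually reading the maps.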
BTW, we think this is a duplicate of http://tracker.ceph.com/issues/11332 and have added a comment there, but have had no response so far.
zhongyan