monitor crash issue

Hi there,

We hit a monitor crash bug in our production clusters while adding more nodes to one of them. The stack trace looks like this:
lc 25431444     0> 2017-11-23 15:41:16.688046 7f93883f2700 -1 error_msg mon/OSDMonitor.cc: In function 'MOSDMap* OSDMonitor::build_incremental(epoch_t, epoch_t)' thread 7f93883f2700 time 2017-11-23 15:41:16.683525
mon/OSDMonitor.cc: 2123: FAILED assert(0)

ceph version 0.94.5.9 (e92a4716ae7404566753964959ddd84411b5dd18)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7b4735]
2: (OSDMonitor::build_incremental(unsigned int, unsigned int)+0x9ab) [0x5e2e5b]
3: (OSDMonitor::send_incremental(unsigned int, MonSession*, bool)+0xb1) [0x5e85b1]
4: (OSDMonitor::check_sub(Subscription*)+0x217) [0x5e8c17]
5: (Monitor::handle_subscribe(MMonSubscribe*)+0x440) [0x571810]
6: (Monitor::dispatch(MonSession*, Message*, bool)+0x3eb) [0x592d5b]
7: (Monitor::_ms_dispatch(Message*)+0x1a6) [0x593716]
8: (Monitor::ms_dispatch(Message*)+0x23) [0x5b2ac3]
9: (DispatchQueue::entry()+0x62a) [0x8a44aa]
10: (DispatchQueue::DispatchThread::entry()+0xd) [0x79c97d]
11: (()+0x7dc5) [0x7f93ad51ddc5]
12: (clone()+0x6d) [0x7f93ac00176d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

The exact assert failure is in OSDMonitor::build_incremental():
MOSDMap *OSDMonitor::build_incremental(epoch_t from, epoch_t to)
{
  dout(10) << "build_incremental [" << from << ".." << to << "]" << dendl;
  MOSDMap *m = new MOSDMap(mon->monmap->fsid);
  m->oldest_map = get_first_committed();
  m->newest_map = osdmap.get_epoch();

  for (epoch_t e = to; e >= from && e > 0; e--) {
    bufferlist bl;
    int err = get_version(e, bl);
    if (err == 0) {
      assert(bl.length());
      // if (get_version(e, bl) > 0) {
      dout(20) << "build_incremental    inc " << e << " "
               << bl.length() << " bytes" << dendl;
      m->incremental_maps[e] = bl;
    } else {
      assert(err == -ENOENT);
      assert(!bl.length());
      get_version_full(e, bl);
      if (bl.length() > 0) {
        // else if (get_version("full", e, bl) > 0) {
        dout(20) << "build_incremental   full " << e << " "
                 << bl.length() << " bytes" << dendl;
        m->maps[e] = bl;
      } else {
        assert(0);  // we should have all maps.   <======= assert failed
      }
    }
  }
  return m;
}

We checked the code and found a possible race condition between MonitorDBStore reads and the osdmap trim operation. The crash scenario: the monitor is trimming old osdmaps while, concurrently, a newly added OSD requests osdmaps, which invokes OSDMonitor::build_incremental(). If a requested epoch has already been trimmed, get_version() fails and get_version_full() cannot fetch the map from the MonitorDBStore either, so the assert(0) above fires. Although we hit this on hammer, we checked the latest master branch and believe the race condition is still present. Can anyone confirm this?
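For illustration only, here is a rough sketch of one way build_incremental() could tolerate a concurrent trim, written against the hammer code quoted above: clamp the requested range to the current trim floor, and if an epoch still disappears mid-loop, stop gracefully and report the new floor rather than hitting assert(0). This is just our sketch to make the race concrete, not the actual upstream fix; how the client should catch up afterwards (e.g. re-subscribing for a full map) is an assumption on our part.

// Sketch only (ours, not an upstream patch): drop-in variant of
// OSDMonitor::build_incremental() that survives a concurrent trim.
MOSDMap *OSDMonitor::build_incremental(epoch_t from, epoch_t to)
{
  MOSDMap *m = new MOSDMap(mon->monmap->fsid);
  m->oldest_map = get_first_committed();
  m->newest_map = osdmap.get_epoch();

  // Anything older than the current trim floor is already gone;
  // clamp the request instead of trying to read trimmed epochs.
  if (from < m->oldest_map)
    from = m->oldest_map;

  for (epoch_t e = to; e >= from && e > 0; e--) {
    bufferlist bl;
    if (get_version(e, bl) == 0) {   // incremental map still present
      m->incremental_maps[e] = bl;
      continue;
    }
    get_version_full(e, bl);         // fall back to the full map
    if (bl.length() > 0) {
      m->maps[e] = bl;
      continue;
    }
    // Epoch e was trimmed between the floor check above and this read.
    // Report the new floor so the client can re-request from a full
    // map, instead of crashing the monitor.
    m->oldest_map = get_first_committed();
    break;
  }
  return m;
}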

BTW, we think this is a duplicate of http://tracker.ceph.com/issues/11332; we added a comment there but have had no response so far.

zhongyan

 
