heartbeat logic


During startup of an osd cluster with 37 osds, within the first few seconds I see osds getting marked down, even though the osd processes remain running and seem to be just fine. The up count fluctuates for a while but seems to stabilize eventually at around 30 up osds, while 7 or so remain down, and eventually get marked out.

With debugging enabled, I've tracked it down to this bit of logic in OSD.cc:1502 (stable branch):

------snip------
  // ignore (and mark down connection for) old messages
  epoch_t e = m->map_epoch;
  if (!e)
    e = m->peer_as_of_epoch;
  if (e <= osdmap->get_epoch() &&
      ((heartbeat_to.count(from) == 0 && heartbeat_from.count(from) == 0) ||
       heartbeat_con[from] != m->get_connection())) {
    dout(5) << "handle_osd_ping marking down peer " << m->get_source_inst()
            << " after old message from epoch " << e
            << " <= current " << osdmap->get_epoch() << dendl;
    heartbeat_messenger->mark_down(m->get_connection());
    goto out;
  }
--------------------

It looks as though the osd that gets marked down sends a heartbeat ping to another osd, and the receiving osd then marks it down. It's not clear to me why that happens. Is it because connections are getting dropped and ports are changing?

In any case, the if condition evaluates to true, so the receiving osd marks down the osd that just sent it a heartbeat ping.

I modified the debug output to show the values of heartbeat_to.count(from) and heartbeat_from.count(from), as well as heartbeat_con[from] and m->get_connection(). The osds get marked down in the cases where the ping message's epoch and the osdmap epoch are the same (usually around 16) and both counts are zero, suggesting that this is the first heartbeat from osdA to osdB. Even if the counts weren't zero, heartbeat_con[from] is null and doesn't get set until later, so the condition would be true anyway.

Can someone explain the purpose and reasoning behind this bit of code? If I just whack the second part of the conditional, will bad things happen? Any help is greatly appreciated.
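To make the cases above concrete, here is a minimal standalone sketch of the condition as I read it. It is not the OSD code: HeartbeatState, would_mark_down and the empty Connection struct are hypothetical stand-ins, the container types are guesses (the snippet only shows that .count(from) and [from] are used), and I use find() rather than operator[] so that the check does not modify the map.

```cpp
#include <cstdint>
#include <map>
#include <set>

// Hypothetical stand-ins for the real OSD types; only the shape matters here.
using epoch_t = std::uint32_t;
struct Connection {};

struct HeartbeatState {
  epoch_t current_epoch = 0;                       // stands in for osdmap->get_epoch()
  std::set<int> heartbeat_to;                      // peers we send pings to (assumed type)
  std::set<int> heartbeat_from;                    // peers we expect pings from (assumed type)
  std::map<int, const Connection*> heartbeat_con;  // recorded connection per peer (assumed type)
};

// Mirrors the quoted condition: returns true when the ping would be treated
// as "old" and its connection marked down.
bool would_mark_down(const HeartbeatState& s, int from, epoch_t map_epoch,
                     epoch_t peer_as_of_epoch, const Connection* con) {
  // Same fallback as the snippet: use peer_as_of_epoch when map_epoch is 0.
  epoch_t e = map_epoch ? map_epoch : peer_as_of_epoch;

  // First half of the OR: the sender is in neither heartbeat set.
  bool unknown_peer = s.heartbeat_to.count(from) == 0 &&
                      s.heartbeat_from.count(from) == 0;

  // Second half of the OR: the recorded connection (null if none has been
  // recorded yet) differs from the one the ping arrived on.
  auto it = s.heartbeat_con.find(from);
  const Connection* known = (it == s.heartbeat_con.end()) ? nullptr : it->second;

  return e <= s.current_epoch && (unknown_peer || known != con);
}
```

With current_epoch set to 16 and a ping at epoch 16, this returns true for a peer that is in neither set, and also for a peer that is in heartbeat_from but has no recorded connection. It only returns false once heartbeat_con holds the same connection the ping arrived on, or when the ping carries an epoch newer than the current one.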

Thanks,
-sam
