On Tue, 27 Mar 2012, Cláudio Martins wrote:
>
> Hi,
>
> While testing a cluster with 47 OSDs, we noticed that with that many
> OSDs there is considerable network traffic (around 2 Mbit/s), most of
> it apparently coming from the OSD heartbeats alone (measured while no
> clients were generating I/O). OSD CPU consumption was also clearly
> measurable, constantly around 1~2% on a 3.2GHz Xeon CPU.
>
> So we experimented by including
>
>   osd heartbeat interval = 10
>
> in ceph.conf on all nodes and, as suspected, network traffic diminished
> and CPU usage from an idle OSD is no longer measurable in top.
>
> Since there is a considerable number of OSDs in this cluster, we think
> that even with a 10 sec heartbeat, detection of a down OSD by the
> other OSDs is likely to be reasonably quick. As a matter of fact, we
> saw in the mon log that, when we stopped an OSD, it was flagged as
> "failed" by other OSDs in just a few seconds.
>
> So, we would like to know the opinion of the list about increasing the
> heartbeat interval on large clusters (and perhaps suggesting it in the
> official documentation), namely whether you think there might be some
> negative consequences that we haven't foreseen.

A long time ago the heartbeats were also exchanging load information
that was being used for fine-grained load balancing, but that didn't
pan out well, so the heartbeat really doesn't need to be as frequent as
it is.

Given that the default grace is 20 seconds, I'll change the default
interval to 6 seconds, so that a node will miss a full 3 pings before
being failed.

The one thing to keep in mind is that if the grace period is adjusted,
the interval probably needs to be as well.

sage
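
For reference, a minimal ceph.conf sketch of the two settings being
discussed (the option names come from the thread above; the values are
purely illustrative, not recommendations):

    [osd]
        ; seconds between heartbeat pings sent to peer OSDs
        osd heartbeat interval = 6
        ; seconds without a reply before a peer is reported as failed
        osd heartbeat grace = 20

The two are coupled, as the reply points out: with an interval of 6 and
a grace of 20, a peer has to miss roughly three consecutive pings before
it is reported down. Raising the interval without also raising the grace
means an OSD is failed after fewer missed pings, while raising the grace
alone simply slows down failure detection.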