Hi,

While testing a cluster with 47 OSDs, we noticed considerable network traffic (around 2 Mbit/s), most of it apparently from the OSD heartbeats alone (measured while no clients were generating I/O). OSD CPU consumption was also clearly measurable, constantly around 1-2% on a 3.2 GHz Xeon CPU.

So we experimented by setting osd heartbeat interval = 10 in ceph.conf on all nodes. As suspected, network traffic diminished, and the CPU usage of an idle OSD no longer registers in top.

Since there is a considerable number of OSDs in this cluster, we think that even with a 10-second heartbeat interval, a down OSD is likely to be detected reasonably quickly by some of the other OSDs. In fact, we saw in the mon log that when we stopped an OSD, other OSDs flagged it as "failed" within just a few seconds.

We would therefore like to hear the list's opinion on increasing the heartbeat interval on large clusters (and perhaps on suggesting this in the official documentation), and in particular whether you think there might be negative consequences that we haven't foreseen.

Thanks in advance.

Best regards,
Cláudio
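
For reference, a minimal sketch of the ceph.conf change described above. Only the option name and the value 10 come from the message; placing it under the [osd] section is an assumption, since the message does not say which section was used.

```ini
; Sketch only: the [osd] section placement is an assumption.
[osd]
    ; Interval between OSD heartbeats, in seconds (the value tested above)
    osd heartbeat interval = 10
```

The same fragment would need to be present on every node, and the OSDs restarted, for the setting to apply cluster-wide.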