I have searched the list archives and have seen a couple of references to this question, but no real solution, unfortunately...

We are running multiple Ceph clusters, pretty much as media appliances. As such, the number of nodes is variable, and all of the nodes are symmetric (i.e. same CPU power, memory, and disk space). As a result, we run a monitor and an OSD (backed by an SSD RAID) on each system. The number of nodes is typically small, on the order of five to a dozen. As the node count gets higher, we plan to stop running monitors on all nodes. Our pools are typically set up with a replication size of 2 or 3, with min_size of 1.

The problem occurs when a single node goes down, such that its monitor and OSD stop at once. For a client (especially a writer) on another node, there is a fairly consistent 20-second delay before further operations go through. This is a delay we cannot easily survive.

If I first bring down the OSD, wait a few seconds, and then bring down the monitor, the system behaves with only a few seconds of delay. However, we can't always guarantee a graceful shutdown (such as when a node is rebooted, loses network connectivity, or loses power). Note that I get exactly the same behavior if I stop an OSD on one system while stopping a monitor on another...

Previous discussions similar to this have touched on the "osd heartbeat grace" setting, which is conspicuously set to 20 seconds. I have tried changing this, along with any other related settings, to no avail -- whatever I do, the delay remains at 20 seconds.

Anything else to try?

Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
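For anyone following up: a minimal sketch of the kind of ceph.conf fragment I mean when I say "related settings". The option names are real Ceph options; the values below are illustrative only, not a recommendation. One detail worth noting: osd heartbeat grace is consulted by both the OSDs and the monitors, so it needs to be set in [global] (or in both the [osd] and [mon] sections) for a change to actually take effect.

```ini
[global]
# How long peers wait without a heartbeat reply before reporting
# an OSD as failed (default 20s -- matches the observed delay).
# Must be visible to both OSDs and monitors.
osd heartbeat grace = 10

# How often OSDs ping their heartbeat peers (default 6s).
# Lowering it lets failures be noticed within a smaller grace window.
osd heartbeat interval = 3

[mon]
# Number of distinct OSDs that must report a peer down before the
# monitors mark it down (default 2 in recent releases). On a small,
# symmetric cluster a lower value speeds up failure detection.
mon osd min down reporters = 1
```

These only govern OSD failure detection, though; they would not explain a fixed 20-second stall if the real cost is the client re-establishing its monitor session after the co-located monitor dies, which may be why changing them has had no visible effect here.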