I have searched the list archives and have seen a couple of references to this question, but no real solution, unfortunately...

We are running multiple Ceph clusters, pretty much as media appliances. As such, the number of nodes is variable, and all of the nodes are symmetric (i.e. same CPU power, memory, and disk space). As a result, we run a monitor and an OSD (backed by an SSD RAID) on each system. The number of nodes is typically small, on the order of five to a dozen. As the node count gets higher, we plan to stop running monitors on all nodes. Our pools are typically set up with a replication size of 2 or 3, with min_size of 1.

The problem occurs when a single node goes down, such that its monitor and OSD stop at once. For a client (especially a writer) on another node, there is a fairly consistent 20-second delay before further operations go through. This is a delay we cannot easily survive.

If I first bring down the OSD, wait a few seconds, and then bring down the monitor, the system behaves with only a few seconds of delay. However, we can't always guarantee a graceful shutdown (such as when a node is rebooted, loses network connectivity, or loses power). Note that I get exactly the same behavior if I stop an OSD on one system while stopping a monitor on another...

Previous discussions similar to this have touched on the "osd heartbeat grace" setting, which is conspicuously set to 20 seconds. I have tried changing this, along with any other related settings, to no avail -- whatever I do, the delay remains at 20 seconds.

Anything else to try?

Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
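For anyone following up: a minimal sketch of the kind of ceph.conf fragment I mean when I say "related settings". The option names are real Ceph options; the values below are illustrative only, not a recommendation. One detail worth noting: osd heartbeat grace is consulted by both the OSDs and the monitors, so it needs to be set in [global] (or in both the [osd] and [mon] sections) for a change to actually take effect.

```ini
[global]
# How long peers wait without a heartbeat reply before reporting
# an OSD as failed (default 20s -- matches the observed delay).
# Must be visible to both OSDs and monitors.
osd heartbeat grace = 10

# How often OSDs ping their heartbeat peers (default 6s).
# Lowering it lets failures be noticed within a smaller grace window.
osd heartbeat interval = 3

[mon]
# Number of distinct OSDs that must report a peer down before the
# monitors mark it down (default 2 in recent releases). On a small,
# symmetric cluster a lower value speeds up failure detection.
mon osd min down reporters = 1
```

These only govern OSD failure detection, though; they would not explain a fixed 20-second stall if the real cost is the client re-establishing its monitor session after the co-located monitor dies, which may be why changing them has had no visible effect here.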