Hi,

Is anybody using 4x replication (size=4, min_size=2) with Ceph?

The reason I'm asking is that a customer of mine asked me for a solution to prevent a situation which occurred: a cluster running with size=3 and replication over different racks was being upgraded from 13.2.5 to 13.2.6. Since the upgrade involved patching the OS as well, they rebooted one of the nodes. While that node was rebooting, a node in a different rack suddenly rebooted as well. It was unclear why this happened, but the node was gone.

With the upgraded node still rebooting and the other node crashed, about 120 PGs were inactive due to min_size=2. Between waiting for the nodes to come back and for recovery to finish, it took about 15 minutes before all VMs running inside OpenStack were back again.

So while you are upgrading or performing any other maintenance with size=3, you can't tolerate the failure of a node, as that will cause PGs to go inactive.

This made me think about using size=4 and min_size=2 to prevent this situation. That obviously has implications for write latency and cost, but it would prevent such a situation.

Is anybody here running a Ceph cluster with size=4 and min_size=2 for this reason?

Thank you,

Wido
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
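For anyone wanting to try this, the settings discussed above can be changed at runtime per pool with the standard Ceph CLI. This is just a sketch; the pool name `volumes` is an example, substitute your own pools:

```shell
# Raise replication to 4 copies while still serving I/O with only 2 up,
# so one planned reboot plus one unexpected failure keeps PGs active.
# ("volumes" is an example pool name -- adjust for your cluster.)
ceph osd pool set volumes size 4
ceph osd pool set volumes min_size 2

# Verify the new settings took effect
ceph osd pool get volumes size
ceph osd pool get volumes min_size
```

Note that raising size triggers backfill to create the fourth copy, so capacity and recovery traffic need to be accounted for before making the change.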