I’d say the cure is worse than the issue you’re trying to fix, but that’s my two cents.

Mark Schouten

> On 24 Jul 2019 at 21:22, Wido den Hollander <wido@xxxxxxxx> wrote:
>
> Hi,
>
> Is anybody using 4x (size=4, min_size=2) replication with Ceph?
>
> The reason I'm asking is that a customer of mine asked me for a solution
> to prevent a situation which occurred:
>
> A cluster running with size=3 and replication over different racks was
> being upgraded from 13.2.5 to 13.2.6.
>
> During the upgrade, which involved patching the OS as well, they
> rebooted one of the nodes. During that reboot, a node in a different
> rack suddenly rebooted as well. It was unclear why this happened, but
> the node was gone.
>
> While the upgraded node was rebooting and the other node was down,
> about 120 PGs were inactive due to min_size=2.
>
> Waiting for the nodes to come back and for recovery to finish, it took
> about 15 minutes before all VMs running inside OpenStack were back again.
>
> While you are upgrading or performing any maintenance with size=3, you
> can't tolerate the failure of another node, as that will cause PGs to
> go inactive.
>
> This made me think about using size=4 and min_size=2 to prevent this
> situation.
>
> This obviously has implications for write latency and cost, but it
> would prevent such a situation.
>
> Is anybody here running a Ceph cluster with size=4 and min_size=2 for
> this reason?
>
> Thank you,
>
> Wido
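
For context, size and min_size are per-pool settings, so the scheme under discussion would be applied with the standard pool commands. A minimal sketch, assuming a replicated pool hypothetically named 'vms':

    # keep four replicas of every object
    ceph osd pool set vms size 4
    # continue serving I/O as long as at least two replicas are available
    ceph osd pool set vms min_size 2

With those values, two replicas can be offline at the same time (one node in planned maintenance plus one unexpected failure) without PGs going inactive, which is exactly the scenario described above.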