On 02.03.23 09:16, stefan.pinter@xxxxxxxxxxxxxxxx wrote:
> so if one room goes down/offline, around 50% of the PGs would be left with only 1 replica making them read-only.
Most people forget the other half of the cluster in such a scenario.
For us humans it is obvious that one room is down, because we can see it
from the outside.
The OSDs only see that they do not have connectivity to their peering
partners. They cannot tell whether the other hosts are down or only the
network in between.
It could be that only the link between the two rooms is dead. Then you
have two copies running in one room and only one in the other.
If you now allow changes in the "smaller" room in addition to changes in
the room with two copies, the two sides diverge and you get a conflict
as soon as the network connection between the rooms is reestablished.
This is why min_size=1 is a really bad idea outside of a disaster
scenario where the other two copies are completely lost, e.g. to a fire.
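
To make the argument concrete, here is a rough toy model in Python. It is
not how Ceph decides anything internally: it ignores monitor quorum and
peering and only counts the replicas each room can still reach. The
function name writable_sides and the room names are made up for this
illustration.

```python
def writable_sides(replicas_per_room, min_size):
    """Return the rooms whose local replica count alone satisfies min_size.

    Toy model only: a side is treated as writable if it still holds at
    least min_size replicas after the inter-room link has failed.
    """
    return [room for room, n in replicas_per_room.items() if n >= min_size]


# Hypothetical PG with size=3: two replicas in room A, one in room B.
pg = {"room_a": 2, "room_b": 1}

for min_size in (2, 1):
    sides = writable_sides(pg, min_size)
    # More than one writable side means both rooms can accept changes
    # independently, which is the conflict described above.
    split_brain = len(sides) > 1
    print(f"min_size={min_size}: writable in {sides}, "
          f"split-brain risk: {split_brain}")
```

With min_size=2 only the room holding two copies keeps accepting writes.
With min_size=1 both rooms do, and their data diverges until the link
comes back.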
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin