On Thu, 5 Dec 2019 at 00:28, Milan Kupcevic <milan_kupcevic@xxxxxxxxxxx> wrote:
There is plenty of space to take more than a few failed nodes. But the
question was about what is going on inside a node with a few failed
drives. Current Ceph behavior keeps increasing the number of placement
groups on the surviving drives inside the same node; it does not spread
them across the cluster. So, let's get back to the original question:
should the host weight auto-reduce on HDD failure, or not?
If the OSDs are still in the CRUSH map with non-zero CRUSH weights, they still add "value" to the host, and hence the host gets as many PGs as the sum of the CRUSH weights (i.e., the disk sizes) says it can bear.
If some of the OSDs have a zero OSD-reweight value, they will not take their part of the burden, but rather let the "surviving" OSDs on the same host take more load, until the cluster decides the broken OSDs are down and out. At that point the cluster rebalances according to the general algorithm, which should(*) even things out, letting the OSD hosts with fewer OSDs have fewer PGs and hence less data.
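To make the arithmetic concrete, here is a rough back-of-the-envelope model in Python (not real CRUSH, just proportional shares; all host names, OSD ids and sizes below are made up for illustration): PGs are split across hosts in proportion to the sum of their OSDs' CRUSH weights, and within a host in proportion to crush weight times reweight. With one OSD reweighted to 0 but still carrying its CRUSH weight, its host keeps its full share and the surviving OSDs on that host absorb the difference:

# Simplified model, not real CRUSH: hosts get PGs in proportion to the
# sum of their OSDs' CRUSH weights; within a host, OSDs get PGs in
# proportion to crush_weight * reweight.
def pg_share(hosts, total_pgs):
    """hosts: {host: {osd: (crush_weight, reweight)}} -> PGs per OSD."""
    host_weight = {h: sum(cw for cw, _ in osds.values())
                   for h, osds in hosts.items()}
    cluster_weight = sum(host_weight.values())
    shares = {}
    for h, osds in hosts.items():
        host_pgs = total_pgs * host_weight[h] / cluster_weight
        eff = {o: cw * rw for o, (cw, rw) in osds.items()}
        eff_sum = sum(eff.values()) or 1.0
        for o, w in eff.items():
            shares[o] = host_pgs * w / eff_sum
    return shares

# Three hosts with four 4 TB OSDs each; osd.3 on host-a has failed and
# sits at reweight 0, but its CRUSH weight still counts toward host-a.
hosts = {
    "host-a": {"osd.0": (3.64, 1.0), "osd.1": (3.64, 1.0),
               "osd.2": (3.64, 1.0), "osd.3": (3.64, 0.0)},
    "host-b": {f"osd.{i}": (3.64, 1.0) for i in range(4, 8)},
    "host-c": {f"osd.{i}": (3.64, 1.0) for i in range(8, 12)},
}
for osd, pgs in sorted(pg_share(hosts, 1200).items(),
                       key=lambda kv: int(kv[0].split(".")[1])):
    print(f"{osd}: ~{pgs:.0f} PGs")

Running it puts the three surviving OSDs on host-a at roughly 133 PGs each versus about 100 on every other OSD, which is exactly the in-host pile-up being described.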
*) There are reports of Nautilus (only, as far as I remember) having weird placement ideas that tend to fill up OSDs that already hold a lot of data, leaving it to the Ceph admin to force reweight values down in order not to go over 85% utilization, the point at which some rebalancing ops will stop.
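For keeping an eye on that 85% line, something along these lines can help. This is a minimal sketch; it assumes `ceph osd df --format json` reports a per-OSD "utilization" percentage in a "nodes" list, which may differ between releases:

import json
import subprocess

NEARFULL = 85.0   # the threshold mentioned above, as a percentage
MARGIN = 5.0      # start warning this many percentage points early

# Assumed JSON layout of `ceph osd df --format json`: a "nodes" list
# with "name" and "utilization" per OSD; adjust the field names if
# your release reports them differently.
raw = subprocess.check_output(["ceph", "osd", "df", "--format", "json"])
for node in json.loads(raw).get("nodes", []):
    util = float(node.get("utilization", 0.0))
    if util >= NEARFULL - MARGIN:
        print(f"{node['name']}: {util:.1f}% used -- consider lowering "
              f"its reweight before it crosses {NEARFULL:.0f}%")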
May the most significant bit of your life be positive.