On 2019-12-05 02:33, Janne Johansson wrote:
> On Thu, 5 Dec 2019 at 00:28, Milan Kupcevic
> <milan_kupcevic@xxxxxxxxxxx> wrote:
>
>     There is plenty of space to take more than a few failed nodes. But
>     the question was about what is going on inside a node with a few
>     failed drives. Current Ceph behavior keeps increasing the number of
>     placement groups on the surviving drives inside the same node. It
>     does not spread them across the cluster. So, let's get back to the
>     original question: should the host weight auto-reduce on HDD
>     failure, or not?
>
> If the OSDs are still in the crush map, with non-zero weights, they will
> add "value" to the host, and hence the host gets as many PGs as the sum
> of the crush values (i.e., sizes) says it can bear.
> If some of the OSDs have zero OSD-reweight values, they will not take a
> part of the burden, but rather let the "surviving" OSDs on the host take
> more load, until the cluster decides the broken OSDs are down and out,
> at which point the cluster rebalances according to the general algorithm,
> which should(*) even it out, letting the OSD hosts with fewer OSDs have
> fewer PGs and hence less data.

Well, that is simply not happening. See the state of the WEIGHT and
REWEIGHT columns in this sample of four nodes which are part of a huge
cluster:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-December/037602.html

The failed OSDs have definitely been down and out for a significant
period of time. Also compare the number of placement groups (PGS) per
OSD on all of the presented nodes.

Milan

--
Milan Kupcevic
Senior Cyberinfrastructure Engineer at Project NESE
Harvard University
FAS Research Computing
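
The distinction Janne draws between the OSD reweight and the CRUSH weight
corresponds to different CLI operations. A minimal sketch, assuming a
hypothetical failed osd.7; the id is illustrative only, not taken from the
sample output above:

    # Inspect CRUSH WEIGHT, REWEIGHT and PGS per OSD and per host.
    # A failed-but-not-removed OSD typically shows WEIGHT > 0, REWEIGHT 0.
    ceph osd df tree

    # Marking the OSD out (manually, or automatically after
    # mon_osd_down_out_interval) only sets REWEIGHT to 0. The host
    # bucket keeps the same CRUSH weight, so the failed OSD's PG share
    # is redistributed to the surviving OSDs on that same host.
    ceph osd out osd.7

    # Zeroing the CRUSH weight of the failed OSD reduces the host
    # bucket weight as well, so the displaced PGs move to other hosts
    # instead of piling up on this node's surviving drives.
    ceph osd crush reweight osd.7 0

    # Removing/purging the dead OSD has the same effect on the host
    # weight and also cleans up the CRUSH map entry.
    ceph osd purge osd.7 --yes-i-really-mean-it

As far as I know, only the out/REWEIGHT step happens automatically today;
the CRUSH weight of the failed OSD, and therefore of its host bucket, stays
put until an operator changes it, which is consistent with the per-node PGS
counts shown in the linked sample.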