On 2019-12-05 02:33, Janne Johansson wrote:
> On Thu, 5 Dec 2019 at 00:28, Milan Kupcevic
> <milan_kupcevic@xxxxxxxxxxx> wrote:
>
>     There is plenty of space to take more than a few failed nodes. But
>     the question was about what is going on inside a node with a few
>     failed drives. Current Ceph behavior keeps increasing the number of
>     placement groups on the surviving drives inside the same node. It
>     does not spread them across the cluster. So, let's get back to the
>     original question: should the host weight auto-reduce on HDD
>     failure, or not?
>
> If the OSDs are still in the crush map, with non-zero weights, they will
> add "value" to the host, and hence the host gets as many PGs as the sum
> of the crush values (i.e., sizes) says it can bear.
> If some of the OSDs have zero OSD-reweight values, they will not take a
> part of the burden, but rather let the "surviving" OSDs on the host take
> more load, until the cluster decides the broken OSDs are down and out,
> at which point the cluster rebalances according to the general algorithm,
> which should(*) even it out, letting the OSD hosts with fewer OSDs have
> fewer PGs and hence less data.

Well, that is simply not happening. See the state of the WEIGHT and
REWEIGHT columns in this sample of four nodes which are part of a huge
cluster:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-December/037602.html

The failed OSDs have definitely been down and out for a significant
period of time. Also compare the number of placement groups (PGS) per
OSD on all of the presented nodes.

Milan

--
Milan Kupcevic
Senior Cyberinfrastructure Engineer at Project NESE
Harvard University
FAS Research Computing
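
The distinction Janne draws between the OSD reweight and the CRUSH weight
corresponds to different CLI operations. A minimal sketch, assuming a
hypothetical failed osd.7; the id is illustrative only, not taken from the
sample output above:

    # Inspect CRUSH WEIGHT, REWEIGHT and PGS per OSD and per host.
    # A failed-but-not-removed OSD typically shows WEIGHT > 0, REWEIGHT 0.
    ceph osd df tree

    # Marking the OSD out (manually, or automatically after
    # mon_osd_down_out_interval) only sets REWEIGHT to 0. The host
    # bucket keeps the same CRUSH weight, so the failed OSD's PG share
    # is redistributed to the surviving OSDs on that same host.
    ceph osd out osd.7

    # Zeroing the CRUSH weight of the failed OSD reduces the host
    # bucket weight as well, so the displaced PGs move to other hosts
    # instead of piling up on this node's surviving drives.
    ceph osd crush reweight osd.7 0

    # Removing/purging the dead OSD has the same effect on the host
    # weight and also cleans up the CRUSH map entry.
    ceph osd purge osd.7 --yes-i-really-mean-it

As far as I know, only the out/REWEIGHT step happens automatically today;
the CRUSH weight of the failed OSD, and therefore of its host bucket, stays
put until an operator changes it, which is consistent with the per-node PGS
counts shown in the linked sample.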