Re: One pg stuck in active+undersized+degraded after OSD down

Fiddling with the crush weights sorted this out and I was able to remove
the OSD from the cluster. I set all the big weights down to 1.0:

ceph osd crush reweight osd.7 1.0
etc.
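
For anyone hitting the same thing, the equivalent as a loop (a sketch;
setting everything to a uniform 1.0 happened to work for me, but crush
weights normally reflect disk capacity):

# "ceph osd ls" prints all OSD ids, one per line
for id in $(ceph osd ls); do
    ceph osd crush reweight "osd.${id}" 1.0
done
ceph osd tree    # verify the new weights
ceph -s          # watch the PGs go active+clean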

Tx for all the help

On Tue, Nov 23, 2021 at 9:35 AM Stefan Kooman <stefan@xxxxxx> wrote:

> On 11/23/21 08:21, David Tinker wrote:
> > Yes it recovered when I put the OSD back in. The issue is that it fails
> > to sort itself out when I remove that OSD even though I have loads of
> > space and 8 other OSDs in 4 different zones to choose from. The weights
> are very different (some 3.2, others 0.36) and that post I found
> > suggested that this might cause trouble for the crush algorithm. So I
> > was thinking about making them more even before removing the OSD.
>
> Sure, that sounds like a good plan. If the balancer is not able to
> optimize at this point, the improved balancer by Jonas Jelten might do
> the trick here [1]; also see ML thread [2].
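>
> For example, checking the built-in balancer first (standard ceph
> commands; upmap mode assumes all clients are Luminous or newer):
>
> ceph balancer status       # current mode and whether it is active
> ceph osd set-require-min-compat-client luminous   # needed for upmap
> ceph balancer mode upmap   # upmap usually balances better than crush-compat
> ceph balancer on
> ceph balancer eval         # distribution score; lower is better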
>
> Do you have many different pools?
>
> What Weiwen Hu said in reply to this thread sounds very plausible. You
> might dump your crush map, remove OSD.7, and check if the mappings can
> still be made.
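>
> Roughly like this (rule id 0, 3 replicas and the file names are
> placeholders for your setup):
>
> ceph osd getcrushmap -o crushmap.bin          # dump compiled crush map
> crushtool -d crushmap.bin -o crushmap.txt     # decompile to editable text
> # edit crushmap.txt: remove osd.7, then recompile and test
> crushtool -c crushmap.txt -o crushmap-new.bin
> crushtool -i crushmap-new.bin --test --rule 0 --num-rep 3 --show-bad-mappings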
>
> Gr. Stefan
>
> [1]: https://github.com/TheJJ/ceph-balancer
> [2]: https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/QXTFTYRD43XB3JRDSTMH657ISSMZ6QTU/
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


