I just had a look at the balancer docs and they say: "No adjustments will be
made to the PG distribution if the cluster is degraded (e.g., because an OSD
has failed and the system has not yet healed itself)." Does that mean the
balancer won't run until the disruption caused by the removed OSD has been
sorted out?

I've put a rough sketch of the commands I'm planning to run once recovery
settles at the bottom of this mail, below the quoted thread.

On Mon, Nov 22, 2021 at 3:12 PM David Tinker <david.tinker@xxxxxxxxx> wrote:

> Yes it is on:
>
> # ceph balancer status
> {
>     "active": true,
>     "last_optimize_duration": "0:00:00.001867",
>     "last_optimize_started": "Mon Nov 22 13:10:24 2021",
>     "mode": "upmap",
>     "optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect",
>     "plans": []
> }
>
> On Mon, Nov 22, 2021 at 10:17 AM Stefan Kooman <stefan@xxxxxx> wrote:
>
>> On 11/22/21 08:12, David Tinker wrote:
>> > I set osd.7 as "in", uncordoned the node, scaled the OSD deployment
>> > back up, and things are recovering with cluster status HEALTH_OK.
>> >
>> > I found this message in the archives:
>> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg47071.html
>> >
>> > "You have a large difference in the capacities of the nodes. This
>> > results in a different host weight, which in turn might lead to
>> > problems with the crush algorithm. It is not able to get three
>> > different hosts for OSD placement for some of the PGs.
>> >
>> > CEPH and crush do not cope well with heterogeneous setups. I would
>> > suggest to move one of the OSDs from host ceph1 to ceph4 to equalize
>> > the host weight."
>> >
>> > My nodes do have very different weights. What I am trying to do is
>> > re-install each node in the cluster so they all have the same amount
>> > of space for Ceph (much less than before .. we need more space for
>> > hostpath stuff).
>> >
>> > # ceph osd tree
>> > ID   CLASS  WEIGHT    TYPE NAME                        STATUS  REWEIGHT  PRI-AFF
>> >  -1         13.77573  root default
>> >  -5         13.77573      region FSN1
>> > -22          0.73419          zone FSN1-DC13
>> > -21                0              host node5-redacted-com
>> > -27          0.73419              host node7-redacted-com
>> >   1    ssd   0.36710                  osd.1                up   1.00000  1.00000
>> >   5    ssd   0.36710                  osd.5                up   1.00000  1.00000
>> > -10          6.20297          zone FSN1-DC14
>> >  -9          6.20297              host node3-redacted-com
>> >   2    ssd   3.10149                  osd.2                up   1.00000  1.00000
>> >   4    ssd   3.10149                  osd.4                up   1.00000  1.00000
>> > -18          3.19919          zone FSN1-DC15
>> > -17          3.19919              host node4-redacted-com
>> >   7    ssd   3.19919                  osd.7              down         0  1.00000
>> >  -4          2.90518          zone FSN1-DC16
>> >  -3          2.90518              host node1-redacted-com
>> >   0    ssd   1.45259                  osd.0                up   1.00000  1.00000
>> >   3    ssd   1.45259                  osd.3                up   1.00000  1.00000
>> > -14          0.73419          zone FSN1-DC18
>> > -13                0              host node2-redacted-com
>> > -25          0.73419              host node6-redacted-com
>> >  10    ssd   0.36710                  osd.10               up   1.00000  1.00000
>> >  11    ssd   0.36710                  osd.11               up   1.00000  1.00000
>> >
>> > Should I just change the weights before/after removing OSD 7?
>> >
>> > With something like "ceph osd crush reweight osd.7 1.0"?
>>
>> The ceph balancer is there to balance PGs across all nodes. Do you have
>> it enabled?
>>
>> ceph balancer status
>>
>> The most efficient way is to use mode upmap (should work with modern
>> clients):
>>
>> ceph balancer mode upmap
>>
>> Gr. Stefan
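Sketch of what I'm planning to run once the cluster has recovered, in case
anyone spots a problem with it. The 0.36710 target weight below is only an
illustration (it matches the smaller hosts in the tree above); I haven't
confirmed it is the right value for my setup.

Confirm recovery has finished and the balancer is active in upmap mode:

    ceph status
    ceph balancer status

See how full each OSD and host actually is before touching any weights:

    ceph osd df tree

Adjust osd.7's CRUSH weight so the host weights end up closer together
(illustrative value only):

    ceph osd crush reweight osd.7 0.36710

Check how far the current PG distribution is from ideal:

    ceph balancer eval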