Re: osd out vs crush reweight

"Marcel Kuiper" <ceph@xxxxxxxx> · Tue, 21 Jul 2020 18:38:13 +0200

Hi Dominic,

This cluster is running 14.2.8 (Nautilus).
There are 172 OSDs spread over 19 nodes.
There are currently 10 pools.
All pools keep 3 replicas of the data.
There are 3968 PGs (the cluster is not yet fully in use; the number of
PGs is expected to grow).
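
For a rough sense of PG density, the figures above can be worked out as
below. This is only a back-of-the-envelope sketch: it assumes every PG
belongs to a 3-replica pool and that PG copies spread perfectly evenly over
all OSDs, which CRUSH weights will not guarantee in practice. The variable
names are illustrative only.

```python
# Back-of-the-envelope PG density from the figures in this mail.
# Assumption: every PG has 3 replicas and placement is perfectly even.
osds = 172
nodes = 19
pgs = 3968
replicas = 3

pg_replicas = pgs * replicas      # total PG copies that have to be placed
per_osd = pg_replicas / osds      # average PG copies per OSD
osds_per_node = osds / nodes      # average number of OSDs per node

print(f"PG copies in total : {pg_replicas}")
print(f"PG copies per OSD  : {per_osd:.1f}")
print(f"OSDs per node (avg): {osds_per_node:.1f}")
```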

Marcel

> Marcel,
>
> Short answer: yes, it might be expected behavior.
>
> PG placement is highly dependent on the cluster layout and the CRUSH rules.
> So, some clarifying questions:
>
> What version of Ceph are you running?
> How many nodes do you have?
> How many pools do you have, and what are their failure domains?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> DHilsbos@xxxxxxxxxxxxxx
> www.PerformAir.com
>
>
> -----Original Message-----
> From: Marcel Kuiper [mailto:ceph@xxxxxxxx]
> Sent: Tuesday, July 21, 2020 6:52 AM
> To: ceph-users@xxxxxxx
> Subject:  osd out vs crush reweight
>
> Hi list,
>
> I ran a test comparing marking an OSD out with setting its CRUSH weight
> to 0, and looked at which OSDs the PGs were sent to. The CRUSH map has
> 3 rooms. This is what happened.
>
> On ceph osd out 111 (first room; this node has OSDs 108 - 116), PGs were
> sent to the following OSDs:
>
> NR PGs    OSD
>       2   1
>       1   4
>       1   5
>       1   6
>       1   7
>       2   8
>       1   31
>       1   34
>       1   35
>       1   56
>       2   57
>       1   58
>       1   61
>       1   83
>       1   84
>       1   88
>       1   99
>       1   100
>       2   107
>       1   114
>       2   117
>       1   118
>       1   119
>       1   121
>
> All PGs were sent to OSDs on other nodes in the same room, except for 1
> PG that went to osd 114 on the same node. I think this works as expected.
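
One way to build a table like the one above is to record which OSDs hold
each PG before and after the change and count the newcomers per OSD. The
sketch below assumes the two mappings have already been collected (for
instance by parsing the output of ceph pg dump, which is not shown here).
The function name pgs_gained and the sample data are made up for
illustration and do not come from the cluster.

```python
from collections import Counter

def pgs_gained(before, after):
    """Count, per OSD, the PGs it holds in `after` but did not in `before`.

    `before` and `after` map a PG id to the list of OSD ids holding it.
    """
    gained = Counter()
    for pgid, new_osds in after.items():
        old_osds = set(before.get(pgid, []))
        for osd in new_osds:
            if osd not in old_osds:
                gained[osd] += 1
    return gained

# Made-up example data, NOT taken from the cluster in this thread.
before = {"1.0": [111, 5, 60], "1.1": [111, 31, 88], "1.2": [4, 57, 99]}
after = {"1.0": [8, 5, 60], "1.1": [114, 31, 88], "1.2": [4, 57, 99]}

# Print in the same "NR PGs / OSD" layout as the table above.
for osd, count in sorted(pgs_gained(before, after).items()):
    print(f"{count:7d}   {osd}")
```
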
>
> Next I marked the OSD in again and waited until everything had
> stabilized. Then I set the CRUSH weight to 0 with
> ceph osd crush reweight osd.111 0. I expected this to lower the CRUSH
> weight of the node, making it even less likely that PGs end up on an OSD
> of the same node. However, the results are:
>
> NR PGs    OSD
>       1   61
>       1   83
>       1   86
>       3   108
>       4   109
>       5   110
>       2   112
>       5   113
>       7   114
>       5   115
>       2   116
>
> Except for 3 PGs, all PGs ended up on an OSD belonging to the same
> node :-O. Is this expected behaviour? Can someone explain? This is on
> Nautilus 14.2.8.
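
To put the two results side by side, the small sketch below sums both
tables and counts how many PGs landed on the node that holds osd 111
(OSDs 108 - 116, as stated above). The per-OSD counts are copied from the
two tables; the helper name same_node_share is made up for illustration.

```python
# PGs received per OSD, copied from the two tables in the message.
after_out = {1: 2, 4: 1, 5: 1, 6: 1, 7: 1, 8: 2, 31: 1, 34: 1, 35: 1,
             56: 1, 57: 2, 58: 1, 61: 1, 83: 1, 84: 1, 88: 1, 99: 1,
             100: 1, 107: 2, 114: 1, 117: 2, 118: 1, 119: 1, 121: 1}
after_reweight = {61: 1, 83: 1, 86: 1, 108: 3, 109: 4, 110: 5, 112: 2,
                  113: 5, 114: 7, 115: 5, 116: 2}

# The node holding osd 111 has OSDs 108 - 116 (from the message).
same_node = set(range(108, 117))

def same_node_share(table):
    """Return (PGs that landed on the same node, PGs moved in total)."""
    total = sum(table.values())
    local = sum(n for osd, n in table.items() if osd in same_node)
    return local, total

for label, table in [("osd out", after_out),
                     ("crush reweight 0", after_reweight)]:
    local, total = same_node_share(table)
    print(f"{label}: {local} of {total} PGs landed on the same node")
```

With the numbers from the tables this gives 1 of 29 PGs for osd out and
33 of 36 PGs for crush reweight 0.
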
>
> Thanks
>
> Marcel
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx