Hi,

You probably caused a large rebalance and overloaded your slow HDDs, but all OSDs are up in what you are sharing. Also, I see that you changed both the weight and the reweight values; is that what you intended?
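For what it's worth, those are two different knobs set with two different commands; a minimal sketch, with osd.0 and the values here as placeholders only:

    ceph osd crush reweight osd.0 1.74699   # sets the WEIGHT column (the CRUSH bucket weight)
    ceph osd reweight osd.0 0.5             # sets the REWEIGHT column (a 0..1 override factor)
    ceph osd df tree                        # shows both columns side by side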
Étienne

> -----Original Message-----
> From: Frank Schilder <frans@xxxxxx>
> Sent: Tuesday, 15 November 2022 10:38
> To: ceph-users@xxxxxxx
> Subject: Re: OSDs down after reweight
>
> Here is how this looks on a test cluster:
>
> # ceph osd tree
> ID  CLASS  WEIGHT   TYPE NAME           STATUS  REWEIGHT  PRI-AFF
> -1         2.44707  root default
> -3         0.81569      host tceph-01
>  0    hdd  0.27190          osd.0            up  1.00000  1.00000
>  2    hdd  0.27190          osd.2            up  1.00000  1.00000
>  4    hdd  0.27190          osd.4            up  1.00000  1.00000
> -7         0.81569      host tceph-02
>  6    hdd  0.27190          osd.6            up  1.00000  1.00000
>  7    hdd  0.27190          osd.7            up  1.00000  1.00000
>  8    hdd  0.27190          osd.8            up  1.00000  1.00000
> -5         0.81569      host tceph-03
>  1    hdd  0.27190          osd.1            up  1.00000  1.00000
>  3    hdd  0.27190          osd.3            up  1.00000  1.00000
>  5    hdd  0.27190          osd.5            up  1.00000  1.00000
>
> # ceph pg dump pgs_brief | head -8
> PG_STAT  STATE         UP             UP_PRIMARY  ACTING         ACTING_PRIMARY
> 3.7e     active+clean  [6,0,2,5,3,7]  6           [6,0,2,5,3,7]  6
> 2.7f     active+clean  [7,5,2]        7           [7,5,2]        7
> 2.7e     active+clean  [0,1,8]        0           [0,1,8]        0
> 3.7c     active+clean  [6,5,0,7,2,8]  6           [6,5,0,7,2,8]  6
> 2.7d     active+clean  [0,8,3]        0           [0,8,3]        0
> 3.7d     active+clean  [7,0,3,8,1,2]  7           [7,0,3,8,1,2]  7
>
>
> After osd reweight to 0.5:
>
> # ceph osd tree
> ID  CLASS  WEIGHT   TYPE NAME           STATUS  REWEIGHT  PRI-AFF
> -1         2.44707  root default
> -3         0.81569      host tceph-01
>  0    hdd  0.27190          osd.0            up  0.50000  1.00000
>  2    hdd  0.27190          osd.2            up  0.50000  1.00000
>  4    hdd  0.27190          osd.4            up  0.50000  1.00000
> -7         0.81569      host tceph-02
>  6    hdd  0.27190          osd.6            up  0.50000  1.00000
>  7    hdd  0.27190          osd.7            up  0.50000  1.00000
>  8    hdd  0.27190          osd.8            up  0.50000  1.00000
> -5         0.81569      host tceph-03
>  1    hdd  0.27190          osd.1            up  0.50000  1.00000
>  3    hdd  0.27190          osd.3            up  0.50000  1.00000
>  5    hdd  0.27190          osd.5            up  0.50000  1.00000
>
> # ceph pg dump pgs_brief | head -8
> PG_STAT  STATE                          UP                                        UP_PRIMARY  ACTING         ACTING_PRIMARY
> 3.7e     active+remapped+backfill_wait  [6,0,4,5,1,2147483647]                    6           [6,0,2,5,3,7]  6
> 2.7f     active+clean                   [7,5,2]                                   7           [7,5,2]        7
> 3.7f     active+remapped+backfill_wait  [1,2147483647,7,8,2147483647,2]           1           [0,5,4,8,6,2]  0
> 2.7e     active+remapped+backfill_wait  [5,4,8]                                   5           [1,8,0]        1
> 3.7c     active+remapped+backfill_wait  [2147483647,1,0,2147483647,2147483647,8]  1           [6,5,0,7,2,8]  6
> 2.7d     active+remapped+backfill_wait  [0,3,6]                                   0           [0,3,8]        0
> 3.7d     active+remapped+backfill_wait  [2147483647,0,3,6,4,2]                    0           [7,0,3,8,1,2]  7
>
>
> After osd crush reweight to 0.5*0.27190:
>
> # ceph osd tree
> ID  CLASS  WEIGHT   TYPE NAME           STATUS  REWEIGHT  PRI-AFF
> -1         1.22346  root default
> -3         0.40782      host tceph-01
>  0    hdd  0.13594          osd.0            up  1.00000  1.00000
>  2    hdd  0.13594          osd.2            up  1.00000  1.00000
>  4    hdd  0.13594          osd.4            up  1.00000  1.00000
> -7         0.40782      host tceph-02
>  6    hdd  0.13594          osd.6            up  1.00000  1.00000
>  7    hdd  0.13594          osd.7            up  1.00000  1.00000
>  8    hdd  0.13594          osd.8            up  1.00000  1.00000
> -5         0.40782      host tceph-03
>  1    hdd  0.13594          osd.1            up  1.00000  1.00000
>  3    hdd  0.13594          osd.3            up  1.00000  1.00000
>  5    hdd  0.13594          osd.5            up  1.00000  1.00000
>
> # ceph pg dump pgs_brief | head -8
> PG_STAT  STATE         UP             UP_PRIMARY  ACTING         ACTING_PRIMARY
> 3.7e     active+clean  [6,0,2,5,3,7]  6           [6,0,2,5,3,7]  6
> 2.7f     active+clean  [7,5,2]        7           [7,5,2]        7
> 3.7f     active+clean  [0,5,4,8,6,2]  0           [0,5,4,8,6,2]  0
> 2.7e     active+clean  [0,1,8]        0           [0,1,8]        0
> 3.7c     active+clean  [6,5,0,7,2,8]  6           [6,5,0,7,2,8]  6
> 2.7d     active+clean  [0,8,3]        0           [0,8,3]        0
> 3.7d     active+clean  [7,0,3,8,1,2]  7           [7,0,3,8,1,2]  7
>
>
> According to the documentation, I would expect identical mappings in all 3 cases. Can someone help me out here?
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Frank Schilder <frans@xxxxxx>
> Sent: 15 November 2022 10:09:10
> To: ceph-users@xxxxxxx
> Subject: OSDs down after reweight
>
> Hi all,
>
> I re-weighted all OSDs in a pool down from 1.0 to the same value 0.052 (see reason below). After this, all hell broke loose. OSDs were marked down, there were slow ops all over the place, and the MDSes started complaining about slow ops/requests. Basically all PGs were remapped. After setting all re-weights back to 1.0, the situation returned to normal.
>
> Expected behaviour: no (!!!) PGs are remapped and everything continues to work. Why did things go down?
>
> More details: we have 24 OSDs with weight=1.74699 in a pool. I wanted to add OSDs with weight=0.09099 in such a way that the small OSDs receive approximately the same number of PGs as the large ones. Setting a re-weight factor of 0.052 for the large ones should achieve just that: 1.74699*0.052 ≈ 0.09084. So, the procedure was:
>
> - ceph osd reweight osd.N 0.052 for all OSDs in that pool
> - add the small disks and re-balance
>
> I would expect the CRUSH mapping to be invariant under a uniform change of weight. That is, if I apply the same relative weight change to all OSDs (new_weight = old_weight * common_factor) in a pool, the mappings should be preserved. However, this is not what I observed. How is it possible that PG mappings change if the relative weight of all OSDs to each other stays the same (the probabilities of picking an OSD are unchanged over all OSDs)?
>
> Thanks for any hints.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
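
A minimal sketch of the uniform scale-down discussed above, done through CRUSH weights rather than the reweight override (per the test earlier in the thread, that is the variant that left the PG mappings unchanged); the osd.0..osd.23 range and the loop itself are illustrative assumptions, not commands taken from the thread:

    for N in $(seq 0 23); do
        # 1.74699 * 0.052 ≈ 0.09084, i.e. the same relative change for every OSD
        ceph osd crush reweight "osd.$N" 0.09084
    done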