Here is how this looks on a test cluster:

# ceph osd tree
ID CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF
-1       2.44707 root default
-3       0.81569     host tceph-01
 0   hdd 0.27190         osd.0          up  1.00000 1.00000
 2   hdd 0.27190         osd.2          up  1.00000 1.00000
 4   hdd 0.27190         osd.4          up  1.00000 1.00000
-7       0.81569     host tceph-02
 6   hdd 0.27190         osd.6          up  1.00000 1.00000
 7   hdd 0.27190         osd.7          up  1.00000 1.00000
 8   hdd 0.27190         osd.8          up  1.00000 1.00000
-5       0.81569     host tceph-03
 1   hdd 0.27190         osd.1          up  1.00000 1.00000
 3   hdd 0.27190         osd.3          up  1.00000 1.00000
 5   hdd 0.27190         osd.5          up  1.00000 1.00000

# ceph pg dump pgs_brief | head -8
PG_STAT STATE        UP            UP_PRIMARY ACTING        ACTING_PRIMARY
3.7e    active+clean [6,0,2,5,3,7]          6 [6,0,2,5,3,7]              6
2.7f    active+clean [7,5,2]                7 [7,5,2]                    7
2.7e    active+clean [0,1,8]                0 [0,1,8]                    0
3.7c    active+clean [6,5,0,7,2,8]          6 [6,5,0,7,2,8]              6
2.7d    active+clean [0,8,3]                0 [0,8,3]                    0
3.7d    active+clean [7,0,3,8,1,2]          7 [7,0,3,8,1,2]              7

After osd reweight to 0.5:

# ceph osd tree
ID CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF
-1       2.44707 root default
-3       0.81569     host tceph-01
 0   hdd 0.27190         osd.0          up  0.50000 1.00000
 2   hdd 0.27190         osd.2          up  0.50000 1.00000
 4   hdd 0.27190         osd.4          up  0.50000 1.00000
-7       0.81569     host tceph-02
 6   hdd 0.27190         osd.6          up  0.50000 1.00000
 7   hdd 0.27190         osd.7          up  0.50000 1.00000
 8   hdd 0.27190         osd.8          up  0.50000 1.00000
-5       0.81569     host tceph-03
 1   hdd 0.27190         osd.1          up  0.50000 1.00000
 3   hdd 0.27190         osd.3          up  0.50000 1.00000
 5   hdd 0.27190         osd.5          up  0.50000 1.00000

# ceph pg dump pgs_brief | head -8
PG_STAT STATE                         UP                                       UP_PRIMARY ACTING        ACTING_PRIMARY
3.7e    active+remapped+backfill_wait [6,0,4,5,1,2147483647]                            6 [6,0,2,5,3,7]              6
2.7f    active+clean                  [7,5,2]                                           7 [7,5,2]                    7
3.7f    active+remapped+backfill_wait [1,2147483647,7,8,2147483647,2]                   1 [0,5,4,8,6,2]              0
2.7e    active+remapped+backfill_wait [5,4,8]                                           5 [1,8,0]                    1
3.7c    active+remapped+backfill_wait [2147483647,1,0,2147483647,2147483647,8]          1 [6,5,0,7,2,8]              6
2.7d    active+remapped+backfill_wait [0,3,6]                                           0 [0,3,8]                    0
3.7d    active+remapped+backfill_wait [2147483647,0,3,6,4,2]                            0 [7,0,3,8,1,2]              7

After osd crush reweight to 0.5*0.27190:

# ceph osd tree
ID CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF
-1       1.22346 root default
-3       0.40782     host tceph-01
 0   hdd 0.13594         osd.0          up  1.00000 1.00000
 2   hdd 0.13594         osd.2          up  1.00000 1.00000
 4   hdd 0.13594         osd.4          up  1.00000 1.00000
-7       0.40782     host tceph-02
 6   hdd 0.13594         osd.6          up  1.00000 1.00000
 7   hdd 0.13594         osd.7          up  1.00000 1.00000
 8   hdd 0.13594         osd.8          up  1.00000 1.00000
-5       0.40782     host tceph-03
 1   hdd 0.13594         osd.1          up  1.00000 1.00000
 3   hdd 0.13594         osd.3          up  1.00000 1.00000
 5   hdd 0.13594         osd.5          up  1.00000 1.00000

# ceph pg dump pgs_brief | head -8
PG_STAT STATE        UP            UP_PRIMARY ACTING        ACTING_PRIMARY
3.7e    active+clean [6,0,2,5,3,7]          6 [6,0,2,5,3,7]              6
2.7f    active+clean [7,5,2]                7 [7,5,2]                    7
3.7f    active+clean [0,5,4,8,6,2]          0 [0,5,4,8,6,2]              0
2.7e    active+clean [0,1,8]                0 [0,1,8]                    0
3.7c    active+clean [6,5,0,7,2,8]          6 [6,5,0,7,2,8]              6
2.7d    active+clean [0,8,3]                0 [0,8,3]                    0
3.7d    active+clean [7,0,3,8,1,2]          7 [7,0,3,8,1,2]              7

According to the documentation, I would expect identical mappings in all 3 cases. Can someone help me out here?
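To spell out the reasoning behind my expectation, here is a minimal Python toy of a straw2-like draw (my own simplified sketch with an arbitrary hash, not Ceph's actual CRUSH code): each candidate OSD gets a value ln(u)/weight from a deterministic pseudo-random u in (0,1), and the largest value wins. Multiplying every weight by the same positive factor divides every draw by that factor, so the winner, and hence the mapping, should not change:

import hashlib
import math

def draw(pg, osd, weight):
    # Deterministic pseudo-random u in (0,1) per (pg, osd) pair; this
    # stands in for CRUSH's hash and is for illustration only.
    h = hashlib.sha256(f"{pg}:{osd}".encode()).digest()
    u = (int.from_bytes(h[:6], "big") + 0.5) / 2.0**48
    # straw2-style draw: ln(u) < 0, so a larger weight pulls the value
    # closer to 0, i.e. gives the OSD a "longer straw".
    return math.log(u) / weight

def pick(pg, weights):
    # The OSD with the largest draw wins.
    return max(weights, key=lambda osd: draw(pg, osd, weights[osd]))

weights = {0: 1.74699, 1: 1.74699, 2: 0.27190, 3: 0.09099}
scaled = {osd: 0.052 * w for osd, w in weights.items()}  # same factor for all

# Scaling all weights by one factor divides all draws by that factor,
# so the argmax (the chosen OSD) stays the same for every PG.
for pg in range(128):
    assert pick(pg, weights) == pick(pg, scaled)
print("identical winners under a uniform weight change")

Of course this only models a single straw2 choice; the real placement repeats such choices per replica and per bucket level, but each individual comparison should be scale-invariant in the same way.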
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: 15 November 2022 10:09:10
To: ceph-users@xxxxxxx
Subject: OSDs down after reweight

Hi all,

I re-weighted all OSDs in a pool down from 1.0 to the same value 0.052 (see reason below). OSDs were marked down, there were slow ops all over the place, and the MDSes started complaining about slow requests. Basically all PGs were remapped. After setting all re-weights back to 1.0, the situation went back to normal.

Expected behaviour: no (!!!) PGs are remapped and everything continues to work. Why did things go down?

More details: we have 24 OSDs with weight=1.74699 in a pool. I wanted to add OSDs with weight=0.09099 in such a way that the small OSDs receive approximately the same number of PGs as the large ones. Setting a re-weight factor of 0.052 for the large ones should achieve just that: 1.74699*0.052=0.09084. So, the procedure was:

- ceph osd reweight osd.N 0.052 for all OSDs in that pool
- add the small disks and re-balance

I would expect the CRUSH mapping to be invariant under a uniform change of weight. That is, if I apply the same relative weight change to all OSDs in a pool (new_weight = old_weight * common_factor), the mappings should be preserved. However, this is not what I observed. How is it possible that PG mappings change when the relative weights of the OSDs to each other stay the same (the probability of picking any given OSD is unchanged)?

Thanks for any hints.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx