Maybe I'm being stupid, but am I doing something wrong with the script?

[root@mon1 ceph-scripts]# ./tools/ceph-gentle-reweight -o 43,44,45,46,47,48,49,50,51,52,53,54,55 -s 00:00 -e 23:59 -b 82 -p rbd -t 1.74660
Draining OSDs: ['43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55']
Max latency (ms): 20
Max PGs backfilling: 82
Delta weight: 0.01
Target weight: 1.7466
Latency test pool: rbd
Run interval: 60
Start time: 00:00:00
End time: 23:59:00
Allowed days: []
update_osd_tree: loading ceph osd tree
update_osd_tree: done
reweight_osds: changing all osds by weight 0.01 (target 1.7466)
check current time: 18:18:59
check current day: 2
get_num_backfilling: PGs currently backfilling: 75
measure_latency: measuring 4kB write latency
measure_latency: current latency is 5.50958
Traceback (most recent call last):
  File "./tools/ceph-gentle-reweight", line 191, in <module>
    main(sys.argv[1:])
  File "./tools/ceph-gentle-reweight", line 186, in main
    reweight_osds(drain_osds, max_pgs_backfilling, max_latency, delta_weight, target_weight, test_pool, start_time, end_time, allowed_days, interval, really)
  File "./tools/ceph-gentle-reweight", line 98, in reweight_osds
    weight = get_crush_weight(osd)
  File "./tools/ceph-gentle-reweight", line 25, in get_crush_weight
    raise Exception('Undefined crush_weight for %s' % osd)
Exception: Undefined crush_weight for 43

I already tried it with only a single OSD, and with the -t option left out.

On Wed, Mar 24, 2021 at 16:31, Janne Johansson <icepic.dz@xxxxxxxxx> wrote:

> On Wed, Mar 24, 2021 at 14:55, Boris Behrens <bb@xxxxxxxxx> wrote:
> >
> > Oh cool. Thanks :)
> >
> > How do I find the correct weight after it is added?
> > For the current process I just check the other OSDs, but this might be a
> > question that someone will raise.
> >
> > I could imagine that I need to adjust ceph-gentle-reweight's target
> > weight to the correct one.
>
> I look at "ceph osd df tree" for the size:
>
> [...]
> 287   hdd 11.00000  1.00000   11 TiB   81 GiB   80 GiB  1.3 MiB  1.7 GiB   11 TiB  0.73  1.03  117  osd.287
> 295   ssd  3.64000  1.00000  3.6 TiB  9.9 GiB   87 MiB  2.0 GiB  7.9 GiB  3.6 TiB  0.27  0.38   71  osd.295
>
> The crush weight 11.00000 should roughly match the 11 TiB detected size
> of the hdd, just as the crush weight 3.64000 matches the 3.6 TiB size of
> the ssd.
>
> So when you add drives at a lowered weight, you need to check what size
> the added drive(s) report. From there, we have small scripts that take a
> batch of newly added drives and raise their crush weights at the same
> time: set norebalance before changing them, and unset it after all
> drives have received a slightly higher crush weight. That allows for
> parallelism while not going too wild on the number of changes per round,
> so the cluster can be HEALTH_OK for a moment in between steps. (Rough
> sketches of both steps follow at the end of this mail.)
>
> --
> May the most significant bit of your life be positive.

--
The self-help group "UTF-8 Problems" will, as an exception, meet in the large hall this time.
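To make Janne's first point concrete: the crush weight you want is roughly the drive's size in TiB, as shown in the SIZE column of "ceph osd df tree". A minimal sketch, assuming osd.43 is the newly added drive and reusing the 1.74660 target from the command above (both are just examples, not values confirmed for this cluster):

  # look up the reported size of the new OSD
  ceph osd df tree | grep 'osd\.43$'
  # set its crush weight to (roughly) that size in TiB
  ceph osd crush reweight osd.43 1.74660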
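And a minimal sketch of the batched step Janne describes, assuming bash with jq available; the OSD ids and the 0.1 step size are illustrative assumptions, not taken from his actual scripts:

  ceph osd set norebalance              # pause data movement while weights change
  for id in 43 44 45; do                # example ids only
      # read the current crush weight from the osd tree JSON
      w=$(ceph osd tree -f json | jq ".nodes[] | select(.id == $id) | .crush_weight")
      # nudge it up by a small step (cap it at the drive's real size yourself)
      ceph osd crush reweight osd.$id $(echo "$w + 0.1" | bc)
  done
  ceph osd unset norebalance            # let backfill proceed
  # wait for HEALTH_OK before running the next round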