We recently added 3 new nodes with 12x12TB OSDs. It took 3 days or so to
reshuffle the data and another 3 days to split the PGs. I did increase
the max backfills setting to speed up the process. We didn't notice the
reshuffling during normal operation.
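For reference, the backfill limit mentioned above is the
osd_max_backfills option; a minimal sketch of raising it, where the
value 4 is only an illustrative number, not a recommendation:

    # persist the setting for all OSDs (Mimic and later)
    ceph config set osd osd_max_backfills 4

    # or inject it into the running OSDs without persisting it
    ceph tell osd.* injectargs '--osd-max-backfills 4'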
On Wed, 2021-03-24 at 19:32 +0100, Dan van der Ster wrote:
> Not sure why, without looking at your crush map in detail.
>
> But to be honest, I don't think you need such a tool anymore. It was
> written back in the filestore days, when backfilling could be much
> more disruptive than it is today.
>
> You have only ~10 OSDs to fill up: just mark them fully in, or
> increment the weight in a few steps manually.
>
> .. dan
>
>
> On Wed, Mar 24, 2021, 6:24 PM Boris Behrens <bb@xxxxxxxxx> wrote:
> > I might be stupid, but am I doing something wrong with the script?
> >
> > [root@mon1 ceph-scripts]# ./tools/ceph-gentle-reweight -o 43,44,45,46,47,48,49,50,51,52,53,54,55 -s 00:00 -e 23:59 -b 82 -p rbd -t 1.74660
> > Draining OSDs: ['43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55']
> > Max latency (ms): 20
> > Max PGs backfilling: 82
> > Delta weight: 0.01
> > Target weight: 1.7466
> > Latency test pool: rbd
> > Run interval: 60
> > Start time: 00:00:00
> > End time: 23:59:00
> > Allowed days: []
> > update_osd_tree: loading ceph osd tree
> > update_osd_tree: done
> > reweight_osds: changing all osds by weight 0.01 (target 1.7466)
> > check current time: 18:18:59
> > check current day: 2
> > get_num_backfilling: PGs currently backfilling: 75
> > measure_latency: measuring 4kB write latency
> > measure_latency: current latency is 5.50958
> > Traceback (most recent call last):
> >   File "./tools/ceph-gentle-reweight", line 191, in <module>
> >     main(sys.argv[1:])
> >   File "./tools/ceph-gentle-reweight", line 186, in main
> >     reweight_osds(drain_osds, max_pgs_backfilling, max_latency, delta_weight, target_weight, test_pool, start_time, end_time, allowed_days, interval, really)
> >   File "./tools/ceph-gentle-reweight", line 98, in reweight_osds
> >     weight = get_crush_weight(osd)
> >   File "./tools/ceph-gentle-reweight", line 25, in get_crush_weight
> >     raise Exception('Undefined crush_weight for %s' % osd)
> > Exception: Undefined crush_weight for 43
> >
> > I already tried with only a single OSD, and with leaving the -t option out.
> >
> > On Wed, 24 Mar 2021 at 16:31, Janne Johansson <icepic.dz@xxxxxxxxx> wrote:
> > > On Wed, 24 Mar 2021 at 14:55, Boris Behrens <bb@xxxxxxxxx> wrote:
> > > > Oh cool. Thanks :)
> > > >
> > > > How do I find the correct weight after it is added?
> > > > For the current process I just check the other OSDs, but this might be a question that someone will raise.
> > > > I could imagine that I need to adjust ceph-gentle-reweight's target weight to the correct one.
> > >
> > > I look at "ceph osd df tree" for the size,
> > >
> > > [...]
> > > 287  hdd  11.00000  1.00000   11 TiB   81 GiB  80 GiB  1.3 MiB  1.7 GiB   11 TiB  0.73  1.03  117  osd.287
> > > 295  ssd   3.64000  1.00000  3.6 TiB  9.9 GiB  87 MiB  2.0 GiB  7.9 GiB  3.6 TiB  0.27  0.38   71  osd.295
> > >
> > > The 11.00000 should roughly match the 11 TB detected size of the
> > > hdd, just as the crush weight 3.64 matches the 3.6 TB size of the
> > > ssd.
> > >
> > > So when you add with a lowered weight, you need to check what
> > > size the added drive(s) have. From there, we have small scripts
> > > that take a lot of newly added drives and raise their crush
> > > weight at the same time (with norebalance set before changing
> > > them, and unset after all drives have gotten a slightly bigger
> > > crush weight) to allow for parallelism, while not going too wild
> > > on the number of changes per round (so the cluster can reach
> > > HEALTH_OK for a moment between steps).
> > >
> > > --
> > > May the most significant bit of your life be positive.
> >
> > --
> > This time, as an exception, the self-help group "UTF-8 problems"
> > meets in the large hall.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
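A minimal sketch of the stepped batch reweight Janne describes,
assuming the OSD ids and the 1.7466 target weight from this thread;
"ceph osd set/unset norebalance" and "ceph osd crush reweight" are
standard Ceph commands, but the 0.5 step value and the loop are only an
illustration:

    # pause rebalancing while several crush weights change at once
    ceph osd set norebalance

    # "ceph osd crush reweight" sets an absolute weight, so step the
    # value up per round, e.g. 0.5 -> 1.0 -> 1.7466 (the target from
    # the thread); 0.5 here is only an example
    for osd in 43 44 45 46 47 48 49 50 51 52 53 54 55; do
        ceph osd crush reweight osd.$osd 0.5
    done

    # resume rebalancing; wait for HEALTH_OK before the next round
    ceph osd unset norebalance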