We recently added 3 new nodes with 12x12TB OSDs. It took 3 days or so to
reshuffle the data and another 3 days to split the PGs. I did increase
the max backfills setting to speed up the process. We didn't notice the
reshuffling during normal operation.
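For reference, the backfill limit mentioned above is the
osd_max_backfills option; a minimal sketch of raising it, where the
value 4 is only an illustrative number, not a recommendation:

    # persist the setting for all OSDs (Mimic and later)
    ceph config set osd osd_max_backfills 4

    # or inject it into the running OSDs without persisting it
    ceph tell osd.* injectargs '--osd-max-backfills 4'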
On Wed, 2021-03-24 at 19:32 +0100, Dan van der Ster wrote:
> Not sure why, without looking at your crush map in detail.
>
> But to be honest, I don't think you need such a tool anymore. It was
> written back in the filestore days, when backfilling could be much
> more disruptive than it is today.
>
> You have only ~10 OSDs to fill up: just mark them fully in, or
> increment the weight in a few steps manually.
>
> .. dan
>
>
> On Wed, Mar 24, 2021, 6:24 PM Boris Behrens <bb@xxxxxxxxx> wrote:
> > I might be stupid, but am I doing something wrong with the script?
> >
> > [root@mon1 ceph-scripts]# ./tools/ceph-gentle-reweight -o 43,44,45,46,47,48,49,50,51,52,53,54,55 -s 00:00 -e 23:59 -b 82 -p rbd -t 1.74660
> > Draining OSDs: ['43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55']
> > Max latency (ms): 20
> > Max PGs backfilling: 82
> > Delta weight: 0.01
> > Target weight: 1.7466
> > Latency test pool: rbd
> > Run interval: 60
> > Start time: 00:00:00
> > End time: 23:59:00
> > Allowed days: []
> > update_osd_tree: loading ceph osd tree
> > update_osd_tree: done
> > reweight_osds: changing all osds by weight 0.01 (target 1.7466)
> > check current time: 18:18:59
> > check current day: 2
> > get_num_backfilling: PGs currently backfilling: 75
> > measure_latency: measuring 4kB write latency
> > measure_latency: current latency is 5.50958
> > Traceback (most recent call last):
> >   File "./tools/ceph-gentle-reweight", line 191, in <module>
> >     main(sys.argv[1:])
> >   File "./tools/ceph-gentle-reweight", line 186, in main
> >     reweight_osds(drain_osds, max_pgs_backfilling, max_latency, delta_weight, target_weight, test_pool, start_time, end_time, allowed_days, interval, really)
> >   File "./tools/ceph-gentle-reweight", line 98, in reweight_osds
> >     weight = get_crush_weight(osd)
> >   File "./tools/ceph-gentle-reweight", line 25, in get_crush_weight
> >     raise Exception('Undefined crush_weight for %s' % osd)
> > Exception: Undefined crush_weight for 43
> >
> > I already tried with only a single OSD, and with leaving the -t option out.
> >
> > On Wed, 24 Mar 2021 at 16:31, Janne Johansson <icepic.dz@xxxxxxxxx> wrote:
> > > On Wed, 24 Mar 2021 at 14:55, Boris Behrens <bb@xxxxxxxxx> wrote:
> > > > Oh cool. Thanks :)
> > > >
> > > > How do I find the correct weight after it is added?
> > > > For the current process I just check the other OSDs, but this might be a question that someone will raise.
> > > > I could imagine that I need to adjust ceph-gentle-reweight's target weight to the correct one.
> > >
> > > I look at "ceph osd df tree" for the size,
> > >
> > > [...]
> > > 287  hdd  11.00000  1.00000   11 TiB   81 GiB  80 GiB  1.3 MiB  1.7 GiB   11 TiB  0.73  1.03  117  osd.287
> > > 295  ssd   3.64000  1.00000  3.6 TiB  9.9 GiB  87 MiB  2.0 GiB  7.9 GiB  3.6 TiB  0.27  0.38   71  osd.295
> > >
> > > The 11.00000 should roughly match the 11 TB detected size of the
> > > hdd, just as the crush weight 3.64 matches the 3.6 TB size of the
> > > ssd.
> > >
> > > So when you add with a lowered weight, you need to check what
> > > size the added drive(s) have. From there, we have small scripts
> > > that take a lot of newly added drives and raise their crush
> > > weight at the same time (with norebalance set before changing
> > > them, and unset after all drives have gotten a slightly bigger
> > > crush weight) to allow for parallelism, while not going too wild
> > > on the number of changes per round (so the cluster can reach
> > > HEALTH_OK for a moment between steps).
> > >
> > > --
> > > May the most significant bit of your life be positive.
> >
> > --
> > This time, as an exception, the self-help group "UTF-8 problems"
> > meets in the large hall.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
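A minimal sketch of the stepped batch reweight Janne describes,
assuming the OSD ids and the 1.7466 target weight from this thread;
"ceph osd set/unset norebalance" and "ceph osd crush reweight" are
standard Ceph commands, but the 0.5 step value and the loop are only an
illustration:

    # pause rebalancing while several crush weights change at once
    ceph osd set norebalance

    # "ceph osd crush reweight" sets an absolute weight, so step the
    # value up per round, e.g. 0.5 -> 1.0 -> 1.7466 (the target from
    # the thread); 0.5 here is only an example
    for osd in 43 44 45 46 47 48 49 50 51 52 53 54 55; do
        ceph osd crush reweight osd.$osd 0.5
    done

    # resume rebalancing; wait for HEALTH_OK before the next round
    ceph osd unset norebalance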